
Traffic Analysis Using Streaming Queries

Mike Fisk Los Alamos National Laboratory [email protected]

Outline

Intro to Continuous Query Systems
  a.k.a. Streaming Databases
  Relevance to data networks

Optimizing the evaluation of multiple Boolean queries
  Counting Algorithm
  Snort
  Static Dataflow Optimization
    Common Subexpression
    Vector Algorithms

Performance Comparisons

Observations

Traffic analysis tools are data-type-specific
  Flowtools: netflow
  Snort: pcap
  Psad: iptables logs

Most analysis systems lack a framework for optimizing rules/queries
  Reordering Boolean expressions
  Grouping (common sub-expressions)
  Vector/set operations

Continuous Query Systems

Continuous Query systems are to streaming data what Relational Database systems are to stored data
  Filtering, summarization, aggregation

Example datasets:
  Sensor data (temperature, traffic, etc.)
  Stock exchange transactions
  Packets, flows, logs

Inefficient and high latency to load data into a traditional database and query periodically.
  How often could you afford to re-execute the query?

Example systems:
  NiagaraCQ (Wisc), Telegraph (Berkeley), SMACQ, etc.
  Commercial: StreamBase, etc.

Example systems in disguise:
  Snort, router ACLs, firewall filters, packet classification, egrep

System for Modular Analysis & Continuous Queries

[Architecture diagram. Labels: Queries; Optimized Data-Flow Graphs; Scheduler; Processing Modules; Type Run-Time; Type Modules; Dynamically Loaded; Internals; Specified at run-time]

Type Model

Stream of dynamically & heterogeneously typed objects
  Each object can have a different type
  Types need not be statically defined in advance

Objects refer to storage locations
  Internal to the object, or references into other objects or external memory

Objects have fields
  Fields are (indifferently) struct elements, enums, unions, casts, string conversions, etc.
  Fields are first-class objects
  Fields can be dynamically attached to objects

Objects are immutable
  Enables parallelism without locking

Type Module Definition

There are no fundamental types
Pcap packet example:

    struct dts_field_spec dts_type_packet_fields[] = {
        // Type        Name           Access function (if not fixed)
        { "timeval",  "ts",          NULL },  // Fixed-length, fixed-location
        { "uint32",   "caplen",      NULL },
        { "uint32",   "len",         NULL },
        { "ipproto",  "ipprotocol",  dts_pkthdr_get_protocol },  // Function pointer
        { "string",   "packet",      dts_pkthdr_get_packet },
        { "macaddr",  "dstmac",      dts_pkthdr_get_dstmac },
        { "nuint16",  "ethertype",   dts_pkthdr_get_ethertype },
        { "ip",       "srcip",       dts_pkthdr_get_srcip },
        // ...
    };

SMACQ Processing Modules

Modules are the atoms of query optimization
Written in C++ or Python
Take arbitrary flags and arguments
  Unix command-line style

Introspection: Can ask runtime to identify downstream invariants
  When module can do eager pre-filtering (e.g. hardware prefilter on NIC, database query, etc.)

Event-driven (produce/consume) API
  Can use threaded wrapper if lazy (really co-routines)

Can embed other query instantiations
  Can instantiate new scheduler, or share primary (preferred)

Example Processing Module (Python)

    class Dumper:
        """Print a few elements of each datum and pass every 5th."""

        def __init__(self, smacq, *args):
            print('init', args)
            self.smacq = smacq  # Save reference to runtime
            self.buf = []       # List of objects received

        def consume(self, datum):
            for i in 'srcip', 'dstip', 'ipprotocol', 'len':
                v = datum[i].value
                print(i, datum[i].type, type(v), v)
            self.buf.append(datum)
            if len(self.buf) == 5:
                self.smacq.enqueue(datum)  # Output object downstream
                self.buf = []

Query Model: Dataflow Graphs

Queries are dataflow graphs

[Diagram: pcaplive (Input), == (Stateless filtering), uniq (Stateful filtering), print (Output); the filtering stages are labeled AND]

Modules declare algebraic properties:
  stateless (map), annotation, vector, demux, (associative)
  Enables optimization, rewriting, parallelization, map/reduce

Static optimizer applies all data-flow optimizations permitted by algebraic properties of the involved modules

Optimizing Continuous Queries

Traditional database query optimization:
  Uses data indexes
  Minimizes individual query times

Continuous-query optimization:
  Executing many queries simultaneously
  Minimize resource consumption per unit of data input
  Maximize data throughput

Why is multiple query processing important?
  Approximately 8 new rules each week

Optimization of 150 Snort Rules

Example Queries: 6 Tests in 3 Rules

[Diagram: Packet Capture feeds three rules that all end at Reporter. Tests shown: sport=80?, ip=x?; sport=80?; sport=80?, ip=y?, contains FOO?]

Snort Approach
[Roesch, LISA '99]

Example: 6-7 Tests

[Diagram: Packet Capture -> Unique 5-Tuples -> Per-Tuple Tests -> Reporter]
  Unique 5-Tuples: (srcip=x?, sport=80?), (srcip=y?, sport=80?), (srcip=*?, sport=80?)
  Per-Tuple Tests: contains BOO?

Counting Approach
[Carzaniga & Wolf, SIGCOMM '03]

Example: 7 Tests

[Diagram: Packet Capture -> Unique Sub-expressions -> Rules/Queries -> Reporter]
  Unique Sub-expressions: ip=x?, sport=80?, ip=y?, contains BOO?
  Rules/Queries:
    (x, 80): total=2?
    sport=80: total=1?
    (y, 80, BOO): total=3?
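
A minimal Python sketch of the counting idea for the three example rules. The packet dictionaries, the `TESTS` and `RULES` tables, and `count_match` are hypothetical names chosen for illustration, not SMACQ code or the published algorithm. Each unique sub-expression is evaluated once per packet, and a rule fires when the number of its satisfied sub-expressions equals its total.

```python
# Unique sub-expressions, each evaluated exactly once per packet.
TESTS = {
    "ip=x": lambda p: p["ip"] == "x",
    "ip=y": lambda p: p["ip"] == "y",
    "sport=80": lambda p: p["sport"] == 80,
    "contains BOO": lambda p: "BOO" in p["payload"],
}

# Each rule lists the sub-expressions it needs; its total is len(needs).
RULES = {
    "(x, 80)": ["ip=x", "sport=80"],
    "sport=80": ["sport=80"],
    "(y, 80, BOO)": ["ip=y", "sport=80", "contains BOO"],
}


def count_match(packet):
    """Return (matching rule names, number of sub-expression tests run)."""
    satisfied = {name for name, test in TESTS.items() if test(packet)}
    matches = [rule for rule, needs in RULES.items()
               if sum(n in satisfied for n in needs) == len(needs)]
    return matches, len(TESTS)


matches, evaluated = count_match({"ip": "y", "sport": 80, "payload": "xxBOOxx"})
```

Every packet costs the same here: 4 sub-expression tests plus 3 total comparisons, which appears to be how the slide arrives at 7 tests. No test is ever skipped, even when `sport=80` fails.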

Data-Flow Approach
Example: 1-4 Tests

[Diagram: Packet Capture -> merged test graph (ip=x?, sport=80?, ip=y?, contains BOO?) -> Reporter]

1. Common roots
2. Common leaves
3. Common upstream graphs
4. Common downstream graphs
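
A rough Python sketch of the data-flow idea for the same three rules, assuming the shared `sport=80` test is hoisted to a common root (the test order is an assumption; the slide's diagram did not survive extraction). `dataflow_match` and the packet dictionaries are hypothetical. A failed test short-circuits everything downstream of it, so the number of tests varies per packet.

```python
def dataflow_match(packet):
    """Return (matching rule names, number of tests run) for the example rules."""
    tests = 1
    if packet["sport"] != 80:        # common root shared by all three rules
        return [], tests             # short-circuit: nothing downstream runs
    matches = ["sport=80"]

    tests += 1
    if packet["ip"] == "x":
        matches.append("(x, 80)")

    tests += 1
    if packet["ip"] == "y":
        tests += 1                   # payload test runs only behind ip=y
        if "BOO" in packet["payload"]:
            matches.append("(y, 80, BOO)")
    return matches, tests


examples = [
    dataflow_match({"ip": "x", "sport": 25, "payload": ""}),
    dataflow_match({"ip": "x", "sport": 80, "payload": ""}),
    dataflow_match({"ip": "y", "sport": 80, "payload": "xxBOOxx"}),
]
```

With this ordering a packet costs between 1 and 4 tests, matching the range in the slide title.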

Performance Comparison

[Figure: performance comparison plot; axis label: Total Constraints]

Vector Functions

Most optimizations in stream analysis have employed a class of algorithms that can be characterized as vector functions:
  $f(x, \vec{v}) = f(x, v_1), f(x, v_2), \ldots$
  Vector version is typically O(1) or O(log n) instead of O(n)

Examples
  Set of equality tests becomes a single lookup in a hash-table
  Set of string matches becomes a single DFA to traverse

[Diagram: the separate tests dstport==80 (leading to X) and dstport==25 (leading to Y) become one "Lookup dstport" node with entries 80 and 25]
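
The hash-table example can be sketched in Python as follows. The function names and the `(port, action)` pairs are hypothetical; `X` and `Y` stand for the downstream branches in the slide's diagram. The scalar version runs one comparison per test, while the vector version builds a table once and then answers all tests with a single probe.

```python
def scalar_lookup(dstport, tests):
    """O(n): evaluate each equality test in turn."""
    return [action for port, action in tests if dstport == port]


def build_vector_lookup(tests):
    """Fold n equality tests into one hash table, built once."""
    table = {}
    for port, action in tests:
        table.setdefault(port, []).append(action)
    return table


TESTS = [(80, "X"), (25, "Y")]
TABLE = build_vector_lookup(TESTS)


def vector_lookup(dstport):
    """O(1) expected: one hash probe replaces all n comparisons."""
    return TABLE.get(dstport, [])
```

Both functions return the same branches for any port; only the cost per packet differs as the number of tests grows.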

Performance Comparison with Vector Functions

> 80% of tests short-circuited

Analysis: Why was Counting better only without vectors?

Assume that each test results in p more tests
  $p = \text{fanout} \times \text{short-circuiting}$
  $p \le \text{fanout}$
  $0 \le \text{short-circuiting} \le 1$

Assume data-flow of tests is a balanced tree of depth d
  d is an integer, $d \ge 1$

Expected number of evaluations:
  $1 + p + p^2 + p^3 + \cdots + p^{d-1} = \frac{1 - p^d}{1 - p}$

Let u = number of unique tests = Counting's performance
  Data-flow evaluates fewer tests than Counting when $\frac{1 - p^d}{1 - p} < u$, if ($d > 1$, $p < 1$)

For IDS test: d = 6
  With Vectors (u = 39): p < 1.7 is desired. Actual p = 1
  Without Vectors (u = 1782): p < 4.2 is desired. Actual p = 5.8
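
The break-even values can be checked numerically. This is a small sketch under the slide's balanced-tree assumption; `expected_evaluations` and `break_even_p` are hypothetical helper names, and the bisection bounds are arbitrary. The geometric sum is added term by term so that p = 1 is handled without dividing by zero.

```python
def expected_evaluations(p, d):
    """1 + p + p^2 + ... + p^(d-1): tests run in a balanced tree of depth d."""
    return sum(p ** k for k in range(d))


def break_even_p(u, d, lo=0.0, hi=100.0):
    """Approximate (by bisection) the p at which data-flow runs u tests."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if expected_evaluations(mid, d) < u:
            lo = mid
        else:
            hi = mid
    return lo


d = 6
with_vectors = break_even_p(39, d)       # a little above 1.7
without_vectors = break_even_p(1782, d)  # a little above 4.2
```

At the measured p = 1 with vectors, the data-flow graph runs `expected_evaluations(1, 6)` = 6 tests against Counting's 39. At p = 5.8 without vectors it runs far more than 1782, which is consistent with Counting winning in that case.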

Supported Query Languages

SQL style:

  print srcip, dstip from (cflow where dstport==80 and uniq(srcip, dstip))

  Misplaced belief that since SQL is well defined, people can just use it
  Deeply nested queries make you wish you were merely nested in s-expressions

Unix pipe style:

  cflow | where dstport==80 | uniq srcip dstip | print srcip, dstip

[Diagram: pcaplive (Input), == (Stateless filtering), uniq (Stateful filtering), print (Output); the filtering stages are labeled AND]
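
To make the pipe-style query concrete, here is a rough Python analogue built from generators. It is not SMACQ syntax or API: `where`, `uniq`, and `project` are hypothetical stand-ins for the modules of the same role, and the flow records are made-up sample data. Each stage consumes a stream and produces a stream, so stages compose like the pipeline above.

```python
def where(stream, predicate):
    """Stateless filtering: pass records that satisfy the predicate."""
    return (rec for rec in stream if predicate(rec))


def uniq(stream, *fields):
    """Stateful filtering: pass the first record seen for each field tuple."""
    seen = set()
    for rec in stream:
        key = tuple(rec[f] for f in fields)
        if key not in seen:
            seen.add(key)
            yield rec


def project(stream, *fields):
    """Select the fields that the final print stage would output."""
    return (tuple(rec[f] for f in fields) for rec in stream)


flows = [
    {"srcip": "10.0.0.1", "dstip": "10.0.0.9", "dstport": 80},
    {"srcip": "10.0.0.1", "dstip": "10.0.0.9", "dstport": 80},
    {"srcip": "10.0.0.2", "dstip": "10.0.0.9", "dstport": 25},
    {"srcip": "10.0.0.3", "dstip": "10.0.0.9", "dstport": 80},
]

rows = list(project(uniq(where(flows, lambda r: r["dstport"] == 80),
                         "srcip", "dstip"),
                    "srcip", "dstip"))
for row in rows:
    print(*row)
```

The stateless `where` stage keeps no memory between records, while `uniq` must remember every key it has seen, which is the distinction the diagram draws between stateless and stateful filtering.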

Supported Query Languages

Clean, allows named subexpressions

Join Models

DFA module
  Define a state machine where transitions are specified as Booleans on new inputs

SQL style
  Example: print running cross-product

    print a.ipid b.ipid from pcapfile([email protected]) a, b where a.ipid != b.ipid

  New keyword UNTIL defines when state can be removed
  NEW refers to newly input data for comparison

Example: print retransmissions within the same second

  print expr(b.ts - a.ts)
  from pcaplive() a until(new.a.ts.sec > a.ts.sec), b until(new)
  where b.ts > a.ts and a.srcip == b.srcip and a.srcport == b.srcport
    and a.seq == b.seq and a.payload != "" and b.payload != ""
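
One way to read the UNTIL semantics of this query, sketched in Python. This is an interpretation, not SMACQ's implementation: `retransmissions` and the packet dictionaries are hypothetical, timestamps are plain floats in seconds, and the payload comparisons are assumed to test for non-empty payloads. Retained `a` packets are dropped as soon as a packet from a later whole second arrives; each `b` is only the newest packet.

```python
def retransmissions(packets):
    """Yield b.ts - a.ts for packets repeating an earlier packet's
    (srcip, srcport, seq) within the same whole second."""
    retained = []  # the "a" side of the join
    for new in packets:
        # UNTIL: discard retained packets once a later second has begun.
        retained = [a for a in retained if int(a["ts"]) >= int(new["ts"])]
        if not new["payload"]:
            continue  # assumed: empty payloads never take part in the join
        for a in retained:
            if (new["ts"] > a["ts"]
                    and a["srcip"] == new["srcip"]
                    and a["srcport"] == new["srcport"]
                    and a["seq"] == new["seq"]):
                yield new["ts"] - a["ts"]
        retained.append(new)


def pkt(ts, seq, payload="data"):
    """Build a sample packet; addresses and ports are made-up values."""
    return {"ts": ts, "srcip": "10.0.0.1", "srcport": 1234,
            "seq": seq, "payload": payload}


gaps = list(retransmissions([pkt(10.0, 1), pkt(10.5, 1), pkt(11.25, 1)]))
```

The third packet repeats the same sequence number but arrives in a later second, so the earlier state has already been discarded and only one gap is reported. Bounding state this way is what keeps a streaming join from growing without limit.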

Usage Experience

Online detection & automated response systems
Ad-hoc queries for forensic analysis and data exploration
Feature extraction for other software

Conclusions

Continuous Queries provide a common query syntax, software infrastructure, and optimization framework for traffic analysis

CQ necessary for streaming applications, sufficient for ad-hoc forensic analysis

Two identified strategies for static optimization of multiple queries
  Remove (Counting) or Reduce (Data-flow) redundant tests
  Boolean (Data-flow) short-circuiting removes need for some subsequent tests

Performance Analysis:
  Counting is preferable when short-circuiting is rare
  Data-flow out-performs counting when short-circuiting is significant
  When breadth of graph is reduced with vector functions, actual IDS workload benefits significantly from short-circuiting

Data-flow approach can also benefit from additional, dynamic reordering of tests to maximize early short-circuiting

Open source at smacq.sf.net!
