Quine’s Real-time Temporal Event Sequencing Produces New Insights
One of the fundamental advantages of Quine’s architecture compared to other complex event stream processing technologies, like Flink and ksqlDB, is that it is not constrained by time windows. We demonstrated the value of this capability in the “Are You Ready for Low and Slow Auth Attacks?” blog, which shows how you can use Quine to identify password spraying attacks that take place over extended periods and evade legacy detection mechanisms constrained by time windowing.
But what about cases where the sequence of events is critical to detecting and investigating interesting incidents? For example, when performing root cause analysis (RCA) for performance issues in a NOC or security incidents in a SOC, the temporal ordering of events is often as important as the events themselves.
Event Sequencing in Real Time: A Streaming Graph Strength
Event sequencing can provide key information for accurate and timely detection and analysis, even in the most complex cases where causality and temporal ordering are difficult to ascertain. The key is architecting a graph structure that can most effectively answer your questions and produce insights.
In the case of a streaming graph solution like Quine, this means modeling the graph so queries can effectively traverse nodes and edges natively, which is always more efficient than path matching based on node properties like timestamps. This is because the relations between nodes (edges) are persisted in the nodes themselves.
We like using an event sequencing technique to explicitly identify order based on a pattern match detected by one of Quine’s most powerful features, the standing query. (Standing queries monitor streams for specified patterns, maintain partial matches, and execute user-specified actions the instant a full match is made.)
We demonstrate this technique in the APT (Advanced Persistent Threat) Detection recipe (https://quine.io/recipes/apt-detection) to create sequence edges as Quine ingests EDR (Endpoint Detection and Response) and network traffic logs while monitoring for an Indicator of Behavior (IoB) that matches malicious data exfiltration patterns.
Our approach to this technique has four key components.
- Model a behavioral pattern as a subgraph
- Develop Cypher to match the subgraph in the event stream
- Encode the event sequence into the graph
- Emit an alert containing a linkURL to the subgraph inside the Quine Exploration UI
Our concern is not the timeframe of events (how quickly they happen). Rather, our focus is locating a specific sequence of events in order – WRITE->READ->SEND->DELETE – regardless of the time interval across which the events occurred.
A subgraph like the one below can model the data exfiltration event from the APT Detection recipe.
Based on the model, we develop Cypher to match the subgraph in the pattern match section of a Quine standing query:
MATCH (e1)-[:EVENT]->(f)<-[:EVENT]-(e2),
      (f)<-[:EVENT]-(e3)<-[:EVENT]-(p2)-[:EVENT]->(e4)
WHERE e1.type = "WRITE"
  AND e2.type = "READ"
  AND e3.type = "DELETE"
  AND e4.type = "SEND"
RETURN DISTINCT id(f) as fileId
Next, augment the subgraph in the standing query output to overlay sequencing with the CREATE clause, adding NEXT edges between the key nodes:
MATCH (p1)-[:EVENT]->(e1)-[:EVENT]->(f)<-[:EVENT]-(e2)<-[:EVENT]-(p2),
      (f)<-[:EVENT]-(e3)<-[:EVENT]-(p2)-[:EVENT]->(e4)-[:EVENT]->(ip)
WHERE id(f) = $that.data.fileId
  AND e1.type = "WRITE"
  AND e2.type = "READ"
  AND e3.type = "DELETE"
  AND e4.type = "SEND"
  AND e1.time < e2.time
  AND e2.time < e3.time
  AND e2.time < e4.time
CREATE (e1)-[:NEXT]->(e2)-[:NEXT]->(e4)-[:NEXT]->(e3)
The transformed subgraph in Quine becomes this.
There are three important things to note here:
- The synthetic NEXT edges only exist after the standing query match creates them.
- The NEXT edge labels enable us to efficiently traverse the WRITE->READ->SEND->DELETE path with a simple Cypher query.
- Temporal sequencing is even more difficult when dealing with multiple input sources. Imagine matching the WRITE->READ->SEND->DELETE pattern where the write, read and delete events come from one source and the send from another. Quine makes it easy to combine multiple event sources.
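To make the sequencing step concrete, here is a minimal Python sketch of what the output query's WHERE and CREATE clauses do: check the event types and the three time comparisons, then emit NEXT edges in WRITE->READ->SEND->DELETE order. This is an in-memory model for illustration, not Quine code. The `Event` class, the `sequence_edges` function, and the sample IDs and timestamps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    id: str
    type: str  # "WRITE", "READ", "SEND" or "DELETE"
    time: int  # any comparable timestamp; the unit does not matter


def sequence_edges(e1, e2, e3, e4):
    """Return NEXT edges as (from_id, to_id) pairs, or [] if the checks fail.

    Argument order follows the query: e1=WRITE, e2=READ, e3=DELETE, e4=SEND.
    """
    types_ok = (e1.type, e2.type, e3.type, e4.type) == (
        "WRITE", "READ", "DELETE", "SEND")
    # The same three comparisons as the query's WHERE clause.
    order_ok = e1.time < e2.time and e2.time < e3.time and e2.time < e4.time
    if not (types_ok and order_ok):
        return []
    # CREATE (e1)-[:NEXT]->(e2)-[:NEXT]->(e4)-[:NEXT]->(e3)
    return [(e1.id, e2.id), (e2.id, e4.id), (e4.id, e3.id)]


# Hypothetical events with large, uneven gaps between them.
write = Event("w", "WRITE", 1_000)
read = Event("r", "READ", 90_000)
send = Event("s", "SEND", 400_000)
delete = Event("d", "DELETE", 900_000)

edges = sequence_edges(write, read, delete, send)
print(edges)  # [('w', 'r'), ('r', 's'), ('s', 'd')]
```

Only the relative order of the timestamps matters here; the size of the gaps plays no role, which is the point of sequencing without time windows. The sketch mirrors the query as written, so like the query it compares the READ time with both the DELETE and SEND times but does not compare SEND with DELETE directly.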
Once Quine identifies the event, we can explore the graph further with queries like the following:
MATCH (n)-[:NEXT*]->(m)
WHERE strId(n)="20b2059e-19c5-3ab6-b465-fe3593c45bc8"
RETURN DISTINCT collect(m),n
The final output: the WRITE --> READ --> SEND --> DELETE subgraph.
As an alternative, use the /api/v1/query/cypher/nodes API endpoint to build a dictionary of malicious file names.
[
  {
    "id": "f00ae947-3dd5-3c92-a84f-118b401c80f1",
    "hostIndex": 0,
    "label": "ID: f00ae947-3dd5-3c92-a84f-118b401c80f1",
    "properties": {
      "data": "/tmp/miscellaneous.data"
    }
  }
]
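As a rough sketch of how such a dictionary might be built from Python, the snippet below prepares a POST request for the endpoint and turns a response shaped like the one above into an ID-to-file-name mapping. Several things here are assumptions rather than facts from the recipe: that Quine is reachable at localhost:8080, that the endpoint accepts the Cypher text as a plain-text body, and the query text itself, which is a hypothetical example. The helper names `build_request` and `file_names` are likewise made up for illustration.

```python
import json
import urllib.request

# Assumption: Quine is listening on a local address and port.
QUINE_URL = "http://localhost:8080/api/v1/query/cypher/nodes"

# Hypothetical query: files written by an event that starts a NEXT chain.
QUERY = ('MATCH ()<-[:NEXT]-(e1)-[:EVENT]->(f) '
         'WHERE e1.type = "WRITE" RETURN DISTINCT f')


def build_request(cypher):
    """Build (but do not send) a POST carrying the Cypher text as the body."""
    return urllib.request.Request(
        QUINE_URL,
        data=cypher.encode("utf-8"),
        headers={"Content-Type": "text/plain"},  # assumed content type
        method="POST",
    )


def file_names(nodes):
    """Map node ID -> file name, using the `data` property seen in the response."""
    return {
        n["id"]: n["properties"]["data"]
        for n in nodes
        if "data" in n.get("properties", {})
    }


req = build_request(QUERY)
# Against a running instance you would send it like this:
#   with urllib.request.urlopen(req) as resp:
#       nodes = json.load(resp)
# Here we parse a response shaped like the example above instead.
nodes = json.loads("""
[{"id": "f00ae947-3dd5-3c92-a84f-118b401c80f1",
  "hostIndex": 0,
  "label": "ID: f00ae947-3dd5-3c92-a84f-118b401c80f1",
  "properties": {"data": "/tmp/miscellaneous.data"}}]
""")
malicious = file_names(nodes)
print(malicious)
```

Keeping request construction separate from response parsing means the parsing logic can be exercised without a running Quine instance.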
You can even use quick queries to follow the NEXT edges in the exploration UI to find actions that occurred earlier in the event timeline.
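Following NEXT edges forward or backward amounts to walking a chain. The Python sketch below models that walk over a plain list of edges. It assumes each node has at most one incoming and one outgoing NEXT edge, which holds for the four-event chain in this post but is not guaranteed in general. The `follow` function and the node names are hypothetical.

```python
def follow(edges, start, reverse=False):
    """Return the node IDs reached from `start` along NEXT edges, in chain order.

    `edges` is a list of (from_id, to_id) pairs. With reverse=True the edges
    are walked backward, toward earlier events in the timeline.
    """
    step = {}
    for src, dst in edges:
        if reverse:
            step[dst] = src
        else:
            step[src] = dst

    path = []
    seen = {start}
    node = start
    # Stop at the end of the chain, or if an edge would revisit a node.
    while node in step and step[node] not in seen:
        node = step[node]
        seen.add(node)
        path.append(node)
    return path


# The chain created by the standing query output, with hypothetical node names.
next_edges = [("write", "read"), ("read", "send"), ("send", "delete")]

later = follow(next_edges, "write")
earlier = follow(next_edges, "delete", reverse=True)
print(later)    # ['read', 'send', 'delete']
print(earlier)  # ['send', 'read', 'write']
```

The forward walk corresponds to the variable-length `[:NEXT*]` match shown earlier; the backward walk corresponds to asking what happened before a given event.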
Event Sequencing Benefits From Planning
Processing event streams using a streaming graph like Quine requires adjusting how you think about your data. For example, when the recipe we used in this post was first developed, it was focused on evaluating a single concern: finding a specific subgraph. This required a simple plan for creating node IDs.
In the original use case, the choice was to create nodes using idFrom() in its most basic form, id(event) = idFrom($that), which was completely reasonable at the time. Now, asking a more complex question, “Show me any process that interacts with a file named /tmp/miscellaneous.data,” is more difficult because the node ID namespace plan did not include using individual node parameters. This is something to keep in mind when you plan your streaming event graph!
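The trade-off can be illustrated with a small Python sketch. It uses `uuid.uuid5` as a stand-in for a deterministic ID function; this is not Quine's actual idFrom() algorithm, only something with the same "same inputs, same ID" property. The `id_from` helper and the event field names are hypothetical.

```python
import json
import uuid


def id_from(*parts):
    """Stand-in for a deterministic ID function: same arguments, same ID.

    Not Quine's algorithm; it hashes a canonical JSON form of the arguments.
    """
    canonical = json.dumps(list(parts), sort_keys=True)
    return uuid.uuid5(uuid.NAMESPACE_URL, canonical)


# Hypothetical event record.
event = {
    "type": "WRITE",
    "pid": 1234,
    "object": "/tmp/miscellaneous.data",
    "time": 1000,
}

# Keyed on the whole record, in the spirit of idFrom($that).
whole_record_id = id_from(event)

# Keyed on one chosen field, so the ID can be recomputed from that field alone.
file_id = id_from("file", event["object"])

# Knowing only the file name is enough to recompute the file's ID.
lookup_id = id_from("file", "/tmp/miscellaneous.data")
print(lookup_id == file_id)  # True

# The whole-record ID changes if any field differs, so it cannot be
# recomputed from the file name alone.
other = dict(event, time=1001)
print(id_from(other) == whole_record_id)  # False
```

An ID built from the fields you expect to query by lets you jump straight to a node, whereas an ID built from the entire record can only be reproduced by someone who already has the entire record.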
Temporal data doesn’t always need to be tied to timestamps. Instead, you can use temporal categories – morning/afternoon/night, before/after, etc. Many use cases, like our data exfiltration scenario, are built from understanding the sequence of events as a subgraph. What temporal use cases do you have that could benefit from detection using graph analysis, and how long does it take to detect those patterns today?
Try for Yourself
If you want to try Quine using your own data, here are some resources to help:
- Download Quine: JAR | Docker Image | GitHub
- Start learning about Quine now by visiting the Quine open source project.
- Check out the Ingest Data into Quine blog series, covering everything from ingesting from Kafka to ingesting CSV data.
- APT Detection recipe – this recipe, referenced above, demonstrates the ability of streaming graphs to process event data without time windows.