Story Mode
Story Mode combines deterministic detectors with an LLM narrative generator to produce a rich, structured analysis of the network activity captured in a PCAP file.
Requirements
A configured LLM server is required for the narrative and Q&A features (see LLM Setup). The deterministic findings panels work without an LLM.
How Story Generation Works
Story generation is a two-phase LLM pipeline preceded by fully deterministic pre-computation. A new generation always replaces any previously stored story for the same file.
Phase 0 — Deterministic pre-computation (no LLM)
Before any LLM call is made, the backend runs two independent computations over the full conversation dataset:
Detector pipeline — eight detectors run in sequence and produce a typed, severity-sorted list of findings (see Deterministic Findings below).
Aggregates — statistics covering the full dataset are pre-computed: coverage counts, top external ASNs, protocol risk matrix, TLS anomaly summary, beacon candidates, and unknown-app percentage (see Aggregates Panel below).
Additionally, up to 50 timeline bins (time-bucketed packet and byte counts) are fetched from the timeline service to provide temporal context.
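The timeline service itself is not described here. To illustrate what "time-bucketed packet and byte counts" means, the sketch below assumes equal-width bins that span the capture from its first to its last packet. Packet, TimelineBin and build_timeline are hypothetical names, not part of the product's API.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    ts: float      # capture timestamp, seconds since epoch
    length: int    # packet length in bytes


@dataclass
class TimelineBin:
    start: float
    end: float
    packet_count: int = 0
    byte_count: int = 0


def build_timeline(packets: list[Packet], max_bins: int = 50) -> list[TimelineBin]:
    """Bucket packets into at most max_bins equal-width time windows (assumed scheme)."""
    if not packets:
        return []
    t0 = min(p.ts for p in packets)
    t1 = max(p.ts for p in packets)
    span = t1 - t0
    # A zero-length capture (a single timestamp) collapses into one bin.
    n = max_bins if span > 0 else 1
    width = span / n if span > 0 else 1.0
    bins = [TimelineBin(t0 + i * width, t0 + (i + 1) * width) for i in range(n)]
    for p in packets:
        # Clamp so the packet at t1 lands in the last bin instead of one past it.
        idx = min(int((p.ts - t0) / width), n - 1)
        bins[idx].packet_count += 1
        bins[idx].byte_count += p.length
    return bins
```

Equal-width bins keep rows comparable with each other, which matters when the LLM is asked to spot bursts in the timeline.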
Phase 1 — Hypothesis and query generation
The LLM receives a structured prompt containing:
File and capture metadata (filename, size, packet count, bytes, duration, start/end times, total conversation count)
Protocol breakdown (per-protocol packet count, bytes, and percentage)
Traffic category breakdown (nDPI category distribution)
Deterministic findings (up to 20 by default, ordered by severity)
Full-dataset aggregates (unknown-app %, top external ASNs, protocol risk matrix, TLS anomaly summary, beacon candidates)
Traffic timeline (up to 50 time-window rows)
Optional analyst-supplied additional context
The LLM’s sole job in this phase is to produce up to 5 testable hypotheses
paired with structured database queries to investigate the most suspicious
activity. Each query specifies filters such as srcIp, dstIp,
dstPort, protocol, appName, category, hasRisks,
hasTlsAnomaly, riskType, minBytes, maxBytes, and minFlows.
Catch-all queries (no filters set) are automatically discarded. Note that
minBytes and maxBytes are per-conversation byte counts and are silently
ignored by the backend when srcIp or riskType is also present in the
same query.
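The query rules above can be sketched as a small normalization step. InvestigationQuery and normalize_queries are hypothetical; the field names mirror the filters listed above, while representing an unset filter as None and applying the 5-query cap after catch-all queries are discarded are assumptions.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional

MAX_QUERIES = 5  # at most 5 queries per generation


@dataclass(frozen=True)
class InvestigationQuery:
    # Hypothetical shape: every filter is optional; None means "not set".
    srcIp: Optional[str] = None
    dstIp: Optional[str] = None
    dstPort: Optional[int] = None
    protocol: Optional[str] = None
    appName: Optional[str] = None
    category: Optional[str] = None
    hasRisks: Optional[bool] = None
    hasTlsAnomaly: Optional[bool] = None
    riskType: Optional[str] = None
    minBytes: Optional[int] = None   # per-conversation byte count
    maxBytes: Optional[int] = None   # per-conversation byte count
    minFlows: Optional[int] = None

    def is_catch_all(self) -> bool:
        return all(getattr(self, f.name) is None for f in fields(self))


def normalize_queries(queries: list[InvestigationQuery]) -> list[InvestigationQuery]:
    kept = []
    for q in queries:
        if q.is_catch_all():
            continue  # catch-all queries are discarded
        if q.srcIp is not None or q.riskType is not None:
            # Byte-range filters are ignored when srcIp or riskType is present.
            q = replace(q, minBytes=None, maxBytes=None)
        kept.append(q)
    return kept[:MAX_QUERIES]
```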
Phase 2 — Narrative generation
Each query from Phase 1 is executed against the database. Up to 10 conversations (sorted by total bytes descending) are returned per query as evidence. The LLM then receives everything from Phase 1 plus the investigation results and writes the final narrative.
If Phase 1 fails for any reason (e.g. LLM error, context-length exceeded), the pipeline falls back to generating the narrative directly from the deterministic findings without investigation steps.
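A minimal sketch of the Phase 2 evidence cut-off and the Phase 1 fallback follows, assuming phase1 and narrate are callables that wrap the two LLM calls. Conversation, select_evidence and generate_story are likewise hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional

EVIDENCE_LIMIT = 10  # conversations returned per query


@dataclass
class Conversation:
    src_ip: str
    dst_ip: str
    total_bytes: int


def select_evidence(matches: list[Conversation]) -> list[Conversation]:
    """Keep the EVIDENCE_LIMIT largest conversations, by total bytes descending."""
    return sorted(matches, key=lambda c: c.total_bytes, reverse=True)[:EVIDENCE_LIMIT]


def generate_story(
    findings: list[str],
    phase1: Callable[[list[str]], list],
    narrate: Callable[[list[str], Optional[list]], str],
) -> str:
    """Run Phase 1, then narrate; fall back to findings-only narration if Phase 1 fails."""
    try:
        investigation: Optional[list] = phase1(findings)
    except Exception:
        # Any Phase 1 failure (LLM error, context length exceeded, ...) drops the
        # investigation steps; the narrative is built from the findings alone.
        investigation = None
    return narrate(findings, investigation)
```

Only Phase 1 sits inside the try block, so an error during narration itself is not silently turned into a fallback.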
Context-length retry
If the LLM rejects the prompt due to context length, the UI presents the auto-built prompt for the analyst to trim before resubmitting. On retry, the edited prompt is sent directly to the LLM (Phase 1 is re-run to preserve investigation steps, but the narrative prompt itself is not rebuilt).
Known analysis limitations (embedded in every prompt)
The LLM is explicitly told the following constraints at generation time:
Packet payloads and HTTP bodies are not available.
DNS query names and TLS SNI are not captured.
Benign (non-risk) conversations are not individually listed.
These limitations also bound what the LLM can reliably state in its output.
What Story Mode Produces
Story Mode returns a response containing several components:
Deterministic Findings
Before the LLM is invoked, a pipeline of detector algorithms runs over
the conversation data and produces typed findings. Each finding has a
severity (CRITICAL, HIGH, MEDIUM, LOW), a title, a summary,
affected IPs, and numeric metrics. Detectors include:
| Detector | What it detects |
|---|---|
| NdpiRisk | Surfaces nDPI risk flags as findings, one finding per distinct risk type. Severity is determined by the risk type name: |
| Beacon | Identifies periodic/beaconing traffic by computing the coefficient of variation (CV) of inter-flow intervals. Flows with ≥3 repetitions, mean interval ≥1 second, and CV <0.3 are flagged. CV <0.1 → |
| TlsAnomaly | Detects self-signed certificates (issuer DN == subject DN), expired certificates (not-after < now), and certificates from unknown/untrusted CAs. Severity: |
| Volume | Two independent checks per source IP: If both conditions fire for the same source IP, only the higher-severity finding is kept. |
| FanOut | Flags hosts that contacted many distinct destination IPs. The minimum threshold to trigger a finding is >5 distinct destinations. >50 distinct destinations → |
| LongSession | Flags individual conversations lasting longer than 15 minutes. >1 hour → |
| UnknownApp | Flags captures where ≥5% of conversations could not be identified by nDPI. >30% → |
| PortProtocolMismatch | Flags nDPI-identified applications running on non-standard ports. Always |
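For the detectors whose trigger conditions are fully stated in the table, the checks can be sketched as below. Function names and input shapes are hypothetical. Severity assignment is left out because the table does not give every tier, the unknown-CA check is omitted because it needs a trust store, and comparing DNs as plain strings is a simplification of real DN matching.

```python
from datetime import datetime

FANOUT_MIN_DESTINATIONS = 5        # a finding needs MORE than 5 distinct destinations
LONG_SESSION_SECONDS = 15 * 60     # a finding needs MORE than 15 minutes
UNKNOWN_APP_MIN_PERCENT = 5        # a finding needs AT LEAST 5% unidentified conversations


def fanout_hosts(conversations: list[tuple[str, str]]) -> dict[str, int]:
    """Map each source IP with more than 5 distinct destination IPs to its destination count."""
    destinations: dict[str, set[str]] = {}
    for src, dst in conversations:
        destinations.setdefault(src, set()).add(dst)
    return {src: len(d) for src, d in destinations.items() if len(d) > FANOUT_MIN_DESTINATIONS}


def is_long_session(duration_seconds: float) -> bool:
    return duration_seconds > LONG_SESSION_SECONDS


def unknown_app_triggered(unknown: int, total: int) -> bool:
    # Integer comparison avoids floating-point edge cases at exactly 5%.
    return total > 0 and unknown * 100 >= UNKNOWN_APP_MIN_PERCENT * total


def tls_issues(issuer_dn: str, subject_dn: str, not_after: datetime, now: datetime) -> list[str]:
    """Self-signed and expired checks only (the unknown-CA check is not sketched)."""
    issues = []
    if issuer_dn == subject_dn:
        issues.append("self-signed")
    if not_after < now:
        issues.append("expired")
    return issues
```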
Findings are sorted by severity (CRITICAL first) then by detector type
for stable ordering. Up to 20 findings (by default) are included in the LLM
prompt; all findings are returned to the UI regardless.
LLM Narrative
The LLM writes a multi-section narrative from the pre-computed findings and investigation evidence. The output structure is enforced by the system prompt:
Narrative sections: a summary section first, then detail sections per major finding cluster, ending with a conclusion containing recommendations.
Highlights: every CRITICAL finding must appear as an anomaly highlight; HIGH → warning; MEDIUM/LOW → insight.
Timeline events: CRITICAL findings → critical event type; HIGH → suspicious; MEDIUM/LOW → normal.
Suggested questions: exactly 3, specific to the actual findings.
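Because these rules are enforced only through the system prompt, a client could check the LLM's output against them. The sketch below is a hypothetical validator, not part of Story Mode; the helper names and matching highlights to findings by title are assumptions.

```python
# Severity -> expected highlight type and timeline event type, per the rules above.
HIGHLIGHT_TYPE = {"CRITICAL": "anomaly", "HIGH": "warning", "MEDIUM": "insight", "LOW": "insight"}
TIMELINE_EVENT_TYPE = {"CRITICAL": "critical", "HIGH": "suspicious", "MEDIUM": "normal", "LOW": "normal"}


def expected_types(severity: str) -> tuple[str, str]:
    return HIGHLIGHT_TYPE[severity], TIMELINE_EVENT_TYPE[severity]


def missing_critical_highlights(
    findings: list[tuple[str, str]], highlights: list[tuple[str, str]]
) -> list[str]:
    """Titles of CRITICAL findings that have no matching anomaly highlight."""
    anomaly_titles = {title for title, kind in highlights if kind == "anomaly"}
    return [title for title, sev in findings if sev == "CRITICAL" and title not in anomaly_titles]


def has_valid_question_count(questions: list[str]) -> bool:
    return len(questions) == 3
```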
Story Timeline
A visual timeline is derived from the narrative output, showing key events in chronological order.
Aggregates Panel
Pre-computed statistics over the full conversation dataset are displayed in the Traffic Intelligence panel. These are computed independently of the LLM and reflect all conversations in the PCAP, not just the evidence sample sent to the LLM.
The panel contains:
Coverage banner — total flows, total packets, percentage of flows flagged as at-risk (coloured green/yellow/red at 0%/10% thresholds), unknown-app percentage, and TLS anomaly count.
Top External Destinations — up to 7 external ASNs/orgs ranked by outbound bytes, with flow count, data volume, and percentage of total bytes.
Protocol Risk Overview — per-protocol breakdown of total conversation count vs. at-risk count. A conversation is at risk if its flow_risks array (populated by nDPI) is non-empty. The progress bar turns red when >30% of that protocol's flows are at risk, and yellow otherwise.
TLS Certificate Health — aggregate counts of self-signed, expired, and unknown-CA TLS flows.
Beacon Candidates — up to 5 flows exhibiting periodic behaviour (CV <0.3, ≥3 flows, mean interval ≥1 s), sorted by CV ascending.
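The at-risk rule and bar colouring for the Protocol Risk Overview can be sketched as follows, assuming each conversation is available as a (protocol, flow_risks) pair. protocol_risk_overview is a hypothetical name, and the risk strings in the example are placeholders rather than real nDPI risk names.

```python
from collections import defaultdict
from typing import Iterable


def protocol_risk_overview(
    conversations: Iterable[tuple[str, list[str]]],
) -> dict[str, tuple[int, int, str]]:
    """Return {protocol: (total, at_risk, bar_colour)} from (protocol, flow_risks) pairs."""
    totals: dict[str, int] = defaultdict(int)
    at_risk: dict[str, int] = defaultdict(int)
    for protocol, flow_risks in conversations:
        totals[protocol] += 1
        if flow_risks:  # at risk = non-empty flow_risks array
            at_risk[protocol] += 1
    overview = {}
    for protocol, total in totals.items():
        # Red when MORE than 30% of the protocol's flows are at risk, yellow otherwise.
        colour = "red" if at_risk[protocol] * 100 > 30 * total else "yellow"
        overview[protocol] = (total, at_risk[protocol], colour)
    return overview
```

Exactly 30% stays yellow because the threshold is strictly greater than 30%.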
The ⓘ icon in the panel header reveals important caveats:
Beacon detection uses the coefficient of variation (CV) of inter-arrival times across flows to the same destination. A low CV (< 0.1) suggests highly regular, automated traffic. Short captures may produce false positives; legitimate software (NTP, telemetry agents) can appear beacon-like.
TLS health is based on certificate metadata (issuer DN, subject DN, NotBefore, NotAfter) extracted at analysis time. Certificates are not re-validated at display time: an expired certificate shown here reflects the certificate's own stated expiry date, not a live OCSP/CRL check.
ASN / geo data is enriched via ipinfo.io at analysis time and may not reflect recent IP address reassignments.
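The beacon criteria (≥3 flows, mean interval ≥1 s, CV <0.3, up to 5 candidates sorted by CV ascending) can be sketched as below, where CV is the standard deviation of the inter-flow intervals divided by their mean. Grouping flows by (source IP, destination IP), using flow start times, and using the population standard deviation are assumptions; beacon_candidates is a hypothetical name.

```python
import statistics
from collections import defaultdict


def beacon_candidates(flows, min_flows=3, min_mean_interval=1.0, max_cv=0.3, limit=5):
    """flows: iterable of (src_ip, dst_ip, start_ts). Returns [(src, dst, cv, mean_interval)]."""
    starts = defaultdict(list)
    for src, dst, ts in flows:
        starts[(src, dst)].append(ts)  # assumed grouping key: (src, dst)
    candidates = []
    for (src, dst), times in starts.items():
        if len(times) < min_flows:
            continue
        times.sort()
        intervals = [b - a for a, b in zip(times, times[1:])]
        mean = statistics.fmean(intervals)
        if mean < min_mean_interval:
            continue  # sub-second repetition is not treated as beaconing
        cv = statistics.pstdev(intervals) / mean  # mean >= 1.0 here, so no division by zero
        if cv < max_cv:
            candidates.append((src, dst, cv, mean))
    candidates.sort(key=lambda c: c[2])  # most regular (lowest CV) first
    return candidates[:limit]
```

As the caveats note, a perfectly regular NTP or telemetry client passes this test just as easily as an implant, so a low CV is a prompt for investigation rather than a verdict.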
Interactive LLM Q&A Chat
After the narrative is generated, a chat panel allows follow-up questions about the PCAP. The LLM answers using the full story JSON as context and returns 3 suggested follow-up questions with each answer.
Investigation Panel
The Investigation panel shows the results of Phase 1: the hypotheses the LLM formed, the structured queries it generated, and the conversation evidence retrieved for each. Each query returns up to 10 conversations sorted by total bytes descending. A maximum of 5 queries are executed per generation; queries with no filters set are skipped automatically.
Privacy
Story generation sends conversation metadata (not raw packet payloads) to your
configured LLM_API_BASE_URL. Use a local LLM to keep this data fully
within your infrastructure.