Story Mode
==========

Story Mode combines **deterministic detectors** with an **LLM narrative
generator** to produce a rich, structured analysis of the network activity
captured in a PCAP file.

Requirements
------------

A configured LLM server is required for the narrative and Q&A features (see
:doc:`../configuration/llm-setup`). The deterministic findings panels work
without an LLM.

How Story Generation Works
--------------------------

Story generation is a **two-phase LLM pipeline** preceded by fully
deterministic pre-computation. A new generation always replaces any previously
stored story for the same file.

Phase 0 — Deterministic pre-computation (no LLM)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Before any LLM call is made, the backend runs two independent computations
over the **full conversation dataset**:

1. **Detector pipeline** — eight detectors run in sequence and produce a
   typed, severity-sorted list of findings (see `Deterministic Findings`_
   below).
2. **Aggregates** — statistics covering the full dataset are pre-computed:
   coverage counts, top external ASNs, protocol risk matrix, TLS anomaly
   summary, beacon candidates, and unknown-app percentage (see `Aggregates
   Panel`_ below).

Additionally, up to **50 timeline bins** (time-bucketed packet and byte
counts) are fetched from the timeline service to provide temporal context.

Phase 1 — Hypothesis and query generation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The LLM receives a structured prompt containing:

- File and capture metadata (filename, size, packet count, bytes, duration,
  start/end times, total conversation count)
- Protocol breakdown (per-protocol packet count, bytes, and percentage)
- Traffic category breakdown (nDPI category distribution)
- Deterministic findings (up to 20 by default, ordered by severity)
- Full-dataset aggregates (unknown-app %, top external ASNs, protocol risk
  matrix, TLS anomaly summary, beacon candidates)
- Traffic timeline (up to 50 time-window rows)
- Optional analyst-supplied additional context

The LLM's sole job in this phase is to produce up to **5 testable hypotheses**
paired with **structured database queries** to investigate the most suspicious
activity. Each query specifies filters such as ``srcIp``, ``dstIp``,
``dstPort``, ``protocol``, ``appName``, ``category``, ``hasRisks``,
``hasTlsAnomaly``, ``riskType``, ``minBytes``, ``maxBytes``, and ``minFlows``.
Catch-all queries (no filters set) are automatically discarded. Note that
``minBytes`` and ``maxBytes`` are per-conversation byte counts and are silently
ignored by the backend when ``srcIp`` or ``riskType`` is also present in the
same query.

Phase 2 — Narrative generation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each query from Phase 1 is executed against the database. Up to **10
conversations** (sorted by total bytes descending) are returned per query as
evidence. The LLM then receives everything from Phase 1 **plus** the
investigation results and writes the final narrative.

If Phase 1 fails for any reason (e.g. LLM error, context-length exceeded),
the pipeline falls back to generating the narrative directly from the
deterministic findings without investigation steps.

Context-length retry
~~~~~~~~~~~~~~~~~~~~

If the LLM rejects the prompt due to context length, the UI presents the
auto-built prompt for the analyst to trim before resubmitting. On retry, the
edited prompt is sent directly to the LLM (Phase 1 is re-run to preserve
investigation steps, but the narrative prompt itself is not rebuilt).

Known analysis limitations (embedded in every prompt)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The LLM is explicitly told the following constraints at generation time:

- Packet payloads and HTTP bodies are not available.
- DNS query names and TLS SNI are not captured.
- Benign (non-risk) conversations are not individually listed.

These limitations also bound what the LLM can reliably state in its output.

What Story Mode Produces
------------------------

Story Mode returns a response containing several components:

Deterministic Findings
~~~~~~~~~~~~~~~~~~~~~~

Before the LLM is invoked, a pipeline of **detector algorithms** runs over
the conversation data and produces typed findings. Each finding has a
severity (``CRITICAL``, ``HIGH``, ``MEDIUM``, ``LOW``), a title, a summary,
affected IPs, and numeric metrics. Detectors include:

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Detector
     - What it detects
   * - **NdpiRisk**
     - Surfaces nDPI risk flags as findings, one finding per distinct risk
       type. Severity is determined by the risk type name:

       - ``CRITICAL``: ``possible_exploit_detected``,
         ``binary_application_transfer``, ``clear_text_credentials``,
         ``suspicious_entropy``
       - ``HIGH``: ``suspicious_dns_traffic``, ``dns_suspicious_traffic``,
         ``malicious_sha1_certificate``, ``malformed_packet``
       - ``MEDIUM``: ``self_signed_certificate``, ``obsolete_tls_version``,
         ``weak_tls_cipher``, ``tls_certificate_about_to_expire``
       - ``LOW``: all other risk types
   * - **Beacon**
     - Identifies periodic/beaconing traffic by computing the coefficient of
       variation (CV) of inter-flow intervals. Flows with ≥3 repetitions,
       mean interval ≥1 second, and CV <0.3 are flagged. CV <0.1 →
       ``CRITICAL``; CV 0.1–0.3 → ``HIGH``. Up to 5 beacons are reported,
       sorted by lowest CV (most periodic first).
   * - **TlsAnomaly**
     - Detects self-signed certificates (issuer DN == subject DN), expired
       certificates (not-after < now), and certificates from unknown/untrusted
       CAs. Severity: ``HIGH`` for self-signed/expired; ``MEDIUM`` for unknown
       CA.
   * - **Volume**
     - Two independent checks per source IP:

       1. **Top talker** — flags any host accounting for >40% of total capture
          bytes (``MEDIUM``).
       2. **High outbound volume** — flags any host that sent ≥10 MB across
          its outbound flows (``MEDIUM``; ``HIGH`` if >100 MB).

       If both conditions fire for the same source IP, only the
       higher-severity finding is kept.
   * - **FanOut**
     - Flags hosts that contacted many distinct destination IPs. The minimum
       threshold to trigger a finding is >5 distinct destinations. >50 distinct
       destinations → ``HIGH``; 6–50 → ``MEDIUM``. Pattern is consistent
       with scanning or lateral movement.
   * - **LongSession**
     - Flags individual conversations lasting longer than **15 minutes**.
       >1 hour → ``HIGH``; 15 min–1 hour → ``MEDIUM``.
   * - **UnknownApp**
     - Flags captures where ≥5% of conversations could not be identified by
       nDPI. >30% → ``HIGH``; >10% → ``MEDIUM``; 5–10% → ``LOW``.
   * - **PortProtocolMismatch**
     - Flags nDPI-identified applications running on non-standard ports.
       Always ``HIGH`` severity. Monitored applications and their expected
       ports:

       - DNS: 53
       - HTTP: 80, 8080, 8000, 8888
       - HTTPS: 443, 8443
       - FTP: 20, 21
       - SSH: 22
       - SMTP: 25, 465, 587
       - IMAP: 143, 993
       - RDP: 3389
       - TELNET: 23

Findings are sorted by severity (``CRITICAL`` first) then by detector type
for stable ordering. Up to 20 findings (by default) are included in the LLM
prompt; all findings are returned to the UI regardless.

LLM Narrative
~~~~~~~~~~~~~

The LLM writes a multi-section narrative from the pre-computed findings and
investigation evidence. The output structure is enforced by the system prompt:

- **Narrative sections**: a ``summary`` section first, then ``detail``
  sections per major finding cluster, ending with a ``conclusion`` containing
  recommendations.
- **Highlights**: every ``CRITICAL`` finding must appear as an ``anomaly``
  highlight; ``HIGH`` → ``warning``; ``MEDIUM``/``LOW`` → ``insight``.
- **Timeline events**: ``CRITICAL`` findings → ``critical`` event type;
  ``HIGH`` → ``suspicious``; ``MEDIUM``/``LOW`` → ``normal``.
- **Suggested questions**: exactly 3, specific to the actual findings.

Story Timeline
~~~~~~~~~~~~~~

A visual timeline is derived from the narrative output, showing key events
in chronological order.

Aggregates Panel
~~~~~~~~~~~~~~~~

Pre-computed statistics over the **full conversation dataset** are displayed
in the **Traffic Intelligence** panel. These are computed independently of
the LLM and reflect all conversations in the PCAP, not just the evidence
sample sent to the LLM.

The panel contains:

- **Coverage banner** — total flows, total packets, percentage of flows
  flagged as at-risk (coloured green/yellow/red at 0%/10% thresholds),
  unknown-app percentage, and TLS anomaly count.
- **Top External Destinations** — up to 7 external ASNs/orgs ranked by
  outbound bytes, with flow count, data volume, and percentage of total bytes.
- **Protocol Risk Overview** — per-protocol breakdown of total conversation
  count vs. at-risk count. A conversation is *at risk* if its ``flow_risks``
  array (populated by nDPI) is non-empty. The progress bar turns red when
  >30% of that protocol's flows are at risk, yellow otherwise.
- **TLS Certificate Health** — aggregate counts of self-signed, expired, and
  unknown-CA TLS flows.
- **Beacon Candidates** — up to 5 flows exhibiting periodic behaviour (CV
  <0.3, ≥3 flows, mean interval ≥1 s), sorted by CV ascending.

The ⓘ icon in the panel header reveals important caveats:

- **Beacon detection** uses the coefficient of variation (CV) of inter-arrival
  times across flows to the same destination. A low CV (< 0.1) suggests highly
  regular, automated traffic. Short captures may produce false positives;
  legitimate software (NTP, telemetry agents) can appear beacon-like.
- **TLS health** is based on certificate issuer metadata (issuer DN, subject DN,
  NotBefore, NotAfter) extracted at analysis time. Certificates are **not**
  re-validated at display time — an expired certificate shown here reflects the
  certificate's own stated expiry date, not a live OCSP/CRL check.
- **ASN / geo data** is enriched via ipinfo.io at analysis time and may not
  reflect recent IP address reassignments.

Interactive LLM Q&A Chat
~~~~~~~~~~~~~~~~~~~~~~~~~

After the narrative is generated, a **chat panel** allows follow-up questions
about the PCAP. The LLM answers using the full story JSON as context and
returns 3 suggested follow-up questions with each answer.

Investigation Panel
~~~~~~~~~~~~~~~~~~~

The **Investigation** panel shows the results of Phase 1: the hypotheses the
LLM formed, the structured queries it generated, and the conversation evidence
retrieved for each. Each query returns up to 10 conversations sorted by total
bytes descending. A maximum of 5 queries are executed per generation; queries
with no filters set are skipped automatically.

Privacy
-------

Story generation sends conversation metadata (not raw packet payloads) to your
configured ``LLM_API_BASE_URL``. Use a local LLM to keep this data fully
within your infrastructure.