Conversations
=============

The Conversations tab lists every network flow (conversation) extracted from
the PCAP file — one row per unique bidirectional flow.

How Conversations Are Built
----------------------------

Understanding exactly how TracePcap groups packets into conversations prevents
misreading the data.

Packet Extraction
~~~~~~~~~~~~~~~~~

TracePcap invokes ``tshark`` once per PCAP with 19 ``-e`` field selectors and
pipe-separated output:

.. code-block:: text

   tshark -r <file> -T fields -E 'separator=|' \
     -e frame.time_epoch \
     -e frame.len \
     -e ip.src \
     -e ip.dst \
     -e ipv6.src \
     -e ipv6.dst \
     -e tcp.srcport \
     -e tcp.dstport \
     -e udp.srcport \
     -e udp.dstport \
     -e _ws.col.Protocol \
     -e _ws.col.Info \
     -e tcp.payload \
     -e udp.payload \
     -e ip.ttl \
     -e eth.src \
     -e arp.src.proto_ipv4 \
     -e arp.dst.proto_ipv4 \
     -e eth.dst

Fields (in order): Unix epoch timestamp, on-wire frame length, IPv4 src/dst,
IPv6 src/dst (fallback), TCP src/dst port, UDP src/dst port, Wireshark Protocol
display column, Wireshark Info display column, TCP payload bytes (colon-hex),
UDP payload bytes, IP TTL, Ethernet source MAC, ARP sender/target IPs,
Ethernet destination MAC.

For **IPv4** traffic, ``ip.src``/``ip.dst`` are used. For **IPv6**, the service
falls back to ``ipv6.src``/``ipv6.dst``. For **ARP** frames (no IP layer),
the IP addresses embedded in the ARP payload (``arp.src.proto_ipv4`` /
``arp.dst.proto_ipv4``) are used as node identifiers. For other non-IP Layer-2
frames (STP, LLDP, CDP), the Ethernet MAC addresses themselves become the node
identifiers since there are no IP addresses to extract.

If tshark returns comma-separated values for a field (which can happen with
tunnelled or multi-layer packets), only the **first** value is used.
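
The parsing and fallback rules above can be sketched in Python as follows.
This is a minimal illustration, not TracePcap's actual code: ``parse_line``
and ``node_ids`` are hypothetical names, and the sketch assumes that the
free-text Info column is exempt from the first-value rule and never contains a
literal ``|``.

.. code-block:: python

   FIELDS = [
       "frame.time_epoch", "frame.len", "ip.src", "ip.dst",
       "ipv6.src", "ipv6.dst", "tcp.srcport", "tcp.dstport",
       "udp.srcport", "udp.dstport", "_ws.col.Protocol", "_ws.col.Info",
       "tcp.payload", "udp.payload", "ip.ttl", "eth.src",
       "arp.src.proto_ipv4", "arp.dst.proto_ipv4", "eth.dst",
   ]

   def parse_line(line):
       """Map one pipe-separated tshark line onto the 19 field names."""
       values = line.rstrip("\n").split("|")
       values += [""] * (len(FIELDS) - len(values))  # pad short lines
       record = dict(zip(FIELDS, values))
       for name in FIELDS:
           # Assumption: the Info column is kept whole; every other field
           # keeps only its first comma-separated value.
           if name != "_ws.col.Info":
               record[name] = record[name].split(",")[0]
       return record

   def node_ids(record):
       """Pick (src, dst): IPv4, then IPv6, then ARP IPs, then MACs."""
       for src_key, dst_key in (
           ("ip.src", "ip.dst"),
           ("ipv6.src", "ipv6.dst"),
           ("arp.src.proto_ipv4", "arp.dst.proto_ipv4"),
           ("eth.src", "eth.dst"),
       ):
           if record[src_key] and record[dst_key]:
               return record[src_key], record[dst_key]
       return "", ""

For a tunnelled packet whose ``ip.src`` field is ``10.0.0.1,192.168.1.1``, the
sketch keeps ``10.0.0.1`` as the source node.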

The ``protocol`` field is ``_ws.col.Protocol`` uppercased and truncated to 20
characters. This is Wireshark's "Protocol" display column — the highest
protocol layer that Wireshark's dissectors recognised for that packet. It is
**not** always the same as the nDPI ``appName`` (see `Protocol vs Application`_
below).

Conversation Grouping Key
~~~~~~~~~~~~~~~~~~~~~~~~~

Packets are merged into a single conversation if they share the same
**direction-independent 5-tuple**. The key is computed as follows:

1. Compare ``srcIp`` and ``dstIp`` lexicographically.
2. If ``srcIp < dstIp``: the canonical form is ``srcIp:srcPort-dstIp:dstPort``.
3. If ``srcIp > dstIp``: swap, so the canonical form is ``dstIp:dstPort-srcIp:srcPort``.
4. If the IPs are equal (same-host loopback traffic): compare ports — the
   smaller port number goes first.

This means a packet from ``10.0.0.1:55000`` → ``10.0.0.2:443`` and a reply from
``10.0.0.2:443`` → ``10.0.0.1:55000`` are **counted in the same conversation
row**. The ``srcIp``/``dstIp`` shown in the UI are from the **first packet** that
created the conversation entry, not necessarily the initiating direction.

The full key format is: ``ip1:port1-ip2:port2-PROTOCOL``.
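
A short Python sketch of the key computation, following the four steps above.
``conversation_key`` is a hypothetical name used for illustration, and ports
are assumed to be integers.

.. code-block:: python

   def conversation_key(src_ip, src_port, dst_ip, dst_port, protocol):
       """Build a direction-independent key: ip1:port1-ip2:port2-PROTOCOL."""
       a = (src_ip, src_port)
       b = (dst_ip, dst_port)
       if src_ip == dst_ip:
           # Same host on both ends: the smaller port number goes first.
           first, second = (a, b) if src_port <= dst_port else (b, a)
       else:
           # Plain string (lexicographic) comparison, as in step 1.
           first, second = (a, b) if src_ip < dst_ip else (b, a)
       return f"{first[0]}:{first[1]}-{second[0]}:{second[1]}-{protocol}"

   forward = conversation_key("10.0.0.1", 55000, "10.0.0.2", 443, "TLS")
   reply = conversation_key("10.0.0.2", 443, "10.0.0.1", 55000, "TLS")
   # Both directions yield "10.0.0.1:55000-10.0.0.2:443-TLS".

Because the comparison is lexicographic rather than numeric, ``10.0.0.10``
sorts before ``10.0.0.9``. This affects only which endpoint appears first in
the key, not which packets are grouped together.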

Packet Count and Byte Count
~~~~~~~~~~~~~~~~~~~~~~~~~~~

- **Packet count**: the number of tshark output lines (i.e., raw frames) that
  matched the conversation key. Both directions are counted together.
- **Byte count**: the sum of ``frame.len`` values for all packets in the
  conversation. ``frame.len`` is the **total on-wire frame length** including
  all headers (Ethernet, IP, TCP/UDP) and the payload. It reflects what
  actually appeared on the wire, not just the application-layer payload.

Start Time and End Time
~~~~~~~~~~~~~~~~~~~~~~~

- **Start time**: the ``frame.time_epoch`` of the first packet that matched this
  conversation's key, converted to the server's local timezone.
- **End time**: the ``frame.time_epoch`` of the last packet that matched this
  conversation's key.

These timestamps come directly from the PCAP frame timestamps, which are set
by the capturing host's clock at the moment each packet was recorded.
TracePcap applies no clock correction: if the capture machine's clock was
skewed, every timestamp carries the same skew.
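
The counting and timestamp rules from the two sections above can be sketched
as a single accumulation step. The ``Conversation`` class and ``add_packet``
function are hypothetical names; the sketch assumes packets arrive in capture
order, so the most recent packet is also the last one.

.. code-block:: python

   from dataclasses import dataclass
   from datetime import datetime

   @dataclass
   class Conversation:
       src_ip: str          # taken from the first packet seen
       dst_ip: str
       packet_count: int = 0
       byte_count: int = 0
       start_epoch: float = 0.0
       end_epoch: float = 0.0

   def add_packet(table, key, src_ip, dst_ip, epoch, frame_len):
       """Fold one packet into the conversation stored under ``key``."""
       conv = table.get(key)
       if conv is None:
           # The first packet fixes srcIp/dstIp and the start time.
           conv = table[key] = Conversation(src_ip, dst_ip, start_epoch=epoch)
       conv.packet_count += 1        # both directions counted together
       conv.byte_count += frame_len  # frame.len includes all headers
       conv.end_epoch = epoch        # the last packet seen wins
       return conv

   def local_time(epoch):
       """Convert a frame.time_epoch value to the machine's local timezone."""
       return datetime.fromtimestamp(epoch).astimezone()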

Payload Storage
~~~~~~~~~~~~~~~

For each packet, the first **64 bytes** of the TCP or UDP application payload
are stored as a lowercase hex string. This is used by:

- **Custom signature** ``payload_contains`` matching (searches each packet's
  stored hex in turn).
- **Session Reconstruction** (which runs a separate ``tshark -z follow`` pass
  to get the full reassembled stream — the 64-byte limit does not affect
  reconstruction).
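
As an illustration of the storage format, the following sketch trims a tshark
payload field to the stored form. ``stored_payload_hex`` is a hypothetical
helper; it accepts both colon-separated and plain hex input, since the exact
tshark output format is an assumption here.

.. code-block:: python

   STORED_PAYLOAD_BYTES = 64

   def stored_payload_hex(tshark_payload):
       """Return the first 64 payload bytes as a lowercase hex string."""
       # "47:45:54" and "474554" both become "474554".
       hex_digits = tshark_payload.replace(":", "").lower()
       # Two hex characters per byte.
       return hex_digits[: STORED_PAYLOAD_BYTES * 2]

A stored payload is therefore never longer than 128 hex characters.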

Protocol vs Application
-----------------------

The Conversations table has three distinct protocol-related columns.
Understanding the difference is important for correctly interpreting the data:

``protocol`` — Wireshark Display Column Label
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sourced from ``_ws.col.Protocol`` (tshark's "Protocol" display column).

- This is the label of the **highest protocol layer** Wireshark's dissectors
  recognised for the frame. It can be a network or transport protocol
  (``ICMP``, ``TCP``, ``UDP``) or an application protocol (``TLS``, ``DNS``,
  ``HTTP``).
- It is set at **packet-parse time** from the first tshark pass, before nDPI
  runs.
- A subsequent enrichment pass derives a separate label from the
  ``frame.protocols`` stack (see ``tsharkProtocol`` below).
- Examples: ``TLS``, ``TCP``, ``DNS``, ``MDNS``, ``HTTP``, ``QUIC``.

``tsharkProtocol`` — Deepest Dissector Stack Label
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In a second tshark pass, ``TsharkEnrichmentService`` extracts the
``frame.protocols`` field for every packet. This field is the full protocol
dissector stack as a colon-separated string, e.g.:

.. code-block:: text

   eth:ethertype:ip:tcp:http
   eth:ethertype:ip:udp:dns
   eth:ethertype:ip:tcp:tls
   eth:ethertype:ip:tcp:data

The service takes the **rightmost (deepest)** component as the application-layer
label and uppercases it (``http`` → ``HTTP``). It discards:

- The known L4 transport protocol (e.g. if L4 is TCP, a top-of-stack ``TCP``
  is suppressed, because it means Wireshark could not dissect further).
- Generic labels: ``DATA``, ``FRAME``, ``ETH``, ``ETHERNET``, ``SLL``, ``RAW``
  (these indicate Wireshark reached the end of its dissectors with no app-layer
  identification).

Across all packets in a conversation, the **most frequently seen** app-layer
label wins and is stored as ``tsharkProtocol``.
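
A minimal sketch of this selection logic, assuming the L4 protocol of the
conversation is already known. The function names are hypothetical, and
tie-breaking between equally frequent labels is not specified by this page
(the sketch keeps whichever label was seen first).

.. code-block:: python

   from collections import Counter

   GENERIC_LABELS = {"DATA", "FRAME", "ETH", "ETHERNET", "SLL", "RAW"}

   def app_layer_label(frame_protocols, l4_proto):
       """Return the deepest dissector label, or None if it is discarded."""
       label = frame_protocols.rsplit(":", 1)[-1].upper()
       if not label or label in GENERIC_LABELS or label == l4_proto.upper():
           return None
       return label

   def tshark_protocol(stacks, l4_proto):
       """Pick the most frequent app-layer label across a conversation."""
       labels = (app_layer_label(stack, l4_proto) for stack in stacks)
       counts = Counter(label for label in labels if label)
       return counts.most_common(1)[0][0] if counts else None

   stacks = [
       "eth:ethertype:ip:tcp",       # bare TCP (e.g. handshake): discarded
       "eth:ethertype:ip:tcp:tls",
       "eth:ethertype:ip:tcp:tls",
       "eth:ethertype:ip:tcp:data",  # generic label: discarded
   ]
   protocol = tshark_protocol(stacks, "TCP")  # "TLS"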

``appName`` — nDPI Application Identity
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Set by the nDPI analysis pass (``NdpiService``). This is a **traffic
classification** result, not a dissector label. nDPI identifies the application
from patterns in the entire flow, even through encryption — e.g.:

- ``YouTube`` (HTTPS traffic to Google CDN with YouTube signatures)
- ``WhatsApp`` (encrypted WhatsApp media or voice)
- ``BitTorrent`` (even when using non-standard ports)
- ``TOR`` (characteristic TOR circuit patterns)

``appName`` and ``tsharkProtocol`` are **complementary**:

- ``tsharkProtocol = TLS``, ``appName = YouTube`` means: the transport is TLS
  (Wireshark can see the TLS handshake), and nDPI has classified the application
  as YouTube (from fingerprints within the encrypted stream).
- ``tsharkProtocol = QUIC``, ``appName = Unknown`` means: the transport is QUIC
  but nDPI could not identify the application.
- Both fields absent means the traffic was too short or too ambiguous for either
  system to classify.

Columns
-------

The column set is configurable via the **Column Picker** button. Default
columns include:

- Source IP / Destination IP — from the first packet creating the conversation
- Source Port / Destination Port
- Protocol — ``_ws.col.Protocol`` label (see `Protocol vs Application`_)
- Application (nDPI) — nDPI ``appName``
- Category (nDPI) — nDPI traffic category (e.g. ``Social Network``, ``Media``)
- Wireshark Protocol — ``tsharkProtocol`` from the ``frame.protocols`` stack
- Risk flags — nDPI risk identifiers (e.g. ``TLS Self Signed Certificate``)
- Country (src / dst) — from ipinfo.io or DB-IP MMDB lookup
- Device type (src / dst) — from the multi-signal device classifier
- Bytes transferred — sum of ``frame.len`` for all matched packets
- Packet count — number of matched frames (both directions)
- Start / end timestamp — from PCAP frame timestamps
- Custom signature matches — names of fired custom detection rules
- HTTP User-Agent — extracted from ``http.user_agent`` tshark field

Filtering
---------

Several filters can be active at once. Each filter section has a clickable ⓘ
icon that explains exactly what is being matched. Filters combine with AND
logic: a conversation must satisfy all active filters to be shown.

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Filter
     - What it matches
   * - **IP / Hostname**
     - Substring match (case-insensitive) against ``srcIp``, ``dstIp``, or the
       SNI hostname extracted by nDPI. Accepts partial IPs or hostnames.
   * - **Port**
     - Exact integer match against ``srcPort`` **or** ``dstPort``. Digits only.
   * - **Payload contains**
     - Searches the stored 64-byte payload hex of every packet in the
       conversation. Accepts: plain ASCII string (e.g. ``GET /admin``), hex
       with ``0x`` prefix (e.g. ``0x474554``), or space-separated hex bytes
       (e.g. ``47 45 54``).
   * - **Security risks only**
     - Toggle: shows only conversations that have at least one nDPI risk flag
       (the ``flowRisks`` array is non-empty).
   * - **Protocol** (pills)
     - The ``_ws.col.Protocol`` label — Wireshark's display column, representing
       the highest protocol layer its dissectors identified for each packet
       (e.g. TCP, UDP, TLS, HTTP, DNS). Note: filtering for ``TCP`` here will
       exclude packets Wireshark dissected further to ``HTTP`` or ``TLS``.
       Multiple selections are OR-matched.
   * - **Dissected Protocol** (pills)
     - The ``tsharkProtocol`` — deepest protocol Wireshark's dissectors decoded
       from the ``frame.protocols`` stack (e.g. TLS, HTTP, DNS, QUIC).
       Multiple selections are OR-matched.
   * - **Application** (pills)
     - The nDPI ``appName`` — application or service identified by deep packet
       inspection (e.g. YouTube, WhatsApp). Detection accuracy may vary; treat
       as indicative. Only present when nDPI analysis was enabled.
   * - **Category** (pills)
     - The nDPI traffic category (e.g. Web, Media, VPN, Social Network).
       Multiple selections are OR-matched.
   * - **File Types** (pills)
     - Shows only conversations containing at least one packet where a file
       magic-byte signature was detected in the stored 64-byte payload
       (e.g. PDF, ZIP, PNG).
   * - **Risk Type** (pills)
     - Individual nDPI risk flag names (e.g. ``clear_text_credentials``,
       ``tls_self_signed_certificate``). Multiple selections are OR-matched.
   * - **Custom Rules** (pills)
     - Custom detection rule names from ``signatures.yml`` that fired in this
       PCAP. Only rules that matched at least one conversation are shown.
       Severity quick-select buttons (critical/high/medium/low) select all
       rules of that severity level at once.
   * - **Country** (pills)
     - Country of external IP addresses (src or dst) from ipinfo.io (online)
       or DB-IP Lite (offline). Multiple selections are OR-matched.
   * - **Device Type** (pills)
     - Predicted device class (Router, Mobile, etc.) for either the source or
       destination IP. Based on the multi-signal classifier; custom signature
       overrides apply at 100% confidence.

Sorting
-------

Click any column header to sort ascending; click again for descending.
Multi-column sorting is supported — hold **Shift** and click a second column.

Pagination
----------

Results are paginated. The page size is configurable from 10 to 100 rows.

Conversation Detail Panel
--------------------------

Clicking a row opens the **Conversation Detail Panel**, which shows all
fields for a single conversation in one place. The fields displayed depend on
what data was available during analysis:

**Identity and endpoints**

- Source IP : Port and Destination IP : Port
- Destination hostname (SNI from TLS ClientHello, if available)
- **Device type badges** — shown next to each IP if the device classifier ran.
  Clicking the badge opens the **Device Classification Popup** (see below).
- **Country / ASN** — for external IPs; includes a clickable geo-source badge:

  - **ipinfo.io** (green badge) — looked up via the ipinfo.io API. Provides
    country, region, city, ASN, and organisation. Results are cached locally
    so the API is not called again for known IPs.
  - **Offline DB** (grey badge) — resolved from the bundled DB-IP Lite MMDB.
    Used when the app is offline or ipinfo.io is unreachable. Accuracy may be
    lower, especially for cloud-provider IP ranges. ASN is not available from
    this source.

**Protocol fields**

- **Protocol** — ``_ws.col.Protocol`` label (Wireshark display column).
- **Dissected Protocol** — ``tsharkProtocol`` from the deepest
  ``frame.protocols`` stack layer (see `Protocol vs Application`_).
- **Application** — nDPI ``appName`` (may be absent if nDPI was not enabled).

**Security fields**

- **Security Flags** — nDPI risk flags (e.g. ``tls_self_signed_certificate``,
  ``clear_text_credentials``). These are stored as normalized
  ``lowercase_underscore`` strings from nDPI's ``[Risk: ...]`` output.
- **Custom Rules** — names of fired custom signature rules, color-coded by
  severity (critical=red, high=orange, medium=amber, low=purple).

**TLS metadata** (when nDPI analysis was enabled and a TLS handshake was
observed)

- **JA3 Client** — MD5 hash of the TLS ClientHello parameters.
- **JA3S Server** — MD5 hash of the TLS ServerHello parameters.
- **TLS Issuer** — Issuer DN from the server certificate.
- **TLS Subject** — Subject DN from the server certificate.
- **Cert Valid From / Cert Valid To** — certificate validity dates from
  ``NotBefore`` / ``NotAfter`` fields. The "Valid To" date is highlighted in
  red and an **Expired** badge is shown if ``NotAfter < now`` at the time
  the page is viewed.

**HTTP metadata**

- **User-Agents** — distinct ``http.user_agent`` values extracted during the
  tshark enrichment pass, shown as a list.

**Statistics**

- Packet count, total bytes, start time.

**Packet table**

The lower section shows the individual stored packets for the conversation.
Each row has:

- Packet index (1-based within this conversation)
- Direction arrow (→ blue = client-to-server, ← green = server-to-client;
  direction is relative to the conversation's ``srcIp``)
- Timestamp from the PCAP frame
- Source IP:Port / Destination IP:Port
- Frame length (``frame.len``) in bytes — includes all headers
- **File Type** — if a magic-byte signature was detected in the first 64 bytes
  of the stored payload hex, the detected file type is shown as a badge
  (e.g. ``PDF``, ``ZIP``, ``PNG``). An **ASCII** badge appears if > 30% of
  the first 256 payload bytes are printable ASCII characters.
- Info — the ``_ws.col.Info`` tshark display column for the packet
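
The two badge checks can be sketched as below. The function names and the
three-entry signature table are illustrative; the sketch assumes signatures
are matched at the start of the payload, and the full signature list used by
TracePcap is not given on this page. Since only 64 bytes are stored per
packet, the 256-byte window is in practice capped by the stored length.

.. code-block:: python

   MAGIC_PREFIXES = {
       "PDF": bytes.fromhex("25504446"),          # "%PDF"
       "ZIP": bytes.fromhex("504b0304"),          # "PK\x03\x04"
       "PNG": bytes.fromhex("89504e470d0a1a0a"),  # 8-byte PNG signature
   }

   def detect_file_type(payload_hex):
       """Return a file-type name if the payload starts with a known magic."""
       data = bytes.fromhex(payload_hex)
       for name, magic in MAGIC_PREFIXES.items():
           if data.startswith(magic):
               return name
       return None

   def is_mostly_ascii(payload_hex, window=256, threshold=0.30):
       """True if more than 30% of the inspected bytes are printable ASCII."""
       data = bytes.fromhex(payload_hex)[:window]
       if not data:
           return False
       printable = sum(1 for b in data if 0x20 <= b <= 0x7E)
       return printable / len(data) > threshold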

Click a packet row to expand the **Hex Viewer**, which renders the stored
64-byte payload as both hex and ASCII side-by-side.

**Extracted files link**

If File Extraction was enabled at upload time and files were extracted from
this conversation's stream, a button shows the count and links to the
Extracted Files tab filtered to this conversation.

Device Classification Popup (Conversation Detail)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The device type badge next to each IP in the Conversation Detail Panel opens
a simplified classification popup scoped to this specific conversation:

- **Type** — whether this IP acted as ``Client`` (initiated) or ``Server``
  (received) in this conversation. For server-role IPs, the label uses a
  port-to-service mapping on the destination port (e.g. port 443 → ``HTTPS``).
  A note under the label reads: "Based on destination port N in this conversation".
- **Device** — same device type badge, signal bullets, and confidence progress
  bar as the Network Visualization popup (see :doc:`network-visualization`).
- **Role** — ``Client`` or ``Server`` badge with a one-line note.

Unlike the Network Visualization classification popup, this popup is
conversation-scoped rather than global: it does not show initiated/received
counts across all conversations.

Conversation Tracer
-------------------

The **Conversation Tracer** is opened from the conversation list via the tracer
icon. It provides a step-by-step replay of every packet in the conversation,
with an LLM-generated plain-English explanation for each packet.

Star-Graph Visualization
~~~~~~~~~~~~~~~~~~~~~~~~~

A star-graph SVG shows the traced host (center node, labelled "Host") and up
to 12 of its peer IPs arranged in a ring. The active traced peer is highlighted
with a solid blue edge; other peers appear as dashed lines.

An animated dot travels along the active edge on each step advance:

- **Blue dot** (→) — packet travelling client-to-server.
- **Green dot** (←) — packet travelling server-to-client.

Direction is determined relative to the conversation's stored ``srcIp``:
packets where ``packet.srcIp == conversation.srcIp`` are labelled ``CLIENT``
(outbound), others are ``SERVER`` (inbound).
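
Expressed as code, the direction rule is a single comparison.
``packet_direction`` is a hypothetical name used only for this sketch.

.. code-block:: python

   ARROWS = {"CLIENT": "→", "SERVER": "←"}

   def packet_direction(packet_src_ip, conversation_src_ip):
       """Label a packet relative to the conversation's stored srcIp."""
       if packet_src_ip == conversation_src_ip:
           return "CLIENT"
       return "SERVER"

Because the stored ``srcIp`` comes from the first packet seen, ``CLIENT``
means "same side as the first captured packet", which is not necessarily the
host that initiated the connection. With this rule, same-host traffic (equal
source and destination IP) is always labelled ``CLIENT``.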

Below the graph, a step summary line shows: direction arrow, protocol,
size in bytes, and a truncated version of the ``_ws.col.Info`` string for
the current packet.

LLM Packet Explanations
~~~~~~~~~~~~~~~~~~~~~~~~

For each packet, the LLM receives:

- Direction (``CLIENT->SERVER`` or ``SERVER->CLIENT``)
- Protocol (``_ws.col.Protocol`` label)
- Packet size in bytes
- Info string (``_ws.col.Info``) where present
- Up to 64 bytes of payload rendered as ASCII, with non-printable bytes
  replaced by ``.``. Only included if the payload contains at least 4
  consecutive printable ASCII characters. Encrypted payloads produce no
  readable ASCII and are excluded.
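
A sketch of the payload rendering rule from the last bullet.
``payload_ascii_for_prompt`` is a hypothetical name; the printable range is
assumed to be ``0x20`` to ``0x7E``.

.. code-block:: python

   def has_printable_run(data, min_run=4):
       """True if data holds at least min_run consecutive printable bytes."""
       run = 0
       for b in data:
           run = run + 1 if 0x20 <= b <= 0x7E else 0
           if run >= min_run:
               return True
       return False

   def payload_ascii_for_prompt(payload_hex, max_bytes=64, min_run=4):
       """Render payload as ASCII with '.' placeholders, or None to omit it."""
       data = bytes.fromhex(payload_hex)[:max_bytes]
       # The run check uses the raw bytes, so '.' placeholders never count.
       if not has_printable_run(data, min_run):
           return None
       return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)

   text = payload_ascii_for_prompt("474554202f20485454500d0a")
   # "GET / HTTP.." (the trailing CR LF becomes two dots)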

The LLM is asked to produce a 1-2 sentence plain-English explanation of what
is happening at that network step, in the context of the full conversation
(protocol, application name, endpoint IPs/ports).

The system caches LLM explanations per conversation (up to 500 entries in an
LRU cache), so switching steps or reopening the tracer does not make repeated
LLM calls.
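
The caching behaviour can be approximated with a small LRU structure. This is
a sketch under assumptions: ``ExplanationCache`` is a hypothetical class, and
keying the cache by a conversation identifier is inferred from the
description above, not confirmed by it.

.. code-block:: python

   from collections import OrderedDict

   class ExplanationCache:
       """Keep the most recently used explanations, evicting the oldest."""

       def __init__(self, max_entries=500):
           self._max = max_entries
           self._items = OrderedDict()

       def get(self, key):
           if key not in self._items:
               return None
           self._items.move_to_end(key)  # mark as most recently used
           return self._items[key]

       def put(self, key, value):
           self._items[key] = value
           self._items.move_to_end(key)
           if len(self._items) > self._max:
               self._items.popitem(last=False)  # drop least recently used

A lookup that returns ``None`` is the point where the tracer would call the
LLM and store the result with ``put``.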

**Works well for:** TCP handshakes, HTTP requests/responses, DNS queries, TLS
handshake phases — where the Info field or payload bytes are descriptive.

**Limited for:** encrypted traffic (TLS application data, RDP) where only
size and direction are available — explanations will be generic.

Packet List
~~~~~~~~~~~

The scrollable packet list below the controls shows all packets with: step
index, direction, protocol, size, timestamp (time component only), and
truncated Info string. Clicking any row jumps directly to that step.

Navigation controls: Previous / Next buttons, current step / total counter,
and a Play/Pause button that auto-advances at 1.5-second intervals.

Session Reconstruction
-----------------------

Click the **eye icon** on any row to open the session reconstruction viewer
for that conversation — see :doc:`session-reconstruction`.

Export Options
--------------

- **Per-conversation PCAP** — download a PCAP containing only the packets
  for a single conversation via the row action menu.
- **Bulk PCAP export** — select multiple rows (or all) and export them as
  a combined PCAP.
- **CSV export** — export the current filtered and sorted view to CSV.