File Extraction

File Extraction recovers objects transmitted over the network — images, documents, binaries, archives, and more — directly from PCAP payloads.

Enabling Extraction

By default file extraction runs automatically on every upload. If the deployment has VITE_ANALYSIS_OPTIONS=true set, an Analysis options modal appears after file selection where the Embedded file extraction stage can be unchecked to skip extraction for that upload.

Either way, extraction cannot be added retroactively — re-upload the file if you need extracted files for a capture processed without it.

Extraction Methods

TracePcap uses two complementary extraction techniques:

HTTP Object Extraction

The backend runs tshark --export-objects http,<tmpdir> to extract HTTP response bodies as files. A second tshark pass correlates each exported file back to its source conversation by matching the URI path component. The original filename (from the response URI or Content-Disposition header) and MIME type (from Content-Type) are preserved where available.

Stream Extraction (Aho-Corasick + Apache Tika)

For non-HTTP traffic, TracePcap reconstructs raw TCP/UDP stream payloads for candidate conversations (up to 50 streams per PCAP) and scans them using an Aho-Corasick multi-pattern search for known file magic byte sequences (e.g. %PDF-, PK\x03\x04 for ZIP, \xFF\xD8 for JPEG). Each candidate match position is then confirmed by Apache Tika, which performs a definitive magic-byte check. This O(n) approach replaces a sliding-window scan and keeps Tika calls proportional to actual matches rather than stream length.

A maximum of 5 files per stream are extracted to prevent runaway extraction on synthetic or binary-heavy payloads.

MIME Detection

Every extracted file — from either method — is passed through Apache Tika for content-based MIME type detection. This is independent of any filename extension or HTTP header, ensuring correct identification even when headers are absent or misleading. Tika also resolves the appropriate file extension from the detected MIME type.

Size Limit

Individual extracted files larger than 50 MB are discarded to avoid excessive MinIO storage consumption.

Viewing Extracted Files

Go to the Extracted Files tab for a PCAP. Each file is listed with:

  • Filename (original or auto-generated)

  • MIME type (Tika-detected)

  • Size

  • Source conversation (src IP : src port → dst IP : dst port)

  • Extraction method (tshark_http / stream)

MIME Type Filter

A collapsible Filters panel above the file list shows pill buttons for every MIME type present in the loaded files. Select one or more to narrow the list. The file count badge updates to X / Y files while a filter is active. Select All and Clear shortcuts appear in the pill header.

Media Preview

Files with a browser-natively playable MIME type (images, audio, video) show a Preview button. Clicking it opens an inline modal with the file rendered directly in the browser — no download required. The preview endpoint returns Content-Disposition: inline so the browser handles rendering.

Download Safety Disclaimer

Clicking Download on any extracted file shows a safety disclaimer modal before the download begins, reminding you that extracted files may be malicious.

Bulk Download

Select multiple files and click Download Selected to receive them in a ZIP archive. Individual files can be downloaded directly via the row action button.