File Extraction
===============

File Extraction recovers objects transmitted over the network (images, documents, binaries, archives, and more) directly from PCAP payloads.

Enabling Extraction
-------------------

By default, file extraction runs automatically on every upload. If the deployment has ``VITE_ANALYSIS_OPTIONS=true`` set, an **Analysis options** modal appears after file selection where the **Embedded file extraction** stage can be unchecked to skip extraction for that upload.

Either way, extraction cannot be added retroactively; re-upload the file if you need extracted files for a capture processed without it.

Extraction Methods
------------------

TracePcap uses two complementary extraction techniques:

HTTP Object Extraction
~~~~~~~~~~~~~~~~~~~~~~

The backend runs ``tshark --export-objects http,`` to extract HTTP response bodies as files. A second tshark pass correlates each exported file back to its source conversation by matching the URI path component. The original filename (from the response URI or ``Content-Disposition`` header) and MIME type (from ``Content-Type``) are preserved where available.

Stream Extraction (Aho-Corasick + Apache Tika)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For non-HTTP traffic, TracePcap reconstructs raw TCP/UDP stream payloads for candidate conversations (up to 50 streams per PCAP) and scans them using an **Aho-Corasick** multi-pattern search for known file magic byte sequences (e.g. ``%PDF-``, ``PK\x03\x04`` for ZIP, ``\xFF\xD8`` for JPEG). Each candidate match position is then confirmed by **Apache Tika**, which performs a definitive magic-byte check.

This O(n) approach replaces a sliding-window scan and keeps Tika calls proportional to actual matches rather than stream length. A maximum of 5 files per stream are extracted to prevent runaway extraction on synthetic or binary-heavy payloads.

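
The scan-then-confirm flow described above can be illustrated with a small, self-contained sketch. It is not TracePcap's implementation: the signature table is limited to the three examples mentioned in the text, the function names are hypothetical, and the ``confirm`` callback is only a stand-in for the Apache Tika check.

```python
from collections import deque

# Illustrative signature table (an assumption, not TracePcap's actual list).
MAGIC_SIGNATURES = [b"%PDF-", b"PK\x03\x04", b"\xFF\xD8"]


def build_automaton(patterns):
    """Build Aho-Corasick goto/fail/output tables over byte values."""
    goto = [{}]     # goto[state][byte] -> next state
    fail = [0]      # fail[state] -> state to fall back to on a mismatch
    output = [[]]   # output[state] -> patterns that end at this state
    for pattern in patterns:
        state = 0
        for byte in pattern:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        output[state].append(pattern)

    # Breadth-first pass: depth-1 states keep fail = 0, deeper ones are derived.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            fallback = fail[state]
            while fallback and byte not in goto[fallback]:
                fallback = fail[fallback]
            fail[nxt] = goto[fallback].get(byte, 0)
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_files(payload, automaton, confirm, max_files=5):
    """Single pass over the payload; return confirmed (offset, magic) pairs.

    confirm(payload, offset, magic) is a hypothetical stand-in for the
    content check that the document attributes to Apache Tika.
    """
    goto, fail, output = automaton
    found = []
    state = 0
    for index, byte in enumerate(payload):
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        for magic in output[state]:
            offset = index - len(magic) + 1
            # The confirmation step runs only at candidate positions.
            if confirm(payload, offset, magic):
                found.append((offset, magic))
                if len(found) >= max_files:
                    return found
    return found


automaton = build_automaton(MAGIC_SIGNATURES)
sample = b"junk" + b"%PDF-1.4" + b"xx" + b"PK\x03\x04" + b"yy" + b"\xff\xd8"
hits = find_files(sample, automaton, confirm=lambda data, offset, magic: True)
```

Because the automaton visits each payload byte once, the ``confirm`` callback is invoked only where a signature actually matched, which is the property the text attributes to the Aho-Corasick approach. The ``max_files`` argument mirrors the per-stream cap of 5.
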
MIME Detection
--------------

Every extracted file, from either method, is passed through **Apache Tika** for content-based MIME type detection. This is independent of any filename extension or HTTP header, ensuring correct identification even when headers are absent or misleading. Tika also resolves the appropriate file extension from the detected MIME type.

Size Limit
----------

Individual extracted files larger than **50 MB** are discarded to avoid excessive MinIO storage consumption.

Viewing Extracted Files
-----------------------

Go to the **Extracted Files** tab for a PCAP. Each file is listed with:

- Filename (original or auto-generated)
- MIME type (Tika-detected)
- Size
- Source conversation (src IP : src port → dst IP : dst port)
- Extraction method (``tshark_http`` / stream)

MIME Type Filter
~~~~~~~~~~~~~~~~

A collapsible **Filters** panel above the file list shows pill buttons for every MIME type present in the loaded files. Select one or more to narrow the list. The file count badge updates to ``X / Y files`` while a filter is active. **Select All** and **Clear** shortcuts appear in the pill header.

Media Preview
~~~~~~~~~~~~~

Files with a browser-natively playable MIME type (images, audio, video) show a **Preview** button. Clicking it opens an inline modal with the file rendered directly in the browser, with no download required. The preview endpoint returns ``Content-Disposition: inline`` so the browser handles rendering.

Download Safety Disclaimer
~~~~~~~~~~~~~~~~~~~~~~~~~~

Clicking **Download** on any extracted file shows a safety disclaimer modal before the download begins, reminding you that extracted files may be malicious.

Bulk Download
-------------

Select multiple files and click **Download Selected** to receive them in a ZIP archive. Individual files can be downloaded directly via the row action button.
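
As a rough illustration of what bundling selected files into one ZIP archive can involve, the sketch below builds an archive in memory. It is an assumption-laden example rather than TracePcap's code: the ``build_zip`` name, the ``(filename, data)`` input shape, the path stripping, and the duplicate-name suffixing are all hypothetical choices made for the example.

```python
import io
import posixpath
import zipfile


def build_zip(files):
    """Bundle (filename, data) pairs into a ZIP archive and return its bytes."""
    buffer = io.BytesIO()
    seen = {}  # sanitized name -> number of times it has been used so far
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files:
            # Names recovered from network traffic are untrusted, so keep
            # only the final path component.
            name = posixpath.basename(name.replace("\\", "/")) or "unnamed"
            count = seen.get(name, 0)
            seen[name] = count + 1
            if count:
                # Simplified collision handling: append a numeric suffix.
                stem, dot, ext = name.rpartition(".")
                name = f"{stem}_{count}{dot}{ext}" if dot else f"{name}_{count}"
            archive.writestr(name, data)
    return buffer.getvalue()


archive_bytes = build_zip([("a.txt", b"1"), ("a.txt", b"2"), ("../evil", b"3")])
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    names = archive.namelist()
    second = archive.read("a_1.txt")
```

Stripping directory components matters here because extracted filenames come from captured traffic and may be crafted to escape the target directory when the archive is unpacked.
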