A boundary marker is a delimiter used in multipart data transmissions, particularly in HTTP messages, to separate individual sections within a stream. This technique is commonly used in MJPEG video feeds, live image transmissions, and multipart file uploads, where multiple objects need to be sent over a single HTTP connection. The boundary string is declared in the Content-Type header, and each occurrence of it in the body marks where one transmitted object ends and the next begins.
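To make the layout concrete, here is a minimal Python sketch that builds a small multipart/x-mixed-replace body in memory and splits it back into parts. The helper names build_body and split_parts, the boundary value, and the placeholder frame bytes are all hypothetical. Splitting on the delimiter assumes the boundary never appears inside the image data. Servers choose boundaries so that this should not happen, but a parser that honours Content-Length would be more robust.

```python
# Hypothetical helpers showing the multipart/x-mixed-replace body layout.
def build_body(boundary: bytes, frames: list) -> bytes:
    """Wrap each frame as a part: delimiter line, part headers, blank line, data."""
    body = b""
    for frame in frames:
        body += b"--" + boundary + b"\r\n"
        body += b"Content-Type: image/jpeg\r\n"
        body += b"Content-Length: " + str(len(frame)).encode() + b"\r\n\r\n"
        body += frame + b"\r\n"  # CRLF before the next delimiter line
    return body


def split_parts(body: bytes, boundary: bytes) -> list:
    """Return (headers, data) pairs found between delimiter lines."""
    parts = []
    for chunk in body.split(b"--" + boundary):
        header_end = chunk.find(b"\r\n\r\n")
        if header_end == -1:
            continue  # preamble or incomplete part without a header block
        headers = chunk[:header_end].strip()
        data = chunk[header_end + 4:]
        if data.endswith(b"\r\n"):
            data = data[:-2]  # drop the CRLF that belongs to the next delimiter
        parts.append((headers, data))
    return parts


body = build_body(b"BoundaryString", [b"\xff\xd8first\xff\xd9", b"\xff\xd8second\xff\xd9"])
for headers, data in split_parts(body, b"BoundaryString"):
    print(headers.split(b"\r\n")[0], len(data))
```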
To identify a boundary marker in a live image or MJPEG stream, first look for Content-Type: multipart/x-mixed-replace in the HTTP response headers. This indicates that the response body contains multiple parts, each separated by a predefined boundary. The boundary is given as a parameter of that same header, for example Content-Type: multipart/x-mixed-replace; boundary=BoundaryString, where "BoundaryString" is the separator between individual parts. When analysing the TCP stream, each image or file part is preceded by the boundary prefixed with two hyphens (--BoundaryString). This line is followed by part headers such as Content-Type: image/jpeg, a blank line, and then the data itself.
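Reading the boundary from the header by hand is error-prone when it is quoted or has extra parameters around it. One option, sketched below, is to let the standard library's email.message.Message parse the Content-Type value. The function name get_boundary is hypothetical. get_param strips surrounding quotes, and the returned string still needs the leading "--" before it can be used to split the body.

```python
from email.message import Message
from typing import Optional


def get_boundary(content_type: str) -> Optional[str]:
    """Return the boundary parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_param parses the ;-separated parameters and removes surrounding quotes
    return msg.get_param("boundary")


boundary = get_boundary("multipart/x-mixed-replace; boundary=BoundaryString")
delimiter = b"--" + boundary.encode()  # the byte sequence that appears in the stream
print(boundary, delimiter)
```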
To extract data from a .pcap file, first load the file into Wireshark. Applying the display filter http contains "multipart" helps locate the relevant packets in the capture. From there, Follow TCP Stream lets you inspect the raw data for occurrences of the boundary marker. Once you have identified the boundary markers, the next step is to extract the content between them. If images or files are fragmented across multiple packets, you may need to reassemble the entire HTTP response before the extracted data can be saved correctly.
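Simply concatenating packet payloads in capture order can corrupt an image when segments are retransmitted or arrive out of order. The sketch below reorders one direction of a single TCP connection by sequence number and skips bytes it has already written. It takes plain (sequence number, payload) pairs, so it does not depend on dpkt. How you collect those pairs from a capture (for example, filtering to one client/server port pair) is left as an assumption. The function name reassemble is hypothetical, and the sketch ignores 32-bit sequence wrap-around.

```python
from typing import Iterable, Tuple


def reassemble(segments: Iterable[Tuple[int, bytes]]) -> bytes:
    """Rebuild one direction of a TCP stream from (sequence number, payload) pairs.

    Assumes no sequence-number wrap-around. Bytes already written are skipped,
    which discards retransmissions and overlapping segments.
    """
    ordered = sorted((seq, data) for seq, data in segments if data)
    if not ordered:
        return b""
    stream = bytearray()
    next_seq = ordered[0][0]  # sequence number of the next byte we expect
    for seq, data in ordered:
        if seq + len(data) <= next_seq:
            continue  # entirely duplicate segment
        if seq > next_seq:
            # Gap from a lost packet: continue anyway; the affected image will be damaged
            next_seq = seq
        start = next_seq - seq  # skip any overlap with data already appended
        stream += data[start:]
        next_seq = seq + len(data)
    return bytes(stream)


print(reassemble([(1003, b"def"), (1000, b"abc"), (1003, b"def"), (1006, b"ghi")]))
```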
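The Python script below extracts images from a PCAP file. Make sure the boundary string passed to the function matches the one in your capture, including the leading "--". When run against capture.pcap, it saves the extracted images in output_folder. The script needs the dpkt module, which you can install with pip install dpkt. It assumes an Ethernet capture, and dpkt.pcap.Reader reads classic libpcap files, so save the capture in .pcap format rather than .pcapng.
Edit the file name and output directory according to your needs.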
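import os
import re

import dpkt


def extract_images_chunked(pcap_file, output_folder, boundary=b"--BoundaryString"):
    os.makedirs(output_folder, exist_ok=True)  # create the output directory if needed
    image_count = 0
    full_http_data = b""
    with open(pcap_file, "rb") as f:
        pcap = dpkt.pcap.Reader(f)
        for _, buf in pcap:
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            if not isinstance(ip, (dpkt.ip.IP, dpkt.ip6.IP6)):
                continue  # skip non-IP frames such as ARP
            tcp = ip.data
            if isinstance(tcp, dpkt.tcp.TCP) and tcp.data:
                full_http_data += tcp.data  # append every TCP payload in capture order

    # Remove chunked transfer-encoding size lines (a heuristic)
    full_http_data = re.sub(rb"\r\n[0-9a-fA-F]+\r\n", b"", full_http_data)

    # Split the stream at each boundary marker
    parts = full_http_data.split(boundary)
    for part in parts:
        header_end = part.find(b"\r\n\r\n")
        if header_end == -1:
            continue  # no part headers, so this is not an image part
        headers = part[:header_end].lower()  # header names are case-insensitive
        if b"content-type: image/jpeg" not in headers and b"content-type: image/png" not in headers:
            continue
        img_binary = part[header_end + 4:]
        if img_binary.endswith(b"\r\n"):
            img_binary = img_binary[:-2]  # CRLF that precedes the next boundary
        # Check the file signature to pick the extension and skip malformed parts
        if img_binary.startswith(b"\xff\xd8"):  # JPEG start-of-image marker
            ext = "jpg"
        elif img_binary.startswith(b"\x89PNG\r\n\x1a\n"):  # PNG signature
            ext = "png"
        else:
            continue
        filename = os.path.join(output_folder, f"image_{image_count}.{ext}")
        with open(filename, "wb") as img_file:
            img_file.write(img_binary)
        print(f"Saved: {filename}")
        image_count += 1
    print(f"Extracted {image_count} images!")


# Usage example
extract_images_chunked("capture.pcap", "output_folder")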
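The regular expression in the script above is a heuristic. It deletes any CRLF, hex digits, CRLF sequence, which can also match bytes inside the image data, and it does not recognise chunk extensions. A stricter alternative, sketched below, follows the HTTP/1.1 chunked transfer coding: read a hexadecimal size line, take exactly that many bytes, and stop at the zero-size chunk. This assumes body is the complete message body of a single response, starting right after its headers. The function name decode_chunked is hypothetical, and trailer fields after the last chunk are ignored.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body: size[;ext] CRLF data CRLF ... 0 CRLF."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.find(b"\r\n", pos)
        if line_end == -1:
            break  # truncated capture: return what has been decoded so far
        size_field = body[pos:line_end].split(b";", 1)[0].strip()  # drop chunk extensions
        size = int(size_field, 16)
        if size == 0:
            break  # last chunk; any trailer fields are ignored
        data_start = line_end + 2
        out += body[data_start:data_start + size]
        pos = data_start + size + 2  # skip the CRLF that ends the chunk data
    return bytes(out)


raw = b"4\r\nWiki\r\n6;ext=1\r\npedia \r\nE\r\nin \r\n\r\nchunks.\r\n0\r\n\r\n"
print(decode_chunked(raw))
```

Because it trusts the declared chunk sizes instead of pattern matching, a byte sequence such as \r\n1f\r\n inside a JPEG is left intact.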
Git Repository: https://github.com/jaacostan/PCAP-Image-Extractor