JPEG Format
Joint Photographic Experts Group
File Structure Overview
If you open a JPEG in a hex editor, you'll notice it's made up of segments introduced by markers. Every marker starts with 0xFF, followed by a byte that tells you what kind of segment it is. This design makes JPEGs relatively easy to parse: you read a marker, read the length that follows it, and jump ahead to the next marker. (Inside the compressed scan data, a literal FF byte is followed by 00 so it can't be mistaken for a marker.)
The interesting part for privacy is that all the metadata (EXIF, GPS, timestamps) lives in these segments, completely separate from the actual image data. That's why we can strip metadata without touching the pixels or recompressing anything.
JPEG File Structure
JPEG Header Example
| Offset | Hex | ASCII |
|---|---|---|
| 0000 | FFD8FFE000104A464946000101000048 | ......JFIF.....H |
| 0010 | 00480000FFE112344578696600000000 | .H.....4Exif.... |
Marker Structure
- FF xx: Marker (xx identifies the type)
- LL LL: Length (2 bytes, big-endian, includes itself)
- Data...: Segment data (length - 2 bytes)
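The layout above can be walked with a few lines of Python. This is a minimal sketch, assuming the whole file is already in memory; `iter_segments` and `STANDALONE` are names made up for this example and are not PicScrub's actual code. It stops at SOS, because what follows is compressed scan data rather than length-prefixed segments.

```python
import struct
from typing import Iterator, Tuple

# Markers that stand alone: no length field and no payload.
# SOI (D8), EOI (D9), TEM (01), and RST0-RST7 (D0-D7).
STANDALONE = {0xD8, 0xD9, 0x01} | set(range(0xD0, 0xD8))


def iter_segments(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (marker_byte, payload) for each segment up to and including SOS."""
    pos = 0
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:
            # Extra FF bytes before a marker are padding; skip one and retry.
            pos += 1
            continue
        pos += 2
        if marker in STANDALONE:
            yield marker, b""
            continue
        if pos + 2 > len(data):
            raise ValueError("truncated length field")
        # The length is big-endian and counts its own two bytes.
        (length,) = struct.unpack_from(">H", data, pos)
        if length < 2 or pos + length > len(data):
            raise ValueError("bad segment length")
        yield marker, data[pos + 2 : pos + length]
        pos += length
        if marker == 0xDA:
            # SOS: entropy-coded image data follows, so stop walking here.
            return
```

Calling `list(iter_segments(open(path, "rb").read()))` on a typical camera file should give SOI first, then the APP segments, then the tables and frame header, ending with SOS.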
Complete JPEG Marker Reference
| Marker | Name | Description |
|---|---|---|
| FF D8 | SOI | Start of Image |
| FF D9 | EOI | End of Image |
| FF C0 | SOF0 | Start of Frame (Baseline DCT) |
| FF C2 | SOF2 | Start of Frame (Progressive DCT) |
| FF C4 | DHT | Define Huffman Table |
| FF DB | DQT | Define Quantization Table |
| FF DD | DRI | Define Restart Interval |
| FF DA | SOS | Start of Scan (image data follows) |
| FF FE | COM | Comment (metadata) |
| FF E0 | APP0 | JFIF marker |
| FF E1 | APP1 | EXIF / XMP marker |
| FF E2 | APP2 | ICC Profile / MPF |
| FF ED | APP13 | IPTC / Photoshop |
| FF EE | APP14 | Adobe color transform |
Red = metadata (removed), Yellow = optional preserve, Blue = required structure
JFIF vs EXIF
Here's something that confuses a lot of people: JPEG files often contain two different metadata standards at the same time. JFIF came first and handles basic stuff like resolution. Then cameras started adding EXIF data with all the detailed shooting information. Most modern JPEGs have both, which is why you'll see both APP0 and APP1 markers near the start of the file.
APP0:JFIF
JPEG File Interchange Format. Basic image information.
- Version number
- Pixel density (DPI)
- Aspect ratio
- Optional thumbnail
APP1:EXIF
Exchangeable Image File Format. Camera metadata.
- Camera make/model
- GPS coordinates
- Timestamps
- Exposure settings
APP Markers Deep Dive
The JPEG spec reserved 16 "application" markers (APP0 through APP15) for vendors to use however they wanted. Over the years, different organizations claimed different markers for their metadata formats. The result is a bit of a mess, but at least it's a well-documented mess.
The tricky part is that the same marker can mean different things depending on its content. APP1, for example, can be either EXIF or XMP data, so you have to look at the identifier string inside to know which one you're dealing with.
APP0 (0xFFE0):JFIF
Identifier: JFIF\0. Contains version, density units, X/Y density, and optional embedded thumbnail. Generally safe to preserve.
APP1 (0xFFE1):EXIF
Identifier: Exif\0\0. Contains a complete TIFF structure with IFD entries for camera info, GPS data, timestamps, thumbnails, and maker notes. Primary privacy concern.
APP1 (0xFFE1):XMP
Identifier: http://ns.adobe.com/xap/1.0/\0. XML-based metadata standard by Adobe. Can contain editing history, software info, and extended EXIF data.
XMP Namespaces
- xmp: http://ns.adobe.com/xap/1.0/
- xmpMM: http://ns.adobe.com/xap/1.0/mm/
- dc: http://purl.org/dc/elements/1.1/
- photoshop: http://ns.adobe.com/photoshop/1.0/
- tiff: http://ns.adobe.com/tiff/1.0/
- exif: http://ns.adobe.com/exif/1.0/

APP1 (0xFFE1):Extended XMP
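Identifier: http://ns.adobe.com/xmp/extension/\0. Used when the XMP data is too large for a single segment (a segment tops out at 65,535 bytes, just under 64 KB), in which case it is split across multiple APP1 segments.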
APP2 (0xFFE2):MPF
Identifier: MPF\0. Multi-Picture Format for stereoscopic images and panoramas.
APP2 (0xFFE2):ICC Profile
Identifier: ICC_PROFILE\0. Color management data. Can be chunked across multiple APP2 markers. Usually safe to preserve.
APP13 (0xFFED):IPTC/Photoshop
Identifier: Photoshop 3.0\0. Contains IPTC-IIM data with captions, keywords, copyright, creator info. May contain author information.
APP14 (0xFFEE):Adobe
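Identifier: Adobe (five bytes with no null terminator; a version field follows directly). Contains the color transform flag (0 = RGB or CMYK, 1 = YCbCr, 2 = YCCK). Important for proper color rendering.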
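Since the marker alone doesn't tell you what a segment holds, a parser has to compare the start of the payload against the identifiers listed above. Here is a small sketch of that check; `APP_IDENTIFIERS` and `classify_app_segment` are hypothetical names for illustration, and the table only covers the identifiers mentioned on this page.

```python
from typing import List, Tuple

# (marker byte, identifier prefix, label), taken from the list above.
APP_IDENTIFIERS: List[Tuple[int, bytes, str]] = [
    (0xE0, b"JFIF\x00", "JFIF"),
    (0xE1, b"Exif\x00\x00", "EXIF"),
    (0xE1, b"http://ns.adobe.com/xap/1.0/\x00", "XMP"),
    (0xE1, b"http://ns.adobe.com/xmp/extension/\x00", "Extended XMP"),
    (0xE2, b"MPF\x00", "MPF"),
    (0xE2, b"ICC_PROFILE\x00", "ICC Profile"),
    (0xED, b"Photoshop 3.0\x00", "IPTC/Photoshop"),
    # Matched on the five letters only, without assuming a terminator.
    (0xEE, b"Adobe", "Adobe"),
]


def classify_app_segment(marker: int, payload: bytes) -> str:
    """Return a label for an APP segment, or "unknown" if nothing matches."""
    for app_marker, prefix, label in APP_IDENTIFIERS:
        if marker == app_marker and payload.startswith(prefix):
            return label
    return "unknown"
```

Anything that comes back as "unknown" is a policy decision for the caller: a privacy tool could reasonably drop it, while a more conservative one might keep it.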
EXIF Structure
Here's where it gets interesting (and a bit weird). EXIF data is actually a tiny TIFF file embedded inside your JPEG. Yes, really. It has its own header, its own byte order, and uses TIFF's IFD (Image File Directory) structure to store tags.
The first thing you'll see after the "Exif" identifier is a TIFF header that tells you whether the data is little-endian (Intel, "II") or big-endian (Motorola, "MM"). Getting this wrong means all your numbers will be garbage, so it's the first thing any parser checks.
TIFF Header (8 bytes)
| Offset | Hex | ASCII |
|---|---|---|
| 0000 | 49492A0008000000 | II*..... |
II = Intel (little-endian), MM = Motorola (big-endian)
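Reading those 8 bytes might look like the sketch below. It assumes `tiff` starts right after the `Exif\0\0` identifier; `parse_tiff_header` is a name invented for this example. The function returns a `struct` byte-order prefix so later reads can reuse it.

```python
import struct
from typing import Tuple


def parse_tiff_header(tiff: bytes) -> Tuple[str, int]:
    """Return (struct byte-order prefix, offset of the first IFD)."""
    if len(tiff) < 8:
        raise ValueError("TIFF header needs 8 bytes")
    # "II" means little-endian, "MM" means big-endian.
    order = {b"II": "<", b"MM": ">"}.get(tiff[:2])
    if order is None:
        raise ValueError("unknown byte-order mark")
    # Bytes 2-3 hold the magic number 42, bytes 4-7 the first IFD offset.
    magic, ifd0_offset = struct.unpack_from(order + "HI", tiff, 2)
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    return order, ifd0_offset
```

All offsets inside the EXIF block, including the one returned here, are relative to the start of the TIFF header rather than to the start of the JPEG file.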
IFD Entry Structure (12 bytes)
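In the TIFF layout, an IFD starts with a 2-byte entry count, and each 12-byte entry packs a 2-byte tag, a 2-byte field type, a 4-byte value count, and 4 bytes that hold either the value itself (if it fits) or an offset to it. The sketch below reads one IFD under those rules; `IfdEntry` and `read_ifd` are hypothetical names, and `order` is assumed to be `"<"` or `">"` as determined from the TIFF header.

```python
import struct
from typing import List, NamedTuple


class IfdEntry(NamedTuple):
    tag: int                 # e.g. 0x8825 for the GPS sub-IFD pointer
    field_type: int          # TIFF type code (2 = ASCII, 4 = LONG, 5 = RATIONAL, ...)
    n_values: int            # number of values of that type
    value_or_offset: bytes   # raw 4 bytes: inline value or offset to the data


def read_ifd(tiff: bytes, offset: int, order: str) -> List[IfdEntry]:
    """Read all 12-byte entries of the IFD located at `offset`."""
    (n_entries,) = struct.unpack_from(order + "H", tiff, offset)
    entries = []
    for i in range(n_entries):
        base = offset + 2 + 12 * i
        tag, field_type, n_values = struct.unpack_from(order + "HHI", tiff, base)
        # Keep the last 4 bytes raw; how to decode them depends on the type.
        entries.append(IfdEntry(tag, field_type, n_values, tiff[base + 8 : base + 12]))
    return entries
```

Leaving the last field as raw bytes is a deliberate simplification: a tool that only needs to detect a tag such as the GPS pointer never has to decode the values at all.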
Key IFD Entries
- 0x010F: Make
- 0x0110: Model
- 0x0132: DateTime
- 0x8769: ExifIFDPointer
- 0x8825: GPSInfoIFDPointer
- 0x927C: MakerNote

How PicScrub Processes JPEG
The approach is pretty straightforward: we read through the file marker by marker, keep the ones that are essential for displaying the image, and skip the ones that contain metadata. Since we never touch the compressed image data, there's zero quality loss.
The key insight is that JPEG's marker-based structure makes this easy. We don't need to understand DCT coefficients or Huffman tables. We just need to identify which segments are metadata and which aren't.
Validate Header
Verify SOI marker (FF D8) at file start
Parse Markers
Read each marker sequentially, extracting type and length
Identify Metadata
Check identifiers in APP segments (EXIF, XMP, IPTC, etc.)
Selective Copy
Copy non-metadata segments to output, skip metadata segments
What's Preserved
- SOI/EOI markers (required)
- DQT (quantization tables)
- SOF (frame header)
- DHT (Huffman tables)
- SOS and scan data (actual image)
- Optional: ICC color profile (configurable)
What's Removed
- APP1 EXIF (camera, GPS, timestamps)
- APP1 XMP (editing metadata)
- APP13 IPTC (captions, author)
- COM markers (comments)
- EXIF thumbnail images
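The four steps and the keep/remove lists above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not PicScrub's actual implementation: `strip_metadata` and `METADATA_PREFIXES` are invented names, the file is assumed to have no padding or standalone markers before SOS, and everything from the scan data onward is copied byte for byte.

```python
import struct

# Identifier prefixes that mark a segment as metadata, keyed by marker byte.
METADATA_PREFIXES = {
    0xE1: (
        b"Exif\x00\x00",
        b"http://ns.adobe.com/xap/1.0/\x00",
        b"http://ns.adobe.com/xmp/extension/\x00",
    ),
    0xED: (b"Photoshop 3.0\x00",),
}
COM = 0xFE
SOS = 0xDA


def strip_metadata(data: bytes) -> bytes:
    """Copy a JPEG, skipping EXIF, XMP, IPTC, and comment segments."""
    # Step 1: validate the header.
    if data[:2] != b"\xFF\xD8":
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(data[:2])
    pos = 2
    # Step 2: parse markers one after another.
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected marker at offset {pos}")
        marker = data[pos + 1]
        (length,) = struct.unpack_from(">H", data, pos + 2)
        end = pos + 2 + length
        if length < 2 or end > len(data):
            raise ValueError("bad segment length")
        payload = data[pos + 4 : end]
        # Step 3: identify metadata by marker type and identifier string.
        prefixes = METADATA_PREFIXES.get(marker, ())
        is_metadata = marker == COM or payload.startswith(prefixes)
        # Step 4: copy everything that is not metadata.
        if not is_metadata:
            out += data[pos:end]
        if marker == SOS:
            # Scan data through EOI is copied untouched, so pixels are unchanged.
            out += data[end:]
            return bytes(out)
        pos = end
    raise ValueError("no SOS marker found")
```

Because the EXIF thumbnail lives inside the APP1 EXIF segment, dropping that segment removes the thumbnail along with it. Note that this sketch also copies any bytes that follow EOI, which a stricter tool may want to inspect or drop.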
EXIF Tag Reference
Not all EXIF tags are created equal when it comes to privacy. Some, like exposure settings, are pretty harmless. Others, like serial numbers and owner names, can uniquely identify you or your equipment. Here are the ones we consider medium or high risk:
| Tag ID | Name | Privacy Risk |
|---|---|---|
| 0x9003 | DateTimeOriginal | High |
| 0x9004 | DateTimeDigitized | High |
| 0x9010 | OffsetTime | Medium |
| 0x9286 | UserComment | High |
| 0x927C | MakerNote | High |
| 0xA420 | ImageUniqueID | High |
| 0xA430 | CameraOwnerName | High |
| 0xA431 | BodySerialNumber | High |
| 0xA434 | LensModel | Medium |
| 0xA435 | LensSerialNumber | High |
| 0xFDE8 | OwnerName | High |
| 0xFDE9 | SerialNumber | High |
Camera setting tags (non-identifying)
- 0x829A ExposureTime
- 0x829D FNumber
- 0x8822 ExposureProgram
- 0x8827 ISOSpeedRatings
- 0x9201 ShutterSpeedValue
- 0x9202 ApertureValue
- 0x9203 BrightnessValue
- 0x9204 ExposureBiasValue
- 0x9207 MeteringMode
- 0x9208 LightSource
- 0x9209 Flash
- 0x920A FocalLength
- 0xA001 ColorSpace
- 0xA002 PixelXDimension
- 0xA003 PixelYDimension
- 0xA402 ExposureMode
- 0xA403 WhiteBalance
- 0xA405 FocalLengthIn35mmFilm
- 0xA406 SceneCaptureType

GPS Tag Reference
GPS data lives in its own sub-IFD, pointed to by tag 0x8825 in the main IFD. If that pointer exists, there's location data in your image. Modern smartphones are particularly aggressive about embedding this, often with enough precision to pinpoint exactly where you were standing.
| Tag ID | Name | Description |
|---|---|---|
| 0x0000 | GPSVersionID | GPS tag version |
| 0x0001 | GPSLatitudeRef | N or S |
| 0x0002 | GPSLatitude | Degrees, minutes, seconds |
| 0x0003 | GPSLongitudeRef | E or W |
| 0x0004 | GPSLongitude | Degrees, minutes, seconds |
| 0x0005 | GPSAltitudeRef | Above/below sea level |
| 0x0006 | GPSAltitude | Meters |
| 0x0007 | GPSTimeStamp | UTC time |
| 0x001D | GPSDateStamp | UTC date |
| 0x0012 | GPSMapDatum | Geodetic survey data |
| 0x001B | GPSProcessingMethod | GPS/CELLID/WLAN/MANUAL |
| 0x001C | GPSAreaInformation | Location name |
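To see how precise this data is, it helps to turn the stored values into a map coordinate. GPSLatitude and GPSLongitude each hold three rationals (degrees, minutes, seconds), and the matching Ref tag supplies the sign. The sketch below assumes the rationals have already been decoded into `(numerator, denominator)` pairs; `dms_to_decimal` is a name made up for this example.

```python
from fractions import Fraction
from typing import Sequence, Tuple


def dms_to_decimal(dms: Sequence[Tuple[int, int]], ref: str) -> float:
    """Convert three (numerator, denominator) rationals to signed decimal degrees."""
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative.
    return float(-value if ref in ("S", "W") else value)


latitude = dms_to_decimal(((40, 1), (26, 1), (46, 1)), "N")
longitude = dms_to_decimal(((79, 1), (58, 1), (56, 1)), "W")
```

One arc-second of latitude is roughly 31 meters on the ground, and many devices store seconds with fractional precision, which is what makes these tags so sensitive.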
IPTC Tag Reference
IPTC was designed for news agencies to embed captions, credits, and copyright info. It's stored inside APP13 with a "Photoshop 3.0" identifier (yes, Adobe adopted it). If you've ever added copyright info or keywords in Lightroom or Photoshop, this is where it ends up.
- 0x0250 By-line (Author)
- 0x0255 By-line Title
- 0x025A City
- 0x025C Sub-location
- 0x025F Province/State
- 0x0265 Country Name
- 0x0274 Copyright Notice
- 0x0276 Contact
- 0x0237 Date Created
- 0x023C Time Created
- 0x0278 Caption/Abstract
- 0x0219 Keywords
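The IDs above read as an IPTC-IIM record number in the high byte (0x02) and a dataset number in the low byte, so 0x0250 is record 2, dataset 0x50. In the IIM byte stream, each dataset is stored as a 0x1C tag marker, the record number, the dataset number, a 2-byte big-endian length, and then the value. The sketch below walks such a stream; `iter_iim_datasets` is a hypothetical name, and it assumes the raw IIM bytes have already been pulled out of the Photoshop "8BIM" resource block that wraps them inside APP13.

```python
import struct
from typing import Iterator, Tuple


def iter_iim_datasets(iim: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (tag_id, value) pairs, with tag_id = record << 8 | dataset."""
    pos = 0
    # Each dataset begins with the tag marker byte 0x1C.
    while pos + 5 <= len(iim) and iim[pos] == 0x1C:
        record, dataset, size = struct.unpack_from(">BBH", iim, pos + 1)
        if size & 0x8000:
            # The high bit signals an extended length, which this sketch skips.
            raise ValueError("extended dataset lengths are not handled here")
        value = iim[pos + 5 : pos + 5 + size]
        yield (record << 8) | dataset, value
        pos += 5 + size
```

Repeatable datasets such as Keywords simply appear more than once in the stream, so a caller collecting them should append rather than overwrite.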