
JPEG Format

Joint Photographic Experts Group

File Structure Overview

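If you've ever opened a JPEG in a hex editor, you'll have noticed it's made up of segments introduced by markers. Every marker starts with 0xFF, followed by a byte that tells you what kind of segment it is. This design makes JPEGs relatively easy to parse: you read a marker, read the length that follows it, and jump straight to the next marker. Inside the compressed image data, a literal FF byte is escaped as FF 00, so it can't be mistaken for a marker.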

The interesting part for privacy is that all the metadata (EXIF, GPS, timestamps) lives in these segments, completely separate from the actual image data. That's why we can strip metadata without touching the pixels or recompressing anything.

JPEG File Structure

SOI | APP0-15 | DQT | SOF | DHT | SOS + Data | EOI

Segment     Size (example)   Contents
SOI         2 bytes          Start of Image (FF D8)
APP0-15     100 bytes        Application markers (metadata)
DQT         130 bytes        Quantization tables
SOF         20 bytes         Start of Frame
DHT         420 bytes        Huffman tables
SOS + Data  Variable length  Scan data (image)
EOI         2 bytes          End of Image (FF D9)

JPEG Header Example

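Offset  Hex                                              ASCII
0000    FF D8 FF E0 00 10 4A 46 49 46 00 01 01 00 00 48  ......JFIF.....H
0010    00 48 00 00 FF E1 12 34 45 78 69 66 00 00 00 00  .H.....4Exif....

Fields, in order:

  • FF D8: SOI (Start of Image)
  • FF E0: APP0 Marker
  • 00 10: APP0 Length
  • 4A 46 49 46 00: JFIF Identifier
  • 01 01 00 00 48 00 48 00 00: JFIF Data
  • FF E1: APP1 Marker
  • 12 34: APP1 Length
  • 45 78 69 66 00 00: Exif Identifier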

Marker Structure

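  • FF xx: Marker (xx identifies the type)
  • LL LL: Length (2 bytes, big-endian, includes itself but not the marker)
  • Data...: Segment data (length - 2 bytes)
  • SOI, EOI, and the restart markers (RSTn) stand alone: no length or data follows them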
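
To make the marker layout above concrete, here is a minimal Python sketch that walks a JPEG segment by segment. The function name iter_segments is made up for illustration and this is not PicScrub's actual code. It stops after the SOS header, because what follows is compressed scan data rather than length-prefixed segments.

```python
import struct

# Markers with no length field: SOI, EOI, TEM and RST0-RST7.
STANDALONE = {0xD8, 0xD9, 0x01} | set(range(0xD0, 0xD8))

def iter_segments(data: bytes):
    """Yield (marker_byte, payload) pairs, stopping after the SOS header."""
    pos = 0
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:          # extra FF bytes are padding before a marker
            pos += 1
            continue
        pos += 2
        if marker in STANDALONE:
            yield marker, b""
            continue
        if pos + 2 > len(data):
            raise ValueError("truncated length field")
        (length,) = struct.unpack_from(">H", data, pos)  # big-endian, includes itself
        if length < 2 or pos + length > len(data):
            raise ValueError(f"bad segment length at offset {pos}")
        yield marker, data[pos + 2 : pos + length]
        pos += length
        if marker == 0xDA:          # SOS: entropy-coded data follows
            return

if __name__ == "__main__":
    sample = bytes.fromhex("FFD8FFE000104A46494600010100004800480000") + b"\xFF\xD9"
    for marker, payload in iter_segments(sample):
        print(f"FF {marker:02X}  {len(payload)} payload bytes")
```

The payload excludes both the marker and the two length bytes, which matches the "length - 2" rule in the list above.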

Complete JPEG Marker Reference

Marker  Name   Description
FF D8   SOI    Start of Image
FF D9   EOI    End of Image
FF C0   SOF0   Start of Frame (Baseline DCT)
FF C2   SOF2   Start of Frame (Progressive DCT)
FF C4   DHT    Define Huffman Table
FF DB   DQT    Define Quantization Table
FF DD   DRI    Define Restart Interval
FF DA   SOS    Start of Scan (image data follows)
FF FE   COM    Comment (metadata)
FF E0   APP0   JFIF marker
FF E1   APP1   EXIF / XMP marker
FF E2   APP2   ICC Profile / MPF
FF ED   APP13  IPTC / Photoshop
FF EE   APP14  Adobe color transform

Red = metadata (removed), Yellow = optional preserve, Blue = required structure

JFIF vs EXIF

Here's something that confuses a lot of people: JPEG files often contain two different metadata standards at the same time. JFIF came first and handles basic stuff like resolution. Then cameras started adding EXIF data with all the detailed shooting information. Most modern JPEGs have both, which is why you'll see both APP0 and APP1 markers near the start of the file.

APP0: JFIF

JPEG File Interchange Format. Basic image information.

  • Version number
  • Pixel density (DPI)
  • Aspect ratio
  • Optional thumbnail

APP1: EXIF

Exchangeable Image File Format. Camera metadata.

  • Camera make/model
  • GPS coordinates
  • Timestamps
  • Exposure settings

APP Markers Deep Dive

The JPEG spec reserved 16 "application" markers (APP0 through APP15) for vendors to use however they wanted. Over the years, different organizations claimed different markers for their metadata formats. The result is a bit of a mess, but at least it's a well-documented mess.

The tricky part is that the same marker can mean different things depending on its content. APP1, for example, can be either EXIF or XMP data, so you have to look at the identifier string inside to know which one you're dealing with.

APP0 (0xFFE0): JFIF

Identifier: JFIF\0. Contains version, density units, X/Y density, and optional embedded thumbnail. Generally safe to preserve.
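
As a rough illustration of those JFIF fields, the sketch below unpacks an APP0 payload (the bytes after the length field). The function name parse_jfif and the dictionary keys are invented for this example. The field order and sizes follow the JFIF layout: two version bytes, one units byte, two 16-bit densities, and two thumbnail dimension bytes.

```python
import struct

# Meaning of the JFIF density units byte.
DENSITY_UNITS = {0: "none (aspect ratio only)", 1: "dots per inch", 2: "dots per cm"}

def parse_jfif(payload: bytes) -> dict:
    """Decode the fixed part of a JFIF APP0 payload."""
    if not payload.startswith(b"JFIF\x00") or len(payload) < 14:
        raise ValueError("not a JFIF APP0 payload")
    major, minor, units, x_density, y_density, thumb_w, thumb_h = struct.unpack_from(
        ">BBBHHBB", payload, 5  # skip the 5-byte identifier
    )
    return {
        "version": f"{major}.{minor:02d}",
        "units": DENSITY_UNITS.get(units, "unknown"),
        "x_density": x_density,
        "y_density": y_density,
        "thumbnail_size": (thumb_w, thumb_h),  # (0, 0) means no thumbnail
    }

if __name__ == "__main__":
    # Payload taken from the header example earlier on this page.
    print(parse_jfif(bytes.fromhex("4A46494600010100004800480000")))
```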

APP1 (0xFFE1): EXIF

Identifier: Exif\0\0. Contains a complete TIFF structure with IFD entries for camera info, GPS data, timestamps, thumbnails, and maker notes. Primary privacy concern.

APP1 (0xFFE1): XMP

Identifier: http://ns.adobe.com/xap/1.0/\0. XML-based metadata standard by Adobe. Can contain editing history, software info, and extended EXIF data.

XMP Namespaces
xmp: http://ns.adobe.com/xap/1.0/
xmpMM: http://ns.adobe.com/xap/1.0/mm/
dc: http://purl.org/dc/elements/1.1/
photoshop: http://ns.adobe.com/photoshop/1.0/
tiff: http://ns.adobe.com/tiff/1.0/
exif: http://ns.adobe.com/exif/1.0/

APP1 (0xFFE1): Extended XMP

Identifier: http://ns.adobe.com/xmp/extension/\0. For XMP data too large to fit in one segment (the 2-byte length field caps a segment at 65,535 bytes), split across multiple APP1 segments.

APP2 (0xFFE2): MPF

Identifier: MPF\0. Multi-Picture Format for stereoscopic images and panoramas.

APP2 (0xFFE2): ICC Profile

Identifier: ICC_PROFILE\0. Color management data. Can be chunked across multiple APP2 markers. Usually safe to preserve.

APP13 (0xFFED): IPTC/Photoshop

Identifier: Photoshop 3.0\0. Contains IPTC-IIM data with captions, keywords, copyright, creator info. May contain author information.

APP14 (0xFFEE): Adobe

Identifier: Adobe\0. Contains color transform flags (RGB/CMYK). Important for proper color rendering.
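
The identifier checks described in this section boil down to a prefix match on the segment payload. Here is one way that could look in Python. The table and the function name classify_app_segment are illustrative only, and the identifiers are copied from the entries above.

```python
# (marker byte, identifier prefix, label) for the formats listed above.
APP_SIGNATURES = [
    (0xE0, b"JFIF\x00", "JFIF"),
    (0xE1, b"Exif\x00\x00", "EXIF"),
    (0xE1, b"http://ns.adobe.com/xap/1.0/\x00", "XMP"),
    (0xE1, b"http://ns.adobe.com/xmp/extension/\x00", "Extended XMP"),
    (0xE2, b"MPF\x00", "MPF"),
    (0xE2, b"ICC_PROFILE\x00", "ICC Profile"),
    (0xED, b"Photoshop 3.0\x00", "IPTC/Photoshop"),
    (0xEE, b"Adobe\x00", "Adobe"),
]

def classify_app_segment(marker: int, payload: bytes) -> str:
    """Return a label for an APPn segment, or 'unknown' if nothing matches."""
    for sig_marker, prefix, label in APP_SIGNATURES:
        if marker == sig_marker and payload.startswith(prefix):
            return label
    return "unknown"

if __name__ == "__main__":
    print(classify_app_segment(0xE1, b"Exif\x00\x00II*\x00"))
    print(classify_app_segment(0xE1, b"http://ns.adobe.com/xap/1.0/\x00<x:xmpmeta>"))
```

Matching on both the marker and the identifier matters because, as noted above, APP1 and APP2 are each shared by more than one format.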

EXIF Structure

Here's where it gets interesting (and a bit weird). EXIF data is actually a tiny TIFF file embedded inside your JPEG. Yes, really. It has its own header, its own byte order, and uses TIFF's IFD (Image File Directory) structure to store tags.

The first thing you'll see after the "Exif" identifier is a TIFF header that tells you whether the data is little-endian (Intel, "II") or big-endian (Motorola, "MM"). Getting this wrong means all your numbers will be garbage, so it's the first thing any parser checks.

TIFF Header (8 bytes)

Offset  Hex                      ASCII
0000    49 49 2A 00 08 00 00 00  II*.....

  • 49 49: Byte Order (II = Little Endian)
  • 2A 00: Magic Number (42)
  • 08 00 00 00: IFD0 Offset

II = Intel (little-endian), MM = Motorola (big-endian)
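
Here is a small sketch of that first check, using Python's struct module. The function name parse_tiff_header is made up for this example. The input is assumed to be the EXIF payload with the 6-byte "Exif\0\0" identifier already stripped off.

```python
import struct

def parse_tiff_header(tiff: bytes):
    """Return (struct byte-order prefix, IFD0 offset) from an 8-byte TIFF header."""
    if len(tiff) < 8:
        raise ValueError("TIFF header needs 8 bytes")
    order = tiff[:2]
    if order == b"II":
        endian = "<"   # Intel, little-endian
    elif order == b"MM":
        endian = ">"   # Motorola, big-endian
    else:
        raise ValueError(f"unknown byte order mark {order!r}")
    magic, ifd0_offset = struct.unpack_from(endian + "HI", tiff, 2)
    if magic != 42:
        raise ValueError(f"bad TIFF magic number {magic}")
    return endian, ifd0_offset

if __name__ == "__main__":
    print(parse_tiff_header(bytes.fromhex("49492A0008000000")))
```

Returning the struct prefix ("<" or ">") lets every later read reuse the same byte order without re-checking the header.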

IFD Entry Structure (12 bytes)

Field         Size
Tag ID        2 bytes
Type          2 bytes
Count         4 bytes
Value/Offset  4 bytes

Key IFD Entries

0x010F: Make
0x0110: Model
0x0132: DateTime
0x8769: ExifIFDPointer
0x8825: GPSInfoIFDPointer
0x927C: MakerNote
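
The 12-byte entry layout can be read with a single struct call per entry. The sketch below is illustrative (read_ifd and pointer_value are made-up names). It relies on two details of the TIFF layout that aren't spelled out above: each IFD begins with a 2-byte entry count and ends with a 4-byte offset to the next IFD (0 if there is none). The Value/Offset field is returned as raw bytes, since whether it holds an inline value or an offset depends on the type and count.

```python
import struct

def read_ifd(tiff: bytes, offset: int, endian: str):
    """Return ([(tag, type, count, raw_value)], next_ifd_offset) for one IFD."""
    (entry_count,) = struct.unpack_from(endian + "H", tiff, offset)
    entries = []
    for i in range(entry_count):
        entry_offset = offset + 2 + 12 * i
        tag, type_id, count, raw = struct.unpack_from(endian + "HHI4s", tiff, entry_offset)
        entries.append((tag, type_id, count, raw))
    (next_ifd,) = struct.unpack_from(endian + "I", tiff, offset + 2 + 12 * entry_count)
    return entries, next_ifd

def pointer_value(raw: bytes, endian: str) -> int:
    """Interpret a 4-byte Value/Offset field as an offset (e.g. for 0x8769 or 0x8825)."""
    return struct.unpack(endian + "I", raw)[0]

if __name__ == "__main__":
    # Tiny hand-built TIFF: header, then an IFD with Make and ExifIFDPointer.
    tiff = (
        struct.pack("<2sHI", b"II", 42, 8)
        + struct.pack("<H", 2)
        + struct.pack("<HHI4s", 0x010F, 2, 4, b"abc\x00")
        + struct.pack("<HHII", 0x8769, 4, 1, 38)
        + struct.pack("<I", 0)
    )
    for tag, type_id, count, raw in read_ifd(tiff, 8, "<")[0]:
        print(f"0x{tag:04X} type={type_id} count={count} raw={raw!r}")
```

Following 0x8769 or 0x8825 is then just another read_ifd call at the offset that pointer_value returns.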

How PicScrub Processes JPEG

The approach is pretty straightforward: we read through the file marker by marker, keep the ones that are essential for displaying the image, and skip the ones that contain metadata. Since we never touch the compressed image data, there's zero quality loss.

The key insight is that JPEG's marker-based structure makes this easy. We don't need to understand DCT coefficients or Huffman tables. We just need to identify which segments are metadata and which aren't.

1. Validate Header
Verify SOI marker (FF D8) at file start

2. Parse Markers
Read each marker sequentially, extracting type and length

3. Identify Metadata
Check identifiers in APP segments (EXIF, XMP, IPTC, etc.)

4. Selective Copy
Copy non-metadata segments to output, skip metadata segments

What's Preserved

  • SOI/EOI markers (required)
  • DQT (quantization tables)
  • SOF (frame header)
  • DHT (Huffman tables)
  • SOS and scan data (actual image)
  • Optional: ICC color profile (configurable)

What's Removed

  • APP1 EXIF (camera, GPS, timestamps)
  • APP1 XMP (editing metadata)
  • APP13 IPTC (captions, author)
  • COM markers (comments)
  • EXIF thumbnail images
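
The four steps and the preserved/removed lists above can be sketched in a few lines of Python. This is a simplified illustration, not PicScrub's real implementation: strip_metadata and keep_icc are hypothetical names, every APP1 and APP13 segment is dropped without inspecting its identifier, and any segment not on the removal list (JFIF, Adobe APP14, and so on) is assumed to be kept.

```python
import struct

# Segment types dropped unconditionally in this sketch: APP1, APP13, COM.
DROP_ALWAYS = {0xE1, 0xED, 0xFE}

def strip_metadata(data: bytes, keep_icc: bool = True) -> bytes:
    """Copy a JPEG without its metadata segments, leaving scan data untouched."""
    if data[:2] != b"\xFF\xD8":                      # step 1: validate header
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data):                      # step 2: parse markers
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:
            # SOS: copy the scan header, compressed data and EOI verbatim.
            out += data[pos:]
            return bytes(out)
        (length,) = struct.unpack_from(">H", data, pos + 2)
        end = pos + 2 + length
        if length < 2 or end > len(data):
            raise ValueError(f"bad segment length at offset {pos}")
        payload = data[pos + 4 : end]
        # step 3: identify metadata
        is_icc = marker == 0xE2 and payload.startswith(b"ICC_PROFILE\x00")
        drop = marker in DROP_ALWAYS or (is_icc and not keep_icc)
        if not drop:                                 # step 4: selective copy
            out += data[pos:end]
        pos = end
    raise ValueError("no SOS marker found")
```

Dropping the EXIF segment also takes the EXIF thumbnail with it, because the thumbnail is stored inside that APP1 block. Two simplifications to be aware of: the loop assumes every marker before SOS carries a length field, and anything that appears after the first SOS is copied as-is.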

EXIF Tag Reference

Not all EXIF tags are created equal when it comes to privacy. Some, like exposure settings, are pretty harmless. Others, like serial numbers and owner names, can uniquely identify you or your equipment. Here are the ones we consider privacy-sensitive, with a risk rating for each:

Tag ID  Name               Privacy Risk
0x9003  DateTimeOriginal   High
0x9004  DateTimeDigitized  High
0x9010  OffsetTime         Medium
0x9286  UserComment        High
0x927C  MakerNote          High
0xA420  ImageUniqueID      High
0xA430  CameraOwnerName    High
0xA431  BodySerialNumber   High
0xA434  LensModel          Medium
0xA435  LensSerialNumber   High
0xFDE8  OwnerName          High
0xFDE9  SerialNumber       High

Camera setting tags (non-identifying)

0x829A ExposureTime
0x829D FNumber
0x8822 ExposureProgram
0x8827 ISOSpeedRatings
0x9201 ShutterSpeedValue
0x9202 ApertureValue
0x9203 BrightnessValue
0x9204 ExposureBiasValue
0x9207 MeteringMode
0x9208 LightSource
0x9209 Flash
0x920A FocalLength
0xA001 ColorSpace
0xA002 PixelXDimension
0xA003 PixelYDimension
0xA402 ExposureMode
0xA403 WhiteBalance
0xA405 FocalLengthIn35mmFilm
0xA406 SceneCaptureType

GPS Tag Reference

GPS data lives in its own sub-IFD, pointed to by tag 0x8825 in the main IFD. If that pointer exists, there's location data in your image. Modern smartphones are particularly aggressive about embedding this, often with enough precision to pinpoint exactly where you were standing.

Tag ID  Name                 Description
0x0000  GPSVersionID         GPS tag version
0x0001  GPSLatitudeRef       N or S
0x0002  GPSLatitude          Degrees, minutes, seconds
0x0003  GPSLongitudeRef      E or W
0x0004  GPSLongitude         Degrees, minutes, seconds
0x0005  GPSAltitudeRef       Above/below sea level
0x0006  GPSAltitude          Meters
0x0007  GPSTimeStamp         UTC time
0x001D  GPSDateStamp         UTC date
0x0012  GPSMapDatum          Geodetic survey data
0x001B  GPSProcessingMethod  GPS/CELLID/WLAN/MANUAL
0x001C  GPSAreaInformation   Location name
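
To see how precise this data can be, it helps to convert the stored values into the decimal degrees that map tools use. GPSLatitude and GPSLongitude each hold three rational numbers (numerator/denominator pairs) for degrees, minutes, and seconds, and the matching Ref tag supplies the sign. The helper below is a sketch with a made-up name, dms_to_decimal, and made-up sample coordinates.

```python
from fractions import Fraction

def dms_to_decimal(dms, ref: str) -> float:
    """Convert ((deg_num, deg_den), (min_num, min_den), (sec_num, sec_den)) to decimal degrees."""
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600
    if ref in ("S", "W"):   # south and west are negative
        value = -value
    return float(value)

if __name__ == "__main__":
    lat = dms_to_decimal(((40, 1), (26, 1), (46, 1)), "N")
    lon = dms_to_decimal(((79, 1), (58, 1), (56, 1)), "W")
    print(f"{lat:.6f}, {lon:.6f}")
```

One arc-second of latitude is roughly 30 meters, and cameras often store seconds with a fractional denominator such as 100, which is how a photo ends up pointing at a specific building.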

IPTC Tag Reference

IPTC was designed for news agencies to embed captions, credits, and copyright info. It's stored inside APP13 with a "Photoshop 3.0" identifier (yes, Adobe adopted it). If you've ever added copyright info or keywords in Lightroom or Photoshop, this is where it ends up.

0x0250 By-line (Author)
0x0255 By-line Title
0x025A City
0x025C Sub-location
0x025F Province/State
0x0265 Country Name
0x0274 Copyright Notice
0x0276 Contact
0x0237 Date Created
0x023C Time Created
0x0278 Caption/Abstract
0x0219 Keywords
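
The tag IDs above combine two bytes: an IPTC record number (0x02) and a dataset number within it (0x50 for By-line, for example). In the raw IPTC-IIM stream, each dataset is introduced by a 0x1C byte, followed by the record number, the dataset number, and a 2-byte big-endian length. The sketch below walks such a stream. It assumes the IIM bytes have already been pulled out of the Photoshop resource block inside APP13, which is not shown here, and iter_iim_datasets is a made-up name.

```python
import struct

def iter_iim_datasets(iim: bytes):
    """Yield (tag_id, value) pairs, where tag_id is (record << 8) | dataset."""
    pos = 0
    while pos + 5 <= len(iim) and iim[pos] == 0x1C:
        record, dataset, size = struct.unpack_from(">BBH", iim, pos + 1)
        if size & 0x8000:
            # High bit set means an extended length field; not handled in this sketch.
            raise ValueError("extended-length dataset not supported")
        value = iim[pos + 5 : pos + 5 + size]
        yield (record << 8) | dataset, value
        pos += 5 + size

if __name__ == "__main__":
    sample = b"\x1C\x02\x50\x00\x05Alice" + b"\x1C\x02\x19\x00\x03cat"
    for tag_id, value in iter_iim_datasets(sample):
        print(f"0x{tag_id:04X} {value!r}")
```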