How to Extract Images from PDF?

To extract images from a PDF, you can use the pdfimages tool from poppler-utils: pdfimages /path/to/file.pdf /path/to/output/image

Poppler is a set of utilities for PDF manipulation. Among other things, it lets you extract images from PDFs.

Extract images from PDFs with Poppler

When you need to pull images out of a PDF file


Sometimes you need to save an image from a PDF without taking a screenshot or rendering the PDF on screen. pdfimages helps with that.


To extract images from a PDF file in Linux, the most common and effective tool is the command-line utility pdfimages, which is part of the poppler-utils package.

Using the `pdfimages` utility in Linux: Step-by-Step Instructions

1. Install pdfimages (if not already installed):

Many desktop Linux distributions ship pdfimages preinstalled. If it is missing, install it with your package manager, for example on Debian or Ubuntu:

sudo apt-get install poppler-utils

or for Fedora:

sudo dnf install poppler-utils
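Before scripting extractions, it can help to check whether pdfimages is on the PATH and which version is installed, since options such as -png depend on the Poppler version. The following is a minimal Python sketch; the function names are hypothetical, and it assumes the `pdfimages -v` banner contains a line like `pdfimages version 24.02.0`, as in the help output at the end of this post.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def find_pdfimages() -> Optional[str]:
    """Return the full path of the pdfimages executable, or None if it is not on PATH."""
    return shutil.which("pdfimages")


def parse_version(banner: str) -> Optional[Tuple[int, ...]]:
    """Turn 'pdfimages version 24.02.0' into (24, 2, 0); None if no version is found."""
    match = re.search(r"pdfimages version (\d+(?:\.\d+)*)", banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_version() -> Optional[Tuple[int, ...]]:
    """Run `pdfimages -v` and parse its banner; None if the tool is missing."""
    path = find_pdfimages()
    if path is None:
        return None
    # The banner may be printed to stderr rather than stdout, so search both streams.
    result = subprocess.run([path, "-v"], capture_output=True, text=True)
    return parse_version(result.stdout + result.stderr)
```

If `installed_version()` returns None, install poppler-utils with one of the commands above.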

2. Open a Terminal:

Press Ctrl + Alt + T to open a terminal window (this shortcut works on Ubuntu and many other desktops).

3. Run pdfimages to Extract Images:

Basic syntax:

pdfimages [options] <PDF-file> <image-root>

Example:

pdfimages /path/to/file.pdf /path/to/output/image

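This extracts all images from file.pdf and saves them as image-000.ppm, image-001.ppm, and so on (PBM files for monochrome images) in /path/to/output. The last argument is a file-name prefix (the image root), not a directory: the files are named <image-root>-NNN.<ext>, and the directory part of the path must already exist.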
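The same call can be driven from Python with `subprocess`. This is a minimal sketch, not part of Poppler: `pdfimages_command` and `extract_images` are hypothetical helper names, and the code assumes pdfimages is on the PATH. Because the last argument is only a prefix, the helper creates the output directory first and then collects the files that start with that prefix.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def pdfimages_command(pdf: str, out_dir: str, prefix: str = "image",
                      options: Sequence[str] = ()) -> List[str]:
    """Build the argv for `pdfimages [options] <PDF-file> <image-root>`."""
    image_root = str(Path(out_dir) / prefix)
    return ["pdfimages", *options, pdf, image_root]


def extract_images(pdf: str, out_dir: str, prefix: str = "image",
                   options: Sequence[str] = ()) -> List[Path]:
    """Run pdfimages and return the files whose names start with the prefix."""
    out = Path(out_dir)
    # pdfimages treats the last argument as a file-name prefix and does not
    # create missing directories, so create the output directory up front.
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdfimages_command(pdf, out_dir, prefix, options), check=True)
    return sorted(out.glob(f"{prefix}-*"))
```

For a PDF with embedded images, `extract_images("file.pdf", "/path/to/output")` should return paths such as /path/to/output/image-000.ppm. Passing the arguments as a list avoids shell quoting problems with spaces in file names.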

4. Extract Images as JPEG (if desired):

To extract images in JPEG format (when possible), use the -j option:

pdfimages -j /path/to/file.pdf /path/to/output/image

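Images stored with JPEG (DCT) compression inside the PDF are saved as .jpg files; all other images are still written in the default PPM/PBM format.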
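Because -j only affects images that are already JPEG-compressed inside the PDF, an output directory often ends up with a mix of .jpg and .ppm/.pbm files. One way to check what was actually written is to look at each file's leading "magic" bytes. A minimal Python sketch; the signature table covers only a few common formats, and `sniff_format`/`formats_in` are hypothetical helper names.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable

# Leading bytes ("magic numbers") of a few image formats pdfimages can write.
SIGNATURES = [
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"P6", "ppm"),  # binary PPM (colour)
    (b"P5", "pgm"),  # binary PGM (greyscale)
    (b"P4", "pbm"),  # binary PBM (monochrome)
]


def sniff_format(header: bytes) -> str:
    """Name the format whose signature matches the first bytes of a file."""
    for signature, name in SIGNATURES:
        if header.startswith(signature):
            return name
    return "unknown"


def formats_in(paths: Iterable[Path]) -> Counter:
    """Count the formats of the given files by reading their first 8 bytes."""
    counts: Counter = Counter()
    for path in paths:
        with open(path, "rb") as handle:
            counts[sniff_format(handle.read(8))] += 1
    return counts
```

For example, `formats_in(Path("/path/to/output").glob("image-*"))` would report how many files are JPEG and how many fell back to PPM/PBM.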

5. Extract Images from Specific Pages:

To extract images from a range of pages, use -f (first page) and -l (last page):

pdfimages -f 2 -l 5 -j /path/to/file.pdf /path/to/output/image

This extracts images from pages 2 to 5.
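For long documents it can be convenient to run pdfimages over a few pages at a time. The Python sketch below (hypothetical helper names) splits a page span into inclusive ranges and builds one command per range. It adds -p so that file names carry the page number: each run starts image numbering again at 000, and the page number in the name keeps files from different runs from overwriting each other.

```python
from typing import Iterator, List, Tuple


def page_ranges(first: int, last: int, chunk: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first, last) page ranges containing at most `chunk` pages."""
    if first < 1 or last < first or chunk < 1:
        raise ValueError("expected 1 <= first <= last and chunk >= 1")
    start = first
    while start <= last:
        end = min(start + chunk - 1, last)
        yield start, end
        start = end + 1


def batch_commands(pdf: str, image_root: str, first: int, last: int,
                   chunk: int) -> List[List[str]]:
    """Build one `pdfimages -f A -l B -p -j` command per page range."""
    return [
        ["pdfimages", "-f", str(a), "-l", str(b), "-p", "-j", pdf, image_root]
        for a, b in page_ranges(first, last, chunk)
    ]
```

Each inner list can then be passed to `subprocess.run(command, check=True)`.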

6. Additional Options:

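To extract images as PNG, use -png (if supported by your version; check with pdfimages -h). -tiff works the same way for TIFF, and -all writes JPEG, JPEG2000, JBIG2 and CCITT images in their native format and everything else as PNG (CMYK images as TIFF).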
For password-protected PDFs, use -opw 'ownerpassword' or -upw 'userpassword'.
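Typing a password directly into a command line leaves it in your shell history. A small wrapper can read it from environment variables instead; note that the password is still passed to pdfimages as a command-line argument, so other local users may be able to see it in the process list while the tool runs. This Python sketch assumes two hypothetical variable names, PDF_OWNER_PASSWORD and PDF_USER_PASSWORD.

```python
import os
from typing import List, Mapping, Optional


def password_options(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build -opw/-upw options from (hypothetical) environment variables.

    Reads PDF_OWNER_PASSWORD and PDF_USER_PASSWORD; unset or empty values are skipped.
    """
    env = os.environ if env is None else env
    options: List[str] = []
    owner = env.get("PDF_OWNER_PASSWORD")
    user = env.get("PDF_USER_PASSWORD")
    if owner:
        options += ["-opw", owner]
    if user:
        options += ["-upw", user]
    return options


def protected_command(pdf: str, image_root: str) -> List[str]:
    """Assemble the full pdfimages argv; a list avoids shell quoting of special characters."""
    return ["pdfimages", *password_options(), "-j", pdf, image_root]
```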

Notes

The default output format is PPM (color) or PBM (monochrome). Use -j for JPEG, or convert PPM/PBM files to other formats with tools such as ImageMagick's convert (magick in ImageMagick 7) if needed.
The output files are numbered automatically and named with the image-root prefix you pass as the last argument.
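If ImageMagick is not available, an 8-bit binary PPM (P6) file can be converted to PNG with nothing but the Python standard library, since PNG is essentially zlib-compressed scanlines wrapped in checksummed chunks. This is a minimal sketch under stated assumptions: it handles only P6 files with a maximum value of 255 and no header comments (PBM output would need separate handling), and `ppm_to_png` is a hypothetical name.

```python
import re
import struct
import zlib
from pathlib import Path

# P6 header: magic, width, height, maxval, then exactly one whitespace byte.
# Header comments ("# ...") are not supported in this sketch.
_P6_HEADER = re.compile(rb"P6\s+(\d+)\s+(\d+)\s+(\d+)\s")


def _chunk(kind: bytes, data: bytes) -> bytes:
    """One PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def ppm_to_png_bytes(ppm: bytes) -> bytes:
    match = _P6_HEADER.match(ppm)
    if match is None:
        raise ValueError("not a binary PPM (P6) file")
    width, height, maxval = (int(g) for g in match.groups())
    if maxval != 255:
        raise ValueError("only 8-bit PPM files are supported")
    row_len = width * 3
    pixels = ppm[match.end():match.end() + row_len * height]
    if len(pixels) != row_len * height:
        raise ValueError("truncated pixel data")
    # Each PNG scanline starts with a filter-type byte; 0 means "no filter".
    raw = b"".join(
        b"\x00" + pixels[y * row_len:(y + 1) * row_len] for y in range(height)
    )
    # IHDR: width, height, bit depth 8, colour type 2 (RGB), default methods.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )


def ppm_to_png(src: str, dst: str) -> None:
    """Convert one P6 file on disk, e.g. image-000.ppm -> image-000.png."""
    Path(dst).write_bytes(ppm_to_png_bytes(Path(src).read_bytes()))
```

The header is matched with a regular expression rather than a whitespace split because pixel bytes can themselves have whitespace values (such as 0x0A), which a naive split would swallow.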

Summary Table

Command Example	Description
`pdfimages input.pdf image`	Extracts all images in default PPM/PBM format
`pdfimages -j input.pdf image`	Extracts images as JPEG when possible
`pdfimages -f 3 -l 5 input.pdf image`	Extracts images from pages 3 to 5
`pdfimages -opw 'password' -j input.pdf image`	Extracts images from an owner-password protected PDF

This method is efficient and works for most PDFs that contain embedded images, with one limitation:

pdfimages extracts images only at the resolution at which they are stored in the PDF.
To control the output resolution, use a PDF renderer such as PyMuPDF and specify the desired DPI when rendering the image.
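Before choosing between pdfimages and a renderer, it can help to check at what resolution each image is placed on the page. `pdfimages -list` (see the option list at the end of this post) prints one line per image, including its pixel size and x-ppi/y-ppi columns, instead of saving files. The Python sketch below parses that table; it assumes the 16-column layout of recent Poppler releases (page, num, type, width, height, color, comp, bpc, enc, interp, object, ID, x-ppi, y-ppi, size, ratio), and the sample text is illustrative, not real output from a specific file.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative sample in the assumed `pdfimages -list` layout (values are made up).
SAMPLE_LIST_OUTPUT = """\
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    1200   800  rgb     3   8  jpeg   no        12  0   300   300  180K 6.4%
   2     1 image     400   300  gray    1   8  image  no        15  0    72    72 35.2K  30%
"""


@dataclass
class ImageInfo:
    page: int
    num: int
    kind: str
    width: int
    height: int
    encoding: str
    x_ppi: Optional[int]
    y_ppi: Optional[int]


def _to_int(token: str) -> Optional[int]:
    return int(token) if token.isdigit() else None


def parse_pdfimages_list(text: str) -> List[ImageInfo]:
    rows: List[ImageInfo] = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator and any line that is not a data row.
        if len(fields) != 16 or not fields[0].isdigit():
            continue
        rows.append(ImageInfo(
            page=int(fields[0]), num=int(fields[1]), kind=fields[2],
            width=int(fields[3]), height=int(fields[4]), encoding=fields[8],
            x_ppi=_to_int(fields[12]), y_ppi=_to_int(fields[13]),
        ))
    return rows


def low_resolution(rows: List[ImageInfo], min_ppi: int) -> List[ImageInfo]:
    """Images placed below `min_ppi` in either direction."""
    return [r for r in rows
            if r.x_ppi is not None and r.y_ppi is not None
            and min(r.x_ppi, r.y_ppi) < min_ppi]
```

In practice the text would come from `subprocess.run(["pdfimages", "-list", "file.pdf"], capture_output=True, text=True).stdout`.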

How to save images from a PDF at a specific resolution

You cannot specify the resolution of extracted images when using pdfimages, because this tool extracts embedded images in their original format and resolution without resampling or altering quality. The resolution is determined by how the images were stored in the PDF, and pdfimages does not provide an option to upscale or downscale during extraction.

If you want to extract images at a specific resolution (for example, rendering a page or a portion of a page as an image at a chosen DPI), you need to use a PDF rendering library or tool such as PyMuPDF (fitz). With PyMuPDF, you can specify the desired resolution using the dpi parameter when rendering a page to an image:

import fitz  # PyMuPDF
doc = fitz.open("input.pdf")
page = doc.load_page(0)  # first page
pix = page.get_pixmap(dpi=300)  # render at 300 DPI
pix.save("output.png")

This approach creates a rasterized image of the page at the specified DPI, rather than extracting the original embedded images.
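The dpi value maps to pixel dimensions through the PDF unit of measure: page sizes are given in points, 72 points per inch, so the scale factor is dpi / 72. For example, a US Letter page (612 × 792 pt) rendered at 300 DPI comes out at roughly 2550 × 3300 pixels. The hypothetical helper below sketches that arithmetic for estimating output size and memory before rendering; PyMuPDF's own rounding of the pixmap size may differ by a pixel. The same scale can also be passed to PyMuPDF explicitly, as in `page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))` with `zoom = dpi / 72`.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # PDF page sizes are expressed in points (1/72 inch)


def rendered_size(width_pt: float, height_pt: float, dpi: int) -> Tuple[int, int]:
    """Approximate pixel size of a page rendered at `dpi`."""
    return (round(width_pt * dpi / POINTS_PER_INCH),
            round(height_pt * dpi / POINTS_PER_INCH))


def rgb_bytes(width_px: int, height_px: int) -> int:
    """Uncompressed memory for an 8-bit RGB image (3 bytes per pixel)."""
    return width_px * height_px * 3
```

With PyMuPDF, `page.rect.width` and `page.rect.height` give an unrotated page's size in points, so `rendered_size(page.rect.width, page.rect.height, 300)` estimates the pixmap size; at 2550 × 3300 that is about 25 MB of raw RGB data per page.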

Other tools extracting images from PDFs

The best tools for extracting images from PDFs without losing resolution are those that extract the original embedded images directly instead of rendering or resampling them. The top choices include:

Adobe Acrobat Pro: Offers a dedicated “Export All Images” feature, which extracts images in their original quality and format as standalone files. This method is highly reliable and preserves the exact resolution and quality of the images as stored in the PDF.
pdfimages (from the [XPDF/Poppler suite](https://www.glukhov.org/post/2025/04/ubuntu-poppler/ "Pdf manipulating tools in Ubuntu - Poppler")): A free, open-source command-line tool available on Linux and other platforms. pdfimages extracts images at their stored resolution and, with options such as -j, -jp2 or -all, in their native formats, including JPEG and JPEG2000. It is widely recommended for users seeking a no-cost, high-fidelity extraction process.
Online tools (e.g., PDF24 Tools, PDFCandy, pdfforge): These services let you upload a PDF and download the extracted images, generally at their original resolution. They are convenient for quick tasks and require no installation, but uploading sensitive documents to a third-party service raises privacy concerns.

Summary Table

Software/Tool	Platform	Maintains Original Resolution	Notes
Adobe Acrobat Pro	Windows/Mac	Yes	Paid, professional-grade, very reliable
pdfimages (Poppler)	Linux/macOS/Windows	Yes	Free, open-source, command-line utility
PDF24 Tools, PDFCandy	Web-based	Yes	Free, easy to use, privacy considerations

Key Point:
Always use tools that extract the images rather than render or screenshot them. Adobe Acrobat Pro and pdfimages are both widely used for this purpose and save the images as they are stored in the PDF, without any loss of resolution.

pdfimages command-line options

Running pdfimages -h (or -help, --help) prints something like:

$ pdfimages -h
pdfimages version 24.02.0
Copyright 2005-2024 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011, 2022 Glyph & Cog, LLC
Usage: pdfimages [options] <PDF-file> <image-root>
  -f <int>       : first page to convert
  -l <int>       : last page to convert
  -png           : change the default output format to PNG
  -tiff          : change the default output format to TIFF
  -j             : write JPEG images as JPEG files
  -jp2           : write JPEG2000 images as JP2 files
  -jbig2         : write JBIG2 images as JBIG2 files
  -ccitt         : write CCITT images as CCITT files
  -all           : equivalent to -png -tiff -j -jp2 -jbig2 -ccitt
  -list          : print list of images instead of saving
  -opw <string>  : owner password (for encrypted files)
  -upw <string>  : user password (for encrypted files)
  -p             : include page numbers in output file names
  -q             : don't print any messages or errors
  -v             : print copyright and version info
  -h             : print usage information
  -help          : print usage information
  --help         : print usage information
  -?             : print usage information
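With -p, each file name also carries the number of the page the image was found on, which makes it easy to map images back to pages. The Python sketch below assumes the `<image-root>-<page>-<num>.<ext>` pattern with three-digit, zero-padded numbers (for example image-012-003.png); check the names your Poppler version actually produces before relying on it. The helper names are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed -p naming: <image-root>-<page>-<num>.<ext>, e.g. image-012-003.png
_PAGED_NAME = re.compile(r"^(?P<root>.+)-(?P<page>\d{3,})-(?P<num>\d{3,})\.(?P<ext>\w+)$")


def page_and_number(filename: str) -> Optional[Tuple[int, int]]:
    """Return (page, image number) for a -p style name, or None if it does not match."""
    match = _PAGED_NAME.match(Path(filename).name)
    if match is None:
        return None
    return int(match.group("page")), int(match.group("num"))


def group_by_page(filenames: Iterable[str]) -> Dict[int, List[str]]:
    """Group extracted files by the page number embedded in their names."""
    pages: Dict[int, List[str]] = defaultdict(list)
    for name in filenames:
        parsed = page_and_number(name)
        if parsed is not None:
            pages[parsed[0]].append(name)
    return dict(pages)
```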
