Extract images from PDFs
When you need to pull images out of a PDF file
Sometimes you need to save an image from a PDF without taking a screenshot or rendering the PDF on screen. `pdfimages` helps with that.
To extract images from a PDF file on Linux, the most common and effective tool is the command-line utility `pdfimages`, which is part of the `poppler-utils` package.
Using pdfimages on Linux: Step-by-Step Instructions
1. Install pdfimages (if not already installed):
Most Linux distributions include `pdfimages` by default. If needed, install it using your package manager, for example on Debian or Ubuntu:
sudo apt-get install poppler-utils
or for Fedora:
sudo dnf install poppler-utils
2. Open a Terminal:
Press Ctrl + Alt + T to open a terminal window.
3. Run pdfimages to Extract Images:
Basic syntax:
pdfimages [options] <PDF-file> <image-root>
Example:
pdfimages /path/to/file.pdf /path/to/output/image
- This extracts all images from `file.pdf` and saves them as `image-000.ppm`, `image-001.ppm`, etc., in the specified output directory. The last argument is a filename prefix (the image root), not a directory name.
4. Extract Images as JPEG (if desired):
To extract images in JPEG format (when possible), use the `-j` option:
pdfimages -j /path/to/file.pdf /path/to/output/image
- This saves images that are stored as JPEG inside the PDF as `.jpg` files. Other images are still written in the default format.
5. Extract Images from Specific Pages:
- To extract images from a range of pages, use `-f` (first page) and `-l` (last page):
pdfimages -f 2 -l 5 -j /path/to/file.pdf /path/to/output/image
- This extracts images from pages 2 to 5.
6. Additional Options:
- To extract images as PNG, use `-png` (if supported by your version).
- For password-protected PDFs, use `-opw 'ownerpassword'` or `-upw 'userpassword'`.
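When the extraction is scripted, the options from the steps above can be assembled into an argument list instead of a hand-typed command line. The following is a minimal Python sketch, not a definitive wrapper: the function name `build_pdfimages_cmd` and its parameters are hypothetical, and it only uses switches that appear in the `pdfimages` usage text.

```python
def build_pdfimages_cmd(pdf_path, image_root, first=None, last=None,
                        fmt=None, owner_password=None, user_password=None):
    """Build an argument list for pdfimages. Nothing is executed here."""
    # Format switches taken from the pdfimages usage text.
    known_formats = {"j", "png", "tiff", "jp2", "jbig2", "ccitt", "all"}
    cmd = ["pdfimages"]
    if first is not None:
        cmd += ["-f", str(first)]      # first page to process
    if last is not None:
        cmd += ["-l", str(last)]       # last page to process
    if fmt is not None:
        if fmt not in known_formats:
            raise ValueError(f"unknown format switch: {fmt!r}")
        cmd.append("-" + fmt)          # e.g. "-j" or "-png"
    if owner_password is not None:
        cmd += ["-opw", owner_password]
    if user_password is not None:
        cmd += ["-upw", user_password]
    # The PDF file and the image root (filename prefix) always come last.
    cmd += [str(pdf_path), str(image_root)]
    return cmd
```

The result can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with passwords or paths that contain spaces.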
Notes
- The default output format is PPM (color) or PBM (monochrome). Use `-j` for JPEG, or convert PPM/PBM files to other formats using tools like `convert` from ImageMagick if needed.
- The output files are automatically numbered and saved in the specified directory.
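As an illustration of these two notes, the sketch below gathers the numbered output files for a given image root and prepares one ImageMagick `convert` command per PPM/PBM file. The helper names `collect_outputs` and `conversion_commands` are hypothetical, the three-digit numbering is assumed from the `image-000` pattern shown earlier, and newer ImageMagick releases may expect `magick` instead of `convert`.

```python
from pathlib import Path


def collect_outputs(image_root):
    """Return files named <image_root>-NNN.<ext>, sorted by name."""
    root = Path(image_root)
    # Assumes the three-digit suffix pdfimages uses (image-000, image-001, ...).
    return sorted(root.parent.glob(root.name + "-[0-9][0-9][0-9].*"))


def conversion_commands(files, target_ext=".png"):
    """Build one `convert` command per PPM/PBM file. Nothing is executed."""
    commands = []
    for path in map(Path, files):
        if path.suffix.lower() in {".ppm", ".pbm"}:
            target = path.with_suffix(target_ext)
            commands.append(["convert", str(path), str(target)])
    return commands
```

Each command list can then be run with `subprocess.run(cmd, check=True)`. Files already written as JPEG (for example with `-j`) are skipped, since they need no conversion.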
Summary Table
Command Example | Description |
---|---|
pdfimages input.pdf image | Extracts all images in default PPM/PBM format |
pdfimages -j input.pdf image | Extracts images as JPEG when possible |
pdfimages -f 3 -l 5 input.pdf image | Extracts images from pages 3 to 5 |
pdfimages -opw 'password' -j input.pdf image | Extracts images from an owner-password protected PDF |
This method is efficient and works for most PDFs containing embedded images. But:
- `pdfimages` extracts images at their original resolution only.
- To control output resolution, use a PDF renderer like PyMuPDF and specify the desired DPI when creating the image.
How to save images from a PDF at a specific resolution
You cannot specify the resolution of extracted images when using `pdfimages`, because this tool extracts embedded images in their original format and resolution without resampling or altering quality. The resolution is determined by how the images were stored in the PDF, and `pdfimages` does not provide an option to upscale or downscale during extraction.
If you want images at a specific resolution (for example, rendering a page or a portion of a page as an image at a chosen DPI), you need to use a PDF rendering library or tool such as PyMuPDF (`fitz`). With PyMuPDF, you can specify the desired resolution using the `dpi` parameter when rendering a page to an image:
import fitz # PyMuPDF
doc = fitz.open("input.pdf")
page = doc.load_page(0) # first page
pix = page.get_pixmap(dpi=300) # render at 300 DPI
pix.save("output.png")
This approach creates a rasterized image of the page at the specified DPI, rather than extracting the original embedded images.
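To get a feel for what a DPI value means in pixels, recall that PDF page sizes are measured in points, with 72 points per inch by default. The helper below is a small arithmetic sketch (the name `pixels_at_dpi` is hypothetical); a renderer may round slightly differently, so treat the result as an estimate.

```python
def pixels_at_dpi(width_pt, height_pt, dpi):
    """Estimate the pixel size of a page rendered at the given DPI."""
    # 72 points = 1 inch, so pixels = points * dpi / 72.
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


# A US Letter page is 612 x 792 points.
print(pixels_at_dpi(612, 792, 300))  # (2550, 3300)
```

Doubling the DPI doubles both dimensions and so quadruples the pixel count, which is worth keeping in mind when rendering many pages.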
Other tools for extracting images from PDFs
The best tools for extracting images from PDFs without losing resolution are those that extract the original embedded images directly, rather than rendering or resampling them. The top choices include:
- Adobe Acrobat Pro: Offers a dedicated "Export All Images" feature, which extracts images in their original quality and format as standalone files. This method is highly reliable and preserves the exact resolution and quality of the images as stored in the PDF.
- pdfimages (from the [XPDF/Poppler suite](https://www.glukhov.org/post/2025/04/ubuntu-poppler/ "Pdf manipulating tools in Ubuntu - Poppler")): A free, open-source command-line tool available on Linux and other platforms. `pdfimages` extracts all images from a PDF in their native format and resolution, including support for JPEG, JPEG2000, and other formats. It is widely recommended for users seeking a no-cost, high-fidelity extraction process.
- Online tools (e.g., PDF24 Tools, PDFCandy, pdfforge): These services allow you to upload a PDF and download the extracted images, maintaining the original resolution. They are convenient for quick tasks and do not require installation, but may raise privacy concerns for sensitive documents.
Summary Table
Software/Tool | Platform | Maintains Original Resolution | Notes |
---|---|---|---|
Adobe Acrobat Pro | Windows/Mac | Yes | Paid, professional-grade, very reliable |
pdfimages (Poppler) | Linux/Windows | Yes | Free, open-source, command-line utility |
PDF24 Tools, PDFCandy | Web-based | Yes | Free, easy to use, privacy considerations |
Key Point:
Always use tools that extract (not render or screenshot) the images. Both Adobe Acrobat Pro and `pdfimages` are industry standards for this purpose, ensuring the images are saved exactly as they exist in the PDF, without any loss of resolution.
pdfimages command-line options
Running `pdfimages /help` (or one of the actual help flags such as `-h` or `--help`) prints something like:
$ pdfimages /help
pdfimages version 24.02.0
Copyright 2005-2024 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011, 2022 Glyph & Cog, LLC
Usage: pdfimages [options] <PDF-file> <image-root>
-f <int> : first page to convert
-l <int> : last page to convert
-png : change the default output format to PNG
-tiff : change the default output format to TIFF
-j : write JPEG images as JPEG files
-jp2 : write JPEG2000 images as JP2 files
-jbig2 : write JBIG2 images as JBIG2 files
-ccitt : write CCITT images as CCITT files
-all : equivalent to -png -tiff -j -jp2 -jbig2 -ccitt
-list : print list of images instead of saving
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
-p : include page numbers in output file names
-q : don't print any messages or errors
-v : print copyright and version info
-h : print usage information
-help : print usage information
--help : print usage information
-? : print usage information
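The `-list` option is useful for checking what is embedded before extracting anything, including the resolution at which each image is stored. The sketch below runs it from Python and parses the table it prints. It assumes the column layout seen in recent Poppler versions (page, num, type, width, height, color, comp, bpc, enc, ..., x-ppi, y-ppi, size, ratio), which may differ in your version; the function names are hypothetical.

```python
import subprocess


def parse_image_list(text):
    """Parse the table printed by `pdfimages -list` into dictionaries."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows start with a page number; this skips the header and rule.
        if len(parts) < 13 or not parts[0].isdigit():
            continue
        rows.append({
            "page": int(parts[0]),
            "num": int(parts[1]),
            "type": parts[2],
            "width": int(parts[3]),
            "height": int(parts[4]),
            "enc": parts[8],
            # Counted from the end, since middle columns can vary (assumption).
            "x_ppi": int(parts[-4]),
            "y_ppi": int(parts[-3]),
        })
    return rows


def list_images(pdf_path):
    """Run `pdfimages -list` and return the parsed rows."""
    result = subprocess.run(["pdfimages", "-list", str(pdf_path)],
                            capture_output=True, text=True, check=True)
    return parse_image_list(result.stdout)
```

For example, `[r for r in list_images("input.pdf") if r["x_ppi"] < 150]` would pick out images stored below 150 ppi, which no extraction option can improve.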