When you need to convert PDF files to images on a Linux server, pdftoppm (from the Poppler utilities) is a fast and reliable tool. In this post, we'll look at how to invoke pdftoppm from Elixir and how to run multiple conversions in parallel to improve throughput.
Installing pdftoppm
On most Linux distributions, pdftoppm is part of the poppler-utils package. On macOS with Homebrew, the package is simply called poppler.
# Debian / Ubuntu
sudo apt install poppler-utils
# Alpine
apk add poppler-utils
# macOS
brew install poppler
You can verify the installation with:
pdftoppm -h
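The same check can be done from Elixir before any conversion is attempted. The sketch below is only an illustration: the module name PopplerCheck and the error tuple shape are hypothetical, while System.find_executable/1 is the standard library call that searches the PATH.

```elixir
defmodule PopplerCheck do
  # Hypothetical helper: looks up an executable on the PATH.
  # Returns {:ok, absolute_path} or {:error, {:not_found, name}}.
  def locate(name \\ "pdftoppm") do
    case System.find_executable(name) do
      nil -> {:error, {:not_found, name}}
      path -> {:ok, path}
    end
  end
end
```

Calling PopplerCheck.locate() at application start gives a clear error early instead of a crash inside the first conversion.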
Basic pdftoppm usage
To convert a PDF to JPEG images at 150 DPI:
pdftoppm -jpeg -r 150 input.pdf output/page
This produces files like:
output/page-1.jpg
output/page-2.jpg
Each page becomes a separate image.
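For the file-based mode, System.cmd/3 is usually all that is needed from Elixir. The following is a minimal sketch under a few assumptions: the module name PdfPages and its return values are made up for illustration, and the output directory is created up front in case it does not exist yet.

```elixir
defmodule PdfPages do
  # Builds the argument list for: pdftoppm -jpeg -r DPI PDF PREFIX
  def args(pdf_path, prefix, dpi \\ 150) do
    ["-jpeg", "-r", Integer.to_string(dpi), pdf_path, prefix]
  end

  # Converts every page and returns the generated files, sorted by name.
  # System.cmd/3 raises if pdftoppm is not on the PATH.
  def convert(pdf_path, prefix, dpi \\ 150) do
    # Make sure the target directory exists before pdftoppm writes into it.
    File.mkdir_p!(Path.dirname(prefix))

    case System.cmd("pdftoppm", args(pdf_path, prefix, dpi)) do
      {_output, 0} -> {:ok, Enum.sort(Path.wildcard(prefix <> "-*.jpg"))}
      {_output, status} -> {:error, status}
    end
  end
end
```

With the example above, PdfPages.convert("input.pdf", "output/page") would return the list of page images on success.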
Writing image data to stdout with pdftoppm
In some setups it is useful to avoid temporary files and let pdftoppm write the rendered image directly to stdout. From Elixir, you can then capture that output and persist it yourself. This post shows how to do this cleanly, while keeping stdout and stderr separated so errors are easy to handle.
pdftoppm writes images to files by default, but if you don't pass the PPM-file-prefix it will write the image data to stdout.
To render a single page as JPEG to stdout:
pdftoppm -jpeg -r 150 -f 1 -l 1 -jpegopt quality=85 -aa yes -aaVector yes input.pdf
On success:
- stdout contains the binary JPEG data
- stderr is empty
On failure:
- stdout is empty
- stderr contains the error message
This makes it a good fit for piping and programmatic use.
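When consuming stdout programmatically, a cheap sanity check is to confirm that the captured bytes actually look like a JPEG before persisting them. A JPEG stream starts with the start-of-image marker 0xFF 0xD8, followed by another marker that begins with 0xFF. The module name JpegCheck is hypothetical and this is only a sketch; it does not validate the rest of the file.

```elixir
defmodule JpegCheck do
  # True when the binary starts with the JPEG start-of-image marker
  # (0xFF 0xD8) followed by the first byte of the next marker (0xFF).
  def jpeg?(<<0xFF, 0xD8, 0xFF, _rest::binary>>), do: true
  def jpeg?(_other), do: false
end
```

An empty binary or a stray text message fails the check, which makes it a useful guard before writing the data to disk.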
Why System.cmd/3 is not enough
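System.cmd/3 returns only the collected stdout and the exit status. Its stderr_to_stdout: true option merges stderr into that same binary, which would corrupt the image data, and there is no option to return stderr as a separate value. Since we explicitly want:
- image data from stdout
- error messages from stderr
we drive pdftoppm through a Port, which gives us direct control over the process and its timeout. One caveat: a plain Port also delivers only stdout to the owning process. The program's stderr goes to the VM's own stderr unless it is redirected, so it stays out of the image data but is not captured automatically.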
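The behaviour of System.cmd/3 is easy to see with a small stand-in for pdftoppm. The sketch below assumes a POSIX sh is available; the module name StreamDemo and the shell script are made up for illustration.

```elixir
defmodule StreamDemo do
  # Stand-in for pdftoppm: one line on stdout, one line on stderr.
  @script "echo image-bytes; echo 'Syntax Error' 1>&2"

  # Default behaviour: only stdout is collected. The stderr line is
  # printed by the VM's own stderr and is not part of the result.
  def stdout_only do
    System.cmd("sh", ["-c", @script])
  end

  # With stderr_to_stdout: true both streams end up in one binary.
  def merged do
    System.cmd("sh", ["-c", @script], stderr_to_stdout: true)
  end
end

StreamDemo.stdout_only()
# => {"image-bytes\n", 0}

StreamDemo.merged()
# => {"image-bytes\nSyntax Error\n", 0}
```

In the merged case there is no way to tell afterwards which bytes belong to the image and which to the error message.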
Converting a single page from Elixir
The function below renders a single page to JPEG, saves the image to disk, and returns structured errors when something goes wrong.
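defmodule PdfToImage do
  # Renders one page of `pdf_path` to JPEG and writes it to `output_file`.
  # Returns :ok or {:error, reason}.
  def convert_page(pdf_path, page, output_file, opts \\ []) do
    dpi = Keyword.get(opts, :dpi, 150)

    # No output prefix is passed, so pdftoppm writes the image to stdout.
    args = [
      "-jpeg",
      "-jpegopt", "quality=85",
      "-aa", "yes",
      "-aaVector", "yes",
      "-r", to_string(dpi),
      "-f", to_string(page),
      "-l", to_string(page),
      pdf_path
    ]

    # :spawn_executable does not search the PATH, so resolve the binary first.
    case System.find_executable("pdftoppm") do
      nil ->
        {:error, :pdftoppm_not_found}

      executable ->
        port = Port.open({:spawn_executable, executable}, [:binary, :exit_status, args: args])
        collect_output(port, output_file, <<>>, <<>>)
    end
  end

  # Note: a port delivers only the program's stdout. stderr goes to the VM's
  # own stderr unless it is redirected, so the `stderr` accumulator stays
  # empty in this version.
  defp collect_output(port, output_file, stdout, stderr) do
    receive do
      {^port, {:data, data}} ->
        collect_output(port, output_file, stdout <> data, stderr)

      {^port, {:exit_status, 0}} ->
        # Only write the file once pdftoppm has exited successfully.
        File.write!(output_file, stdout)
        :ok

      {^port, {:exit_status, status}} ->
        {:error, {status, stderr}}
    after
      # The timer restarts with every message: this is 30 s without output.
      30_000 ->
        Port.close(port)
        {:error, :timeout}
    end
  end
end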
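One detail of the receive loop above is worth spelling out: the after clause is re-armed on every message, so 30_000 bounds the silence between chunks, not the total run time. If an overall deadline is preferred, the remaining time can be recomputed on each iteration. The following is a minimal sketch of that idea, not a drop-in replacement; the module name PortRunner and its return tuples are hypothetical, and it collects chunks as a list instead of concatenating binaries.

```elixir
defmodule PortRunner do
  # Runs `executable` (an absolute path) with `args` and collects stdout.
  # Returns {:ok, binary}, {:error, {:exit_status, n}} or {:error, :timeout}.
  def run(executable, args, timeout_ms \\ 30_000) do
    port = Port.open({:spawn_executable, executable}, [:binary, :exit_status, args: args])
    deadline = System.monotonic_time(:millisecond) + timeout_ms
    collect(port, deadline, [])
  end

  defp collect(port, deadline, chunks) do
    # Time left until the overall deadline, never negative.
    remaining = max(deadline - System.monotonic_time(:millisecond), 0)

    receive do
      {^port, {:data, data}} ->
        # Prepend now and reverse once at the end.
        collect(port, deadline, [data | chunks])

      {^port, {:exit_status, 0}} ->
        {:ok, chunks |> Enum.reverse() |> IO.iodata_to_binary()}

      {^port, {:exit_status, status}} ->
        {:error, {:exit_status, status}}
    after
      remaining ->
        Port.close(port)
        {:error, :timeout}
    end
  end
end
```

Because :spawn_executable does not search the PATH, the caller resolves the binary first, for example PortRunner.run(System.find_executable("pdftoppm"), args).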
Usage:
PdfToImage.convert_page(
  "input.pdf",
  1,
  "output/page-1.jpg",
  dpi: 200
)
Parallelizing page conversion
Because each page conversion is independent, this approach works well with Task.async_stream/3.
pages = 1..10

Task.async_stream(
  pages,
  fn page ->
    PdfToImage.convert_page(
      "input.pdf",
      page,
      "output/page-#{page}.jpg"
    )
  end,
  max_concurrency: System.schedulers_online(),
  timeout: :infinity
)
|> Enum.to_list()
Each task spawns its own pdftoppm process, captures binary image data from stdout, and only writes a file once rendering succeeds.
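Task.async_stream/3 yields its results in input order by default, each wrapped as {:ok, value}, so the list can be zipped back onto the page numbers to see which pages failed. The sketch below assumes the :ok and {:error, reason} return values of convert_page shown earlier; the module name ConversionReport is hypothetical.

```elixir
defmodule ConversionReport do
  # `results` is the list produced by Task.async_stream/3 |> Enum.to_list().
  # Returns %{ok: [page], failed: [{page, reason}]}, both in page order.
  def summarize(pages, results) do
    pages
    |> Enum.zip(results)
    |> Enum.reduce(%{ok: [], failed: []}, fn
      {page, {:ok, :ok}}, acc ->
        %{acc | ok: [page | acc.ok]}

      {page, {:ok, {:error, reason}}}, acc ->
        %{acc | failed: [{page, reason} | acc.failed]}

      # Only seen when a task exits, e.g. with on_timeout: :kill_task.
      {page, {:exit, reason}}, acc ->
        %{acc | failed: [{page, {:exit, reason}} | acc.failed]}
    end)
    |> Map.new(fn {key, list} -> {key, Enum.reverse(list)} end)
  end
end
```

The failed list can then be retried or handed to a background job.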
Error handling characteristics
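- On success, only stdout is used and written to disk
- On failure, no file is created
- The returned error carries pdftoppm's exit status. Because a port does not forward stderr to the owning process, the stderr field stays empty unless stderr is redirected explicitly
- This makes it suitable for background jobs and structured logging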
Conclusion
By letting pdftoppm write image data to stdout and capturing it via a Port, you gain full control over I/O, error handling, and parallel execution. This avoids temporary files, keeps failure cases clean, and integrates well with Elixir's concurrency primitives.