Iterating over all content of all pages? #1160

polardune-dev · 2025-05-12T21:21:12Z


I would like to use pdfcpu for doing my own analysis of all the content of a PDF, but it seems the API does not support this kind of low-level PDF processing, or am I overlooking something?

Does the API allow me to do something like:

for each page
- for all page content
  - iterate over all the operations

Answered by hhrutter

May 13, 2025

Page content stream internals are not modelled by pdfcpu as page content is not really processed other than for resource optimization.


hhrutter · 2025-05-13T07:12:28Z

Maintainer

I have no idea what you are up to but you can do anything you want.
There is plenty of iterating over pages in the code base.

1 reply

polardune-dev May 13, 2025
Author

I am looking for the API calls to iterate not only over pages, but also over the page content. For instance, I want to find rectangles on each page without having to parse the streams myself. If that can't be done with the API, I would be relying on internals of the package, which is usually a bad idea.

hhrutter · 2025-05-13T20:09:20Z

Maintainer

Page content stream internals are not modelled by pdfcpu as page content is not really processed other than for resource optimization.

2 replies

polardune-dev May 13, 2025
Author

Thanks for the fast response.

CodeMonitor-lab May 15, 2025

The only time pdfcpu touches the page content stream is to optimize resources such as images, fonts, and color spaces, to remove unused objects, or to compress streams to reduce file size.

mdmcconnell · 2025-05-19T12:40:51Z


You might want to look at qpdf, which can create a fairly complete JSON representation of a PDF. If you are trying to do this programmatically within Go, you could spawn a process, read its output, and unmarshal it. Not very elegant, I know.
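A minimal Go sketch of the spawn-and-unmarshal approach, assuming a `qpdf` binary on `PATH` that supports `--json`. `runQPDFJSON` and `decodeQPDFJSON` are hypothetical helper names. The JSON layout differs between qpdf versions, so the output is decoded into a generic map rather than fixed structs. Whether stream data (including page content) is included depends on the qpdf version and options, so operator-level parsing may still be needed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
)

// decodeQPDFJSON unmarshals qpdf's JSON output generically; the layout
// depends on the qpdf version, so a map is used instead of fixed structs.
func decodeQPDFJSON(data []byte) (map[string]any, error) {
	var doc map[string]any
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, fmt.Errorf("decoding qpdf JSON: %w", err)
	}
	return doc, nil
}

// runQPDFJSON spawns "qpdf --json <path>" and decodes its standard output.
// qpdf exits with status 3 when it only emits warnings; this sketch treats
// any non-zero exit status as an error.
func runQPDFJSON(path string) (map[string]any, error) {
	out, err := exec.Command("qpdf", "--json", path).Output()
	if err != nil {
		return nil, fmt.Errorf("running qpdf: %w", err)
	}
	return decodeQPDFJSON(out)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: qpdfjson file.pdf")
		return
	}
	doc, err := runQPDFJSON(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for key := range doc {
		fmt.Println(key) // top-level keys such as "pages" (version-dependent)
	}
}
```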
