Validating 175K+ PDFs, 1762 files failed the relaxed mode, which errors are critical? #1236
Replies: 3 comments 2 replies
Hello! Unfortunately, there are a lot of PDF writers out there that produced, or are still producing, PDFs that are not spec compliant. Having said that, please get the latest commit and then rerun your parsing tests. I suggest you do something along the lines of:

I am happy to add any problematic files you can share to my local testing corpus and provide fixes step by step. Please consider becoming a pdfcpu sponsor 💚
Hi Susan, just curious: did you get any hanging validations, stack overflows, and such? As for membership options, please get in touch: hhrutter@gmail.com
The low-hanging fruit among these is fixed in the latest release 👉🏻 https://github.com/pdfcpu/pdfcpu/releases/tag/v0.11.1 The remainder needs to be addressed individually, and that means I'd need the file.
Hi-
I'm new to PDF validation. I have been running a variety of tools on a pile of 175K+ PDFs (from our self-deposit documents repository). There were 1762 files that failed the relaxed validation. Is there a key or guide to identifying the most serious issues in valErrorsRelaxed.txt? I have also run validate -v and -vv against this subset of 1762 files.
Our plan is to modify our repository to check for issues when these files are deposited.
Any assistance would be appreciated.
I am happy to share any PDF files as well as their validation results.
Thanks,
susan
pdfcpu info:
pdfcpu: v0.11.0 dev
commit: Homebrew (2025-05-28T13:23:49Z)
base : go1.24.3
A sample of the results:
validation error (obj#:968): postScriptCalculatorFunctionStreamDict: unsupported in version 1.2
validation error (obj#:1): pdfcpu: validateIndRefArrayEntry: invalid type at index 0
validation error (obj#:90): pdfcpu: validateOutlineTree: empty outline item dict "Count" must be 0
validation error (obj#:9): dict=extGStateDict entry=HT (obj#9): unsupported in version 1.1
validation error (obj#:58): pdfcpu: validateIndRefArrayEntry: invalid type at index 0
validation error (obj#:21): dict=pagesDict entry=Tabs: unsupported in version 1.2
validation error (obj#:746): dict=fileSpecDict entry=Thumb: unsupported in version 1.6
validation error (obj#:452): dict=outlineItemDict required entry=Parent missing
validation error (obj#:394): pdfcpu: validateObjectReferenceDict: missing obj#398
validation error (obj#:3): pdfcpu: IsObjValid: no entry for obj#-1
validation error (obj#:4370): dict=optContentConfigDict entry=Locked: unsupported in version 1.5
validation error (obj#:365): dict=outlineItemDict entry=F: unsupported in version 1.3
validation error (obj#:21): pdfcpu: validateStringArrayEntry: invalid type at index 0
validation error (obj#:48): pdfcpu: validateNameEntry: dict=rootDict entry=PageLayout invalid dict entry: UseNone
validation error (obj#:6): dict=outlineItemDict entry=F: unsupported in version 1.3
validation error (obj#:117): dict=outlineItemDict entry=F: unsupported in version 1.3
validation error (obj#:411): dict=outlineItemDict entry=F: unsupported in version 1.3
validation error (obj#:53): dict=outlineItemDict entry=F: unsupported in version 1.3
validation error (obj#:93): pdfcpu: validateFontFile3SubType: CIDFontType0: unexpected Subtype Type1C
validation error (obj#:171): dict=outlineItemDict entry=F: unsupported in version 1.3
validation error (obj#:243): dict=outlineItemDict entry=F: unsupported in version 1.3
validation error (obj#:5): pdfcpu: dereferenceDict: wrong type types.Array <[Indexed DeviceRGB 255 (65 0 R)]>
validation error (obj#:8): dict=extGStateDict entry=HT (obj#8): unsupported in version 1.1
validation error (obj#:154): pdfcpu: validateFontFile3SubType: CIDFontType0: unexpected Subtype Type1C
validation error (obj#:347): dict=outlineItemDict entry=F: unsupported in version 1.3
validation error (obj#:334): dict=outlineItemDict entry=F: unsupported in version 1.3
validation error (obj#:38): pdfcpu: validateNameEntry: dict=rootDict entry=PageLayout invalid dict entry: UseNone
validation error (obj#:105): pdfcpu: validateFontFile3SubType: CIDFontType0: unexpected Subtype Type1C
validation error (obj#:564): "V" not allowed in non terminal text fields with more than one kid
validation error (obj#:1010): pdfcpu: DereferenceStreamDict: wrong type <<nil>> <nil>
validation error (obj#:23): dict=fileSpecDict entry=Thumb: unsupported in version 1.6
validation error (obj#:815): dict=formFieldDict required entry=FT missing
validation error (obj#:359): dict=outlineItemDict entry=F: unsupported in version 1.3
validation error (obj#:45): dict=outlineItemDict entry=F: unsupported in version 1.3
validation error (obj#:261): dict=outlineItemDict required entry=Parent missing
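With 1762 failing files, a first triage step might be to strip the object numbers from messages like the ones above and count how often each distinct message occurs. The following is a minimal Python sketch of that idea, not part of pdfcpu. The names `normalize`, `bucket`, and `tally` and the four bucket labels are hypothetical, chosen only for illustration; the buckets are a rough grouping by message wording and do not reflect any severity ranking defined by pdfcpu.

```python
import re
from collections import Counter

# Splits text on each "validation error (obj#:N):" prefix, so it works whether
# the messages are one per line or run together on a single line.
SPLIT_RE = re.compile(r"validation error \(obj#:\d+\):")


def normalize(message):
    """Drop the 'pdfcpu:' prefix and replace object numbers with 'N'."""
    message = message.strip()
    message = re.sub(r"^pdfcpu:\s*", "", message)
    return re.sub(r"obj#:?-?\d+", "obj#N", message)


def bucket(message):
    """Assign a message to a hypothetical bucket based on its wording."""
    if "unsupported in version" in message:
        return "version-mismatch"
    if "missing" in message:
        return "missing-entry-or-object"
    if "invalid type" in message or "wrong type" in message:
        return "type-error"
    return "other"


def tally(text):
    """Return (count per normalized message, count per bucket)."""
    messages = [normalize(m) for m in SPLIT_RE.split(text) if m.strip()]
    return Counter(messages), Counter(bucket(m) for m in messages)


if __name__ == "__main__":
    # A few messages copied from the sample above.
    sample = (
        "validation error (obj#:9): dict=extGStateDict entry=HT (obj#9): unsupported in version 1.1 "
        "validation error (obj#:8): dict=extGStateDict entry=HT (obj#8): unsupported in version 1.1 "
        "validation error (obj#:394): pdfcpu: validateObjectReferenceDict: missing obj#398 "
        "validation error (obj#:58): pdfcpu: validateIndRefArrayEntry: invalid type at index 0"
    )
    by_message, by_bucket = tally(sample)
    for message, count in by_message.most_common():
        print(count, message)
    for name, count in by_bucket.most_common():
        print(count, name)
```

Running the same tally over the full valErrorsRelaxed.txt would show which messages dominate the 1762 failures, which may help decide which files to share first.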