PDF-Hul: Bug in skipIISBytes and PdfModule.getObject #151

Closed
wants to merge 8 commits into from


@pmay commented Oct 11, 2016

Many documents that return "Invalid Page Dictionary Object" seem to be affected by a bug in PdfFlateInputStream.skipIISBytes, which miscalculates the number of bytes to skip when the requested skip count is larger than the number of bytes remaining in the buffer.

In particular, this seems to relate to Page Trees encoded in stream objects where the root page starts beyond one buffer's worth of data.
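To illustrate the kind of skip logic at issue, here is a minimal sketch of a buffered stream whose skip can span several buffer refills. It is not JHOVE's PdfFlateInputStream code: the class `BufferedSkipSketch` and its fields are hypothetical, and a plain `InputStream` stands in for the inflated data.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for a buffered stream; not JHOVE's actual class. */
class BufferedSkipSketch {
    private final InputStream source;
    private final byte[] buf;
    private int pos = 0;    // index of the next unread byte in buf
    private int limit = 0;  // number of valid bytes currently in buf

    BufferedSkipSketch(InputStream source, int bufSize) {
        this.source = source;
        this.buf = new byte[bufSize];
    }

    /** Refills the buffer; returns false once the source is exhausted. */
    private boolean fill() throws IOException {
        int n = source.read(buf);
        if (n <= 0) {
            return false;
        }
        pos = 0;
        limit = n;
        return true;
    }

    /** Reads one byte, or returns -1 at end of stream. */
    int read() throws IOException {
        if (pos >= limit && !fill()) {
            return -1;
        }
        return buf[pos++] & 0xFF;
    }

    /** Skips up to n bytes, refilling as needed; returns the count actually skipped. */
    long skip(long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            if (pos >= limit && !fill()) {
                break; // end of stream before the full skip completed
            }
            // Consume only what this buffer holds, then loop for the rest.
            int step = (int) Math.min(remaining, limit - pos);
            pos += step;
            remaining -= step;
        }
        return n - remaining;
    }
}
```

The point of the loop is that the part of the skip already satisfied by the current buffer is subtracted before refilling, so a request larger than the remaining buffer lands on the right byte.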

The added InvalidPageDictionary.pdf file exemplifies this problem.

Note that solving this problem results in JHOVE returning "Improperly Constructed Page Tree". This seems to be caused by JHOVE not correctly setting the object index for objects extracted from streams, so that when PageTreeNode.nextPageObject (line 197) checks whether it has already visited a node, the check gives the wrong answer (essentially it compares index -1 to index -1).
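The effect of the unset index can be sketched as follows. This is an illustration only: `PdfObjectSketch` and `VisitedTrackerSketch` are hypothetical names, the -1 default is assumed from the description above, and the real check in PageTreeNode.nextPageObject may be structured differently.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical object wrapper; -1 is assumed to mean "number never set". */
class PdfObjectSketch {
    private int objNumber = -1;

    int getObjNumber() {
        return objNumber;
    }

    void setObjNumber(int objNumber) {
        this.objNumber = objNumber;
    }
}

/** Hypothetical visited check keyed on the object number. */
class VisitedTrackerSketch {
    private final Set<Integer> visited = new HashSet<>();

    /** Returns true the first time a number is seen, false on a repeat. */
    boolean markVisited(PdfObjectSketch obj) {
        return visited.add(obj.getObjNumber());
    }
}
```

With two distinct objects whose numbers were never set, the second `markVisited` call returns false because both carry -1, so a well-formed tree looks as if it revisits a node. Once each object carries its own number, both calls return true.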

@codecov-io

codecov-io commented Oct 11, 2016

Current coverage is 3.43% (diff: 0.00%)

No coverage report found for integration at 144a26e.

Powered by Codecov. Last update 144a26e...0e3715d

…d from an object stream. This resulted in Improperly constructed page tree errors
@pmay changed the title from "PDF-Hul: Invalid Page Dictionary Object - bug in skipIISBytes" to "PDF-Hul: Bug in skipIISBytes and PdfModule.getObject" on Oct 11, 2016

@david-russo left a comment

Looks good to me.

- return ostrm.getObject (objIndex);
+ /* Need to ensure the object number is set */
+ PdfObject obj = ostrm.getObject (objIndex);
+ obj.setObjNumber (objIndex);

@david-russo commented Nov 4, 2016

Setting the object number can be safely moved into the ObjectStream.getObject() method for the benefit of any other callers.
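A rough sketch of that suggestion, under stated assumptions: `ObjectStreamSketch` and `StreamedObjectSketch` are hypothetical stand-ins, the map of raw bodies replaces real stream parsing, and the null handling is an addition for the example rather than something taken from JHOVE.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical extracted object; -1 is assumed to mean "number never set". */
class StreamedObjectSketch {
    private int objNumber = -1;
    private final String body;

    StreamedObjectSketch(String body) {
        this.body = body;
    }

    int getObjNumber() {
        return objNumber;
    }

    void setObjNumber(int objNumber) {
        this.objNumber = objNumber;
    }

    String getBody() {
        return body;
    }
}

/** Hypothetical object stream that stamps the number on every object it returns. */
class ObjectStreamSketch {
    private final Map<Integer, String> bodies = new HashMap<>();

    void put(int objNumber, String body) {
        bodies.put(objNumber, body);
    }

    /** Returns the object with its number already set, or null if absent. */
    StreamedObjectSketch getObject(int objNumber) {
        String body = bodies.get(objNumber);
        if (body == null) {
            return null;
        }
        StreamedObjectSketch obj = new StreamedObjectSketch(body);
        obj.setObjNumber(objNumber); // done once here, so every caller benefits
        return obj;
    }
}
```

Placing the assignment inside the stream class means a caller cannot forget it, which is the benefit the review comment points to.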

@carlwilson

This is fixed by #188

@carlwilson carlwilson closed this Mar 20, 2017
rgfeldman added a commit to rgfeldman/jhove that referenced this pull request Apr 10, 2019