Malicious PDF documents test suite #1147
Comments
Thanks a lot, Johan! We have already fixed a number of issues to prevent runtime exceptions on similar malicious PDFs, so this is indeed an excellent stability test for veraPDF.
In addition to this, one of the Apache Tika developers pointed me to their "stressful PDF corpus", which I think would be useful for stability testing as well. See this post for a description: https://www.pdfa.org/a-new-stressful-pdf-corpus/ Here's the link to the corpus: https://corpora.tika.apache.org/base/docs/bug_trackers Packaged downloads here:
Hi, there is also the corpus of PDFs from the pdfium project: https://pdfium.googlesource.com/pdfium/+/refs/heads/master/testing/resources I tried PDF/UA validation on this corpus, and veraPDF crashed on some files without producing valid XML output. I get the following XML output:
Edit: I just saw that the pdfium corpus is included in the Apache Tika one.
Thanks a lot for bringing this corpus to our attention. We are currently working on stabilizing veraPDF on various collections of malformed documents, so this one will certainly be covered by the next official release as well.
@AlainVagner in fact, checking this particular test file, bug_113.pdf, I see that veraPDF does catch the error correctly and generates the XML report. The issue is that with the simple command line you use, stdout and stderr are mixed together. If you redirect stderr to a different file, the remaining XML looks well-formed. You can use … A related issue on mixed stdout and stderr is already reported here: #1155
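To illustrate the point about mixed streams, here is a minimal Python sketch that runs a command with stdout and stderr captured separately and then checks whether stdout alone parses as well-formed XML. The helper name `run_and_check_xml` is hypothetical, and the "validator" in the example is a stand-in Python one-liner, not veraPDF. The actual veraPDF command line is not reproduced in this thread, so it is left to the caller.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET


def run_and_check_xml(command, timeout_seconds=60):
    """Run a command, keep stdout and stderr apart, and test stdout as XML.

    Hypothetical helper: `command` is any argument list whose stdout is
    expected to be an XML report.
    """
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,   # report goes here
        stderr=subprocess.PIPE,   # log messages and stack traces go here
        timeout=timeout_seconds,
    )
    try:
        ET.fromstring(completed.stdout)
        well_formed = True
    except ET.ParseError:
        well_formed = False
    return well_formed, completed.stderr.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Stand-in for a validator (assumption): XML on stdout, a warning on stderr.
    fake_validator = [
        sys.executable, "-c",
        "import sys; sys.stderr.write('WARNING: bad xref\\n'); "
        "sys.stdout.write('<report><job/></report>')",
    ]
    ok, log = run_and_check_xml(fake_validator)
    print("well-formed:", ok)
    print("stderr:", log.strip())
```

If the report is still not well-formed with the streams separated like this, the problem lies in the report itself rather than in interleaved log output.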
@bdoubrov thanks for checking! I tried on my side and I still have the issue when redirecting stderr to /dev/null. I am on macOS, using veraPDF version 1.19.53. I should probably test on the latest build.
Hi @AlainVagner, sorry, I might have missed your latest comment. Would you please clarify what exactly doesn't work for you on Mac with the command line: Is the generated XML report still not well-formed? Could you send the terminal output together with the XML report in this case?
May I say thank you for the malicious PDF document set in this issue. It is a wonderful and terrifying format, and malicious PDFs are a true, underappreciated net art format.
All known issues in these files are covered. Further performance improvements have been made to handle the very large files in these collections.
Earlier this week, researchers from Ruhr University Bochum published a conference paper on insecure features in PDF, based on a systematic review of the full format spec:
https://www.ndss-symposium.org/wp-content/uploads/ndss2021_1B-2_23109_paper.pdf
There's a good summary in this blog post:
https://web-in-security.blogspot.com/2021/01/insecure-features-in-pdfs.html
They've also released a suite of malicious test files, which includes the helper scripts they used to generate these:
https://pdf-insecurity.org/download/pdf-dangerous-paths/exploits-and-helper-scripts.zip
As some of those files might be of interest for veraPDF testing (if only to make sure that veraPDF doesn't get caught up in an infinite loop), I'm just dropping the link here.
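As a rough sketch of how such a corpus could be used to catch infinite loops, the Python snippet below runs one command per PDF with a hard timeout and records whether each run finished, failed, or hung. The names `run_with_timeout`, `check_corpus`, and `build_command` are hypothetical. The caller supplies the validator command, since no veraPDF invocation is assumed here.

```python
import subprocess
import sys
from pathlib import Path


def run_with_timeout(command, timeout_seconds):
    """Return 'ok', 'error' or 'timeout' for one command invocation."""
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if completed.returncode == 0 else "error"


def check_corpus(corpus_dir, build_command, timeout_seconds=120):
    """Run build_command(pdf_path) for every *.pdf file under corpus_dir.

    build_command is a caller-supplied function (assumption) that turns a
    PDF path into the argument list of the validator to run.
    """
    results = {}
    for pdf_path in sorted(Path(corpus_dir).rglob("*.pdf")):
        results[str(pdf_path)] = run_with_timeout(
            build_command(pdf_path), timeout_seconds
        )
    return results


if __name__ == "__main__":
    # Stand-in command (assumption): a Python one-liner that just exits.
    def demo_command(pdf_path):
        return [sys.executable, "-c", "pass"]

    for name, status in check_corpus(".", demo_command, timeout_seconds=10).items():
        print(status, name)
```

Any file reported as `timeout` is a candidate for an infinite loop or a severe performance problem and is worth inspecting by hand.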