Default use of PDF/A-1B profile for regular (non-PDF/A) PDFs #1040
Comments
Thanks for reporting this issue. One concern: currently a user may run veraPDF on a large collection of PDFs to assess their archival quality even if the majority do not contain a PDF/A ID. If we implement your suggestion as is, that scenario would no longer be possible. And if the user does specify the profile explicitly (e.g., PDF/A-1b), there is still a risk that this profile differs from the document's own PDF/A identification. So, to me it looks like another option is needed: the profile used to validate non-PDF/A files (or none, reporting an error instead). If this sounds reasonable, we can include it in the next official release of veraPDF.
Yes, you're right. Adding an option that defines the profile for non-PDF/A files looks like a good solution. In addition, it would be helpful to explain the behaviour for non-PDF/A files (including the default profile used in these cases) in the documentation, as I couldn't find any info on this (or perhaps I looked in the wrong place).
Fix released in v1.20.
I am a bit unsure how to configure the parser so that it does not assume a default flavour when none is detected in the PDF. I have tried creating a parser with
PDFAParser parser = veraPDFFoundry.createParser(
        inputStream, PDFAFlavour.NO_FLAVOUR, PDFAFlavour.NO_FLAVOUR);
But I still get "1b" from … I can, however, create the parser and set …
I got the impression that you, @bitsgalore, had the same expectation that I have: that a non-PDF/A should not be "detected" arbitrarily as "PDF/A-1b", but should instead be reported as not being, nor claiming to be, a PDF/A at all. Did you manage to configure the parser in this way, to get …?
I hope you don't mind if I mention you as well, @bdoubrov. Is this perhaps documented somewhere, and I have just not been able to find it? :) I use veraPDF Core v1.24.1. Thank you!
@runeflobakk thanks for raising this question again. I'm not sure, though, what the expected behavior is in your case: would you like to skip validation of PDF documents that don't identify themselves as PDF/A? veraPDF is not able to do this as of now, at least not directly. The workaround would be to first run feature extraction to get the document metadata, check for the presence of the PDF/A identification package with an XPath tool, and skip validation if it is not present.
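The XPath check in this workaround could be sketched roughly as follows. This is a minimal illustration, not veraPDF code: it assumes the document's XMP metadata packet has already been obtained as an XML string (the layout of veraPDF's feature report is not shown in this thread, so it is not used here), and the class and method names are hypothetical. The only fixed facts it relies on are the PDF/A identification namespace http://www.aiim.org/pdfa/ns/id/ and its part property, which XMP may serialize either as an element or as an attribute.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of veraPDF.
class PdfaClaimCheck {
    // Namespace of the PDF/A identification schema in XMP.
    private static final String PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/";

    /**
     * Returns the declared PDF/A part (for example "1" or "2"),
     * or an empty string if the XMP packet carries no pdfaid:part.
     */
    static String declaredPdfaPart(String xmpXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            // Metadata comes from untrusted files, so refuse DOCTYPE declarations.
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmpXml)));

            // Match pdfaid:part in both element and attribute form.
            String match = "[local-name()='part' and namespace-uri()='" + PDFAID_NS + "']";
            String expr = "string(//*" + match + " | //@*" + match + ")";

            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, doc).trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read XMP metadata", e);
        }
    }
}
```

A caller could then skip validation whenever declaredPdfaPart(...) returns an empty string, and only otherwise hand the file to the validator.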
@bdoubrov Thanks for your very swift answer :) Yes, I was hoping to skip validation if the parser does not detect any PDF/A flavour.
boolean isValidPdfA(InputStream documentContent) {
    try (VeraPDFFoundry pdfFoundry = Foundries.defaultInstance()) {
        try (PDFAParser parser = pdfFoundry.createParser(documentContent, PDFAFlavour.NO_FLAVOUR, (PDFAFlavour) null)) {
            PDFAFlavour detectedFlavour = parser.getFlavour();
            if (detectedFlavour == null) {
                return false;
            }
            return pdfFoundry
                    .createValidator(detectedFlavour, false)
                    .validate(parser)
                    .isCompliant();
        }
    }
}
(Omitted various exception handling for brevity.) I intended the method to return … Previously I used the "autodetect" variant of …
I see the point. We'll extend the interface of the veraPDF parser to report all conformance declarations present in the document. There might be none, or more than one (e.g. PDF/A, PDF/UA and now also WTPDF). For the moment, the logic for choosing the flavor used for validation is mostly hardcoded. This new interface will allow implementing custom logic, as in your code example.
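The kind of custom selection logic this would enable can be sketched independently of veraPDF. The snippet below is only an illustration under stated assumptions: conformance claims are represented as plain strings such as "PDF/A-2b" or "PDF/UA-1" (a hypothetical representation, not the veraPDF API), and the class and method names are invented for the example. It shows the case discussed in this thread: pick a PDF/A claim if one is declared, and signal "nothing to validate" otherwise rather than falling back to a default flavour.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper; claim strings are an assumed representation.
class FlavourSelection {
    /**
     * Returns the first declared PDF/A claim, or an empty Optional
     * if the document declares no PDF/A conformance at all.
     */
    static Optional<String> selectPdfaClaim(List<String> declaredClaims) {
        return declaredClaims.stream()
                .filter(claim -> claim.startsWith("PDF/A-"))
                .findFirst();
    }
}
```

With this shape, the caller decides what an empty result means (skip validation, report "not PDF/A", or apply an explicitly chosen profile) instead of inheriting a hardcoded fallback.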
@runeflobakk we have extended the veraPDF API (not released yet, but available in the latest dev builds) to retrieve all conformance claims in the document metadata and programmatically select zero or more of them for validation: #1414. Sample code might look like:
@bdoubrov Thank you! I will check it out when I get some time. In the code above, conformance claims are first collected in … In any case, my application needs to handle PDFs that do not claim any PDF/A conformance, since I am parsing PDFs that may or may not be PDF/A, or may not even claim to be. But I am mostly curious about how the validator will behave if it is configured to validate no flavours of PDF/A, which may happen in your example code. I can of course test this myself, but wanted to mention it as it was the first thing I noticed. Is it even sensible to create a PDFAValidator set up to validate no flavours of PDF/A?
@runeflobakk It's possible to create …
@MaximPlusov Thank you very much! 👍
I just ran VeraPDF on a "regular" (i.e. non-PDF/A) PDF with automatic flavour detection:
(File can be found here: text_only_fontsEmbeddedAll.pdf)
From the profileName attribute in the validationReport I see this activates the PDF/A-1B profile. It seems this is used as a fallback value if no flavour can be detected.
This seems pretty arbitrary, and it might also result in false positives if one uses VeraPDF to identify the PDF/A flavour (if any) of a document. See also the Twitter discussion here.
A better approach might be to report profileName as unknown (or something similar), and skip the validation altogether for these cases. The -f option would still enable a user to validate such files against any of the available profiles.