Unable to parse pdf due to font issue #823
Comments
I am getting the same error as [romanimm], plus a null reference exception and an "Invalid ColorSpace token encountered" error. Could you please investigate and push a fix?
No font descriptor indirect reference found in the TrueType font: <BaseFont, /KVGATS+GNElliot-Bold>, <Encoding, /WinAnsiEncoding>, <FirstChar, 32>, <FontDescriptor, 48 0>, <LastChar, 117>, <Subtype, /Type1>, <ToUnicode, 71 0>, <Type, /Font>, <Widths, [ 240, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 265, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 681, 0, 0, 0, 0, 0, 0, 777, 302, 0, 0, 0, 865, 777, 777, 0, 0, 660, 588, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 551, 0, 0, 601, 551, 383, 601, 601, 259, 0, 0, 0, 0, 601, 601, 0, 0, 401, 463, 415, 601 ]>.
Invalid ColorSpace token encountered in page resource dictionary: 655 0.
@mahmoodali31 can you share the problematic PDF file?
@BobLd I cannot share the PDF as it is confidential. I am using a PDF stream to extract content. I double-checked the PDF: it contains an image and text at the bottom.
StringBuilder sb = new();
}
I've stumbled upon this PDF, which throws an InvalidFontFormatException upon calling document.GetPages() or GetPage(x).
I've tried with different ParsingOptions:
SkipMissingFonts = true gives a null pointer exception
UseLenientParsing = true has no effect
Log:
Tested with 0.1.8 and 0.1.9-alpha-20240419-1ef2e on Windows and Linux (Alpine).
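For anyone trying to reproduce this, here is a minimal sketch of the calls described in this thread: opening the document from a stream with both ParsingOptions flags set, then reading each page inside a try/catch so the failing page and exception type get logged. The file name input.pdf is a placeholder, and catching the general Exception type is an assumption made for illustration; this is not a fix for the font issue itself.

```csharp
using System;
using System.IO;
using System.Text;
using UglyToad.PdfPig;

// Placeholder path: substitute the PDF that triggers the error.
const string path = "input.pdf";

// The two options mentioned in the report above.
var options = new ParsingOptions
{
    UseLenientParsing = true,
    SkipMissingFonts = true
};

// PdfDocument.Open needs a seekable stream; a FileStream qualifies.
using var stream = File.OpenRead(path);
using var document = PdfDocument.Open(stream, options);

var sb = new StringBuilder();

// Page numbers in PdfPig are 1-based.
for (var i = 1; i <= document.NumberOfPages; i++)
{
    try
    {
        var page = document.GetPage(i);
        sb.AppendLine(page.Text);
    }
    catch (Exception ex)
    {
        // Record which page failed and with what exception, then continue.
        Console.Error.WriteLine($"Page {i}: {ex.GetType().Name}: {ex.Message}");
    }
}

Console.WriteLine(sb.ToString());
```

Reading pages one at a time with GetPage(i) rather than iterating GetPages() keeps a single bad page from aborting the whole extraction, which makes it easier to report exactly which page and font trigger the exception.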