Ignore hidden text? #446
Unanswered
PmE8HW0KRfqa
asked this question in
Q&A
Replies: 1 comment
-
These may be in two separate content streams. You could try ignoring one of them. Alternatively, for each letter in page.Letters, check whether it overlaps with any other letter.
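A rough sketch of the overlap check, written in Python purely for illustration and not against PdfPig's API. The `Letter` type, its fields, and the `drop_overlapping` helper are hypothetical stand-ins for whatever bounding-box data each entry in page.Letters gives you, and the 0.5 threshold is an arbitrary assumption you would need to tune:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Letter:
    # Hypothetical stand-in for one extracted letter and its bounding box.
    value: str
    left: float
    bottom: float
    right: float
    top: float


def area(letter: Letter) -> float:
    return max(0.0, letter.right - letter.left) * max(0.0, letter.top - letter.bottom)


def overlap_ratio(a: Letter, b: Letter) -> float:
    # Intersection area divided by the area of the smaller box.
    width = min(a.right, b.right) - max(a.left, b.left)
    height = min(a.top, b.top) - max(a.bottom, b.bottom)
    if width <= 0 or height <= 0:
        return 0.0
    smaller = min(area(a), area(b))
    return (width * height) / smaller if smaller > 0 else 0.0


def drop_overlapping(letters, threshold=0.5):
    # Keep a letter only if it does not heavily overlap one already kept.
    # The first letter of each overlapping pair wins (an assumption about
    # which layer you want; reverse the input to prefer the other one).
    kept = []
    for letter in letters:
        if all(overlap_ratio(letter, k) < threshold for k in kept):
            kept.append(letter)
    return kept


letters = [
    Letter("A", 0, 0, 10, 10),   # original text
    Letter("A", 1, 0, 11, 10),   # OCR duplicate sitting on top of it
    Letter("B", 12, 0, 20, 10),  # separate letter, no overlap
]
result = drop_overlapping(letters)
```

This compares every letter against every kept letter, so it is quadratic in the number of letters per page; that is probably fine for ordinary pages but may need a spatial index for very dense ones.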
-
I'm trying to extract text from a PDF file that is vector-based but has had OCR added to it (poorly). If I try to copy the text using a GUI tool (like Acrobat), I might select the OCR text, or I might select the original text. It's not predictable, since the two text layers are effectively on top of each other. PdfPig picks up everything. Is there a way to extract only the text that is visible?
Thanks