-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Page Label Access Fails when PDF PageLabels Number Tree Does Not Contain "/S" #1560
Comments
Fixes #1560 Co-authored-by: jonahmajumder <[email protected]>
@jonahmajumder Thank you for the detailed error description and for providing an example 🙏 What do you think about #1562 ? (Please leave a comment there; then we can continue discussing details about how to fix it there) I've added you as a co-author as you have pretty much already solved the issue - good work 👍 :-) |
As an side-note: I love Python 3.11 for its new tracebacks
|
Fixes #1560 Co-authored-by: jonahmajumder <[email protected]>
Congratulations @jonahmajumder - your change is now in If you want, I can add you to https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html :-) |
Sure, sounds good! I use the package a lot, so I will continue to bring up
and issues/features I find.
…On Wed, Jan 18, 2023 at 3:04 PM Martin Thoma ***@***.***> wrote:
Congratulations @jonahmajumder <https:/jonahmajumder> - your
change is now in main and will be on PyPI on Sunday :-)
If you want, I can add you to
https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html :-)
—
Reply to this email directly, view it on GitHub
<#1560 (comment)>, or
unsubscribe
<https:/notifications/unsubscribe-auth/AC6LF7XPSPHK7LXT66IILN3WTBEFJANCNFSM6AAAAAAT7GBI2E>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
d942a49 - added :-) |
Explanation
The PDF spec supports page labeling (distinct from page indexing), and I am happy to see that this has been incorporated into the PdfReader class via the
page_labels
property.However, the current implementation throws an error in the edge case where
"/S"
is not defined (and so no representation of the current page index should be used). Also, the current implementation is incomplete in that it does not incorporate the"/P"
and"/St"
keys in page label dictionaries.Environment
$ python -m platform macOS-10.16-x86_64-i386-64bit $ python --version Python 3.8.5 $ python -c "import pypdf;print(pypdf.__version__)" 3.2.1
Code + PDF
This is a minimal, complete example that shows the issue:
PDF file:
Numerical Mathematics.pdf
Traceback
This is the complete Traceback I see:
Proposed Solution
The two modifications I would propose are:
In
pypdf._page_labels
, on line 164, the dictionaries corresponding to page indices in thePageLabels
number tree should be parsed in a way that incorporates PDF spec defaults:S = value.get("/S")
,P = value.get("/P", "")
, andSt = value.get("/St", 1)
and then used for labeling in a way that uses these regardless of whether their keys were present:return P + m[S](index - start_index + St)
In
pypdf._page_labels
, in the dictionarym
(defined starting on line 153), an entry should be added corresponding to the case where the "/S" key is not included, i.e.:None: lambda n: ''
so that when "/S" is not included, the page index is simply ignored.It would also be great to see page labeling incorporated into the PdfWriter class to support writing PDFs with custom page labeling, but I realize that is more work and probably lower priority than fixing the PdfReader page labeling functionality.
The text was updated successfully, but these errors were encountered: