-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ROB: Let _build_destination skip in case of missing /D key #2018
Conversation
* Adds an else block to prevent calling _build_destination is not called with a dictionary object when it is expecting a list.
Do you have a PDF / some code that shows the issue you want to fix? |
This is running against:
import sys
from pypdf import PdfReader, PdfWriter
def main():
file = PdfReader(sys.argv[1])
new_pdf = PdfWriter()
new_pdf.append(fileobj=file, pages=(0, 20))
with open("test_copy.pdf", "wb") as fp:
new_pdf.write(fp)
if __name__ == "__main__":
main() $ python test.py test.pdf Traceback (most recent call last):
File "/tmp/pypdf/test.py", line 17, in <module>
main()
File "/tmp/pypdf/test.py", line 10, in main
new_pdf.append(fileobj=file, pages=(0, 20))
File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/_writer.py", line 2812, in append
self.merge(
File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/_utils.py", line 466, in wrapper
return func(*args, **kwargs)
File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/_writer.py", line 2904, in merge
reader.named_destinations
File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/_reader.py", line 513, in named_destinations
return self._get_named_destinations()
File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/_reader.py", line 777, in _get_named_destinations
self._get_named_destinations(kid.get_object(), retval)
File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/_reader.py", line 794, in _get_named_destinations
dest = self._build_destination(key, value) # type: ignore
File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/_reader.py", line 1001, in _build_destination
page, typ = array[0:2] # type: ignore
File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/generic/_data_structures.py", line 320, in __getitem__
return dict.__getitem__(self, key).get_object()
TypeError: unhashable type: 'slice' |
@nickryand can you provide the test.pdf file you are using ? |
I cannot provide it as it's part of a collection I'm working on for someone else. I can provide metadata information about it. |
do you accept to send it by mail to @MartinThoma [email protected] ? it will be easier for analysis |
@nickryand Your PR breaks CI and cannot be merged like this. See You could either remove the
|
Is this request still active? |
I haven't seen any e-mail, but even without that I can see that this kind of structure can be problematic. In general, we likely always have to assume that some key does not exist (even if the specs require it). At some point I hope we can find more general solutions, e.g. defining the specs formally in code and defining the fallback (or exception) we raise if pypdf wants to access a key that does not exist. |
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #2018 +/- ##
==========================================
+ Coverage 94.07% 94.12% +0.04%
==========================================
Files 33 41 +8
Lines 7077 7391 +314
Branches 1413 1460 +47
==========================================
+ Hits 6658 6957 +299
- Misses 262 269 +7
- Partials 157 165 +8
☔ View full report in Codecov by Sentry. |
@nickryand Thank you for your contribution 🙏 It will be part of the pypdf release on 10.12.2023 (Sunday). |
@nickryand If you want I can add you to https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html :-) |
Oh, wow, I just noticed it took me way too long to merge this. I'm sorry for that 🙈 |
## What's new ### Bug Fixes (BUG) - Cope with deflated images with CMYK Black Only (#2322) by @pubpub-zz - Handle indirect objects as parameters for CCITTFaxDecode (#2307) by @stefan6419846 - check words length in _cmap type1_alternative function (#2310) by @Takher ### Robustness (ROB) - Relax flate decoding for too many lookup values (#2331) by @stefan6419846 - Let _build_destination skip in case of missing /D key (#2018) by @nickryand ### Documentation (DOC) - Note in reading form data (#2338) by @MartinThoma - Pull Request prefixes and size by @MartinThoma - Add https:/zuypt for #2325 as a contributor by @MartinThoma - Fix docstring for RunLengthDecode.decode (#2302) by @stefan6419846 ### Maintenance (MAINT) - Enable `disallow_any_generics` and add missing generics (#2278) by @nilehmann ### Testing (TST) - Centralize file downloads (#2324) by @MartinThoma ### Code Style (STY) - Fix typo "steam" \xe2\x86\x92 "stream" (#2327) by @stefan6419846 - Run black by @MartinThoma - Make Traceback in bug report template uppercase (#2304) by @stefan6419846 [Full Changelog](3.17.1...3.17.2)
Calling _build_destination with a dictionary can cause it to error if the code which slices the array is executed. This skips the scenario where DictionaryObject does not have a "/D" key and is currently passed to _build_destination.