Releases: py-pdf/pypdf

Version 1.28.5, 2022-07-21

21 Jul 17:22
1.28.5
c56fd23

What's Changed

  • BUG: Add missing deprecated EncodedStreamObject functions by @MasterOdin in #1140

Full Changelog: 1.28.4...1.28.5

Version 2.6.0, 2022-07-17

17 Jul 19:19
2.6.0
33634d4

What's Changed

New Features (ENH)

  • Add color and font_format to PdfReader.outlines[i] (#1104)
  • Extract Text Enhancement (whitespaces) (#1084)

Bug Fixes (BUG)

  • Use build_destination for named destination outlines (#1128)
  • Avoid a crash when a ToUnicode CMap has an empty dstString in beginbfchar (#1118)
  • Prevent deduplication of PageObject (#1105)
  • None-check in DictionaryObject.read_from_stream (#1113)
  • Avoid IndexError in _cmap.parse_to_unicode (#1110)
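
The empty-dstString fix (#1118) concerns `beginbfchar` sections of a ToUnicode CMap, where each entry pairs a source code with a destination string. The sketch below is not PyPDF2's parser; `parse_bfchar` and `BFCHAR_ENTRY` are hypothetical names, and it assumes destinations are even-length hex strings holding UTF-16-BE text. It only illustrates how an empty `<>` destination can be tolerated.

```python
import re

# One bfchar entry: <source code> <destination string>, both hex-encoded.
BFCHAR_ENTRY = re.compile(rb"<([0-9A-Fa-f]*)>\s*<([0-9A-Fa-f]*)>")


def parse_bfchar(section: bytes) -> dict:
    """Map character codes to text; hypothetical helper, not PyPDF2 code."""
    mapping = {}
    for src_hex, dst_hex in BFCHAR_ENTRY.findall(section):
        if not src_hex:
            continue  # an entry without a source code cannot be mapped
        code = int(src_hex.decode("ascii"), 16)
        if dst_hex:
            text = bytes.fromhex(dst_hex.decode("ascii")).decode("utf-16-be")
        else:
            text = ""  # empty destination: map to nothing instead of failing
        mapping[code] = text
    return mapping


section = b"2 beginbfchar\n<0003> <0020>\n<0004> <>\nendbfchar"
table = parse_bfchar(section)
```

Here code 3 maps to a space and code 4 maps to an empty string.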

Documentation (DOC)

  • Explanation for git submodule
  • Watermark and stamp (#1095)

Maintenance (MAINT)

  • Text extraction improvements (#1126)
  • Destination.color returns ArrayObject instead of tuple as fallback (#1119)
  • Use add_bookmark_destination in add_bookmark (#1100)
  • Use add_bookmark_destination in add_bookmark_dict (#1099)

Testing (TST)

  • Add test for Arabic text (#1127)
  • Add xfail for decryption fail (#1125)
  • Add xfail test for IndexError when extracting text (#1124)
  • Add MCVE showing outline title issue (#1123)

Code Style (STY)

  • Use IntFlag for permissions_flag / update_page_form_field_values (#1094)
  • Simplify code (#1101)
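
The IntFlag change (#1094) replaces bare integers with named flags that can be combined. A minimal sketch of the idea follows. The class `Permission`, its member names and the helper `describe` are hypothetical and are not PyPDF2's definitions; the bit positions are assumed to follow the user access permission bits of the PDF specification.

```python
from enum import IntFlag


class Permission(IntFlag):
    """Hypothetical flag set; not the class PyPDF2 defines."""

    PRINT = 1 << 2     # bit 3 (the PDF spec counts bits from 1)
    MODIFY = 1 << 3    # bit 4
    EXTRACT = 1 << 4   # bit 5
    ANNOTATE = 1 << 5  # bit 6


def describe(flags: int) -> list:
    """Return the names of the permissions set in an integer bit field."""
    return [p.name for p in Permission if p & flags]


# Flags combine with | and still behave like plain integers.
allowed = Permission.PRINT | Permission.EXTRACT
```

Because `IntFlag` members are integers, code that previously passed raw numbers keeps working while new code gains readable names.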

Full Changelog: 2.5.0...2.6.0

Version 2.5.0, 2022-07-10

10 Jul 14:21
2.5.0
8f47939

What's Changed

New Features (ENH)

  • Add support for indexed color spaces / BitsPerComponent for decoding PNGs (#1067)
  • Add PageObject._get_fonts (#1083)

Performance Improvements (PI)

  • Use iterative DFS in PdfWriter._sweep_indirect_references (#1072)
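
An iterative depth-first search keeps its own stack instead of relying on the call stack, so deeply nested object graphs do not hit Python's recursion limit. The sketch below shows the general technique on plain lists and dicts; `collect_reachable` is a hypothetical function and does not reproduce the internals of `PdfWriter._sweep_indirect_references`.

```python
def collect_reachable(root):
    """Count the list/dict containers reachable from ``root`` without recursion.

    Containers are tracked by ``id`` so that shared or cyclic references are
    processed only once.
    """
    stack = [root]
    seen = set()
    visited = 0
    while stack:
        obj = stack.pop()
        if not isinstance(obj, (list, dict)):
            continue  # leaves carry no further references
        if id(obj) in seen:
            continue  # already handled via another path
        seen.add(id(obj))
        visited += 1
        children = obj.values() if isinstance(obj, dict) else obj
        stack.extend(children)
    return visited


# A chain nested far deeper than Python's default recursion limit.
deep = []
node = deep
for _ in range(10_000):
    child = []
    node.append(child)
    node = child
```

A recursive traversal of `deep` would raise `RecursionError` under the default limit, while the loop above only grows a list.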

Bug Fixes (BUG)

  • Let Page.scale also scale the crop-/trim-/bleed-/artbox (#1066)
  • Column default for CCITTFaxDecode (#1079)
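
The Page.scale fix (#1066) means every page boundary box is scaled, not only the media box. A rough sketch of that behaviour, using hypothetical helpers (`scale_box`, `scale_page_boxes`) and plain tuples rather than PyPDF2 objects:

```python
def scale_box(box, sx, sy):
    """Scale a (left, bottom, right, top) rectangle about the origin."""
    left, bottom, right, top = box
    return (left * sx, bottom * sy, right * sx, top * sy)


def scale_page_boxes(boxes, sx, sy):
    """Apply the same scale factors to every box of a page."""
    return {name: scale_box(rect, sx, sy) for name, rect in boxes.items()}


# Illustrative values only; a real page may define fewer boxes.
page_boxes = {
    "/MediaBox": (0, 0, 612, 792),
    "/CropBox": (10, 10, 602, 782),
    "/TrimBox": (20, 20, 592, 772),
}
scaled = scale_page_boxes(page_boxes, 0.5, 0.5)
```

Scaling only the media box would leave the crop box larger than the visible page, which is the kind of mismatch the fix addresses.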

Robustness (ROB)

  • Guard against None-value in _get_outlines (#1060)

Documentation (DOC)

  • Stamps and watermarks (#1082)
  • OCR vs PDF text extraction (#1081)
  • Python Version support
  • Formatting of CHANGELOG

Developer Experience (DEV)

  • Cache downloaded files (#1070)
  • Speed-up for CI (#1069)

Maintenance (MAINT)

Testing (TST)

  • Image extraction (#1080)
  • Image extraction (#1077)

Code Style (STY)

  • Apply black
  • Typo in Changelog

Full Changelog: 2.4.2...2.5.0

Version 2.4.2, 2022-07-05

05 Jul 12:39
2.4.2
a345690

What's Changed

New Features (ENH)

  • Add PdfReader.xfa attribute (#1026)

Bug Fixes (BUG)

  • Wrong page inserted when PdfMerger.merge is done (#1063)
  • Resolve IndirectObject when it refers to a free entry (#1054)

Developer Experience (DEV)

  • Added {posargs} to tox.ini (#1055)

Maintenance (MAINT)

  • Remove PyPDF2._utils.bytes_type (#1053)

Testing (TST)

  • Scale page (indirect rect object) (#1057)
  • Simplify pathlib PdfReader test (#1056)
  • IndexError of VirtualList (#1052)
  • Invalid XML in xmp information (#1051)
  • No pycryptodome (#1050)
  • Increase test coverage (#1045)

Code Style (STY)

  • DOC of compress_content_streams (#1061)
  • Minimize diff for #879 (#1049)

Full Changelog: 2.4.1...2.4.2

Version 2.4.1, 2022-06-30

30 Jun 06:47
2.4.1
66f00fc

What's Changed

New Features (ENH)

  • Add writer.pdf_header property (getter and setter) (#1038)

Performance Improvements (PI)

  • Remove b_ call in FloatObject.write_to_stream (#1044)
  • Check duplicate objects in writer._sweep_indirect_references (#207)

Documentation (DOC)

  • How to suppress exceptions/warnings/log messages (#1037)
  • Remove hyphen from lossless (#1041)
  • Compression of content streams (#1040)
  • Fix inconsistent variable names in add-watermark.md (#1039)
  • File size reduction
  • Add CHANGELOG to the rendered docs (#1023)

Maintenance (MAINT)

  • Handle XML error when reading XmpInformation (#1030)
  • Deduplicate Code / add mutmut config (#1022)

Code Style (STY)

  • Remove unnecessary one-line function / class attribute (#1043)
  • Docstring formatting (#1033)

Full Changelog: 2.4.0...2.4.1

Version 2.4.0, 2022-06-26

26 Jun 19:27
2.4.0
53efd73

What's Changed

Thanks to @exiledkingcc PyPDF2 now also supports R6 decryption 🎉 Thank you 🤗

New Features (ENH)

  • Support R6 decrypting (#1015)
  • Add PdfReader.pdf_header (#1013)
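
A PDF file starts with a short header such as `%PDF-1.7` that states the version. The sketch below shows one way such a header could be read from a binary stream. `read_pdf_header` is a hypothetical helper, not the implementation behind `PdfReader.pdf_header`, and it assumes the header sits in the first eight bytes.

```python
import io
import re


def read_pdf_header(stream):
    """Return the header string such as '%PDF-1.7', or None if absent."""
    position = stream.tell()
    stream.seek(0)
    first_bytes = stream.read(8)
    stream.seek(position)  # leave the caller's position untouched
    match = re.match(rb"%PDF-\d\.\d", first_bytes)
    return match.group(0).decode("ascii") if match else None


header = read_pdf_header(io.BytesIO(b"%PDF-1.7\n1 0 obj\n"))
```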

Performance Improvements (PI)

  • Remove ord_ calls (#1014)

Bug Fixes (BUG)

  • Fix missing page for bookmark (#1016)

Robustness (ROB)

  • Deal with invalid Destinations (#1028)

Documentation (DOC)

  • get_form_text_fields does not extract dropdown data (#1029)
  • Adjust PdfWriter.add_uri docstring
  • Mention crypto extra_requires for installation (#1017)

Developer Experience (DEV)

  • Use \n line endings everywhere (#1027)
  • Adjust string formatting to be able to use mutmut (#1020)
  • Update Bug report template

Full Changelog: 2.3.1...2.4.0

Version 2.3.1, 2022-06-19

19 Jun 12:56
2.3.1
6b9f472

What's Changed

Bug Fixes (BUG)

  • Forgot to add the internal _codecs subpackage.

Full Changelog: 2.3.0...2.3.1

Version 2.3.0, 2022-06-19

19 Jun 10:27
2.3.0
d5bc278

What's Changed

The highlight of this release is improved support for file encryption
(AES-128 and AES-256, R5 only). See #749 for the amazing work of
@exiledkingcc 🎊 Thank you 🤗

Deprecations (DEP)

  • Rename names to be PEP8-compliant (#967)
  • PdfWriter.get_page: the pageNumber parameter is renamed to page_number
  • PyPDF2.filters:
    • For all classes, a parameter rename: decodeParms ➔ decode_parms
    • decodeStreamData ➔ decode_stream_data
  • PyPDF2.xmp:
    • XmpInformation.rdfRoot ➔ XmpInformation.rdf_root
    • XmpInformation.xmp_createDate ➔ XmpInformation.xmp_create_date
    • XmpInformation.xmp_creatorTool ➔ XmpInformation.xmp_creator_tool
    • XmpInformation.xmp_metadataDate ➔ XmpInformation.xmp_metadata_date
    • XmpInformation.xmp_modifyDate ➔ XmpInformation.xmp_modify_date
    • XmpInformation.xmpMetadata ➔ XmpInformation.xmp_metadata
    • XmpInformation.xmpmm_documentId ➔ XmpInformation.xmpmm_document_id
    • XmpInformation.xmpmm_instanceId ➔ XmpInformation.xmpmm_instance_id
  • PyPDF2.generic:
    • readHexStringFromStream ➔ read_hex_string_from_stream
    • initializeFromDictionary ➔ initialize_from_dictionary
    • createStringObject ➔ create_string_object
    • TreeObject.hasChildren ➔ TreeObject.has_children
    • TreeObject.emptyTree ➔ TreeObject.empty_tree
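
Renames like these are commonly rolled out by keeping the old name as a thin wrapper that warns and forwards to the new one. The sketch below illustrates that pattern with one pair of names from the list above. The function bodies are placeholders, and whether PyPDF2 implements its deprecations exactly this way is an assumption.

```python
import warnings


def create_string_object(value):
    """Stand-in for the new snake_case function (body is illustrative only)."""
    return str(value)


def createStringObject(value):
    """Old camelCase name, kept as a wrapper that emits a warning."""
    warnings.warn(
        "createStringObject is deprecated; use create_string_object instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    return create_string_object(value)
```

Callers can surface such warnings during testing by running Python with `-W error::DeprecationWarning`.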

New Features (ENH)

  • Add decrypt support for V5 and AES-128, AES-256 (R5 only) (#749)

Robustness (ROB)

  • Fix reading of corrupted (wrongly linearized) PDF (#1008)

Maintenance (MAINT)

  • Move PDF_Samples folder into resources
  • Fix typos (#1007)

Testing (TST)

  • Improve encryption/decryption test (#1009)
  • Add merger test cases with real PDFs (#1006)
  • Add mutmut config

Code Style (STY)

  • Put pure data mappings in separate files (#1005)
  • Make encryption module private, apply pre-commit (#1010)

Full Changelog: 2.2.1...2.3.0

Version 2.2.1, 2022-06-17

17 Jun 11:22
2.2.1
91b3e8a

What's Changed

Performance Improvements (PI)

  • Remove b_ calls (#992, #986)
  • Apply improvements to _utils suggested by perflint (#993)

Robustness (ROB)

  • 'utf-16-be' codec can't decode (...) (#995)

Documentation (DOC)

  • Remove reference to Scripts (#987)

Developer Experience (DEV)

  • Fix type annotations for add_bookmarks (#1000)

Testing (TST)

  • Add test for PdfMerger (#1001)
  • Add tests for XMP information (#996)
  • reader.get_fields / zlib issue / LZW decode issue (#1004)
  • reader.get_fields with report generation (#1002)
  • Improve test coverage by extracting texts (#998)

Code Style (STY)

  • Apply fixes suggested by pylint (#999)

Full Changelog: 2.2.0...2.2.1

Version 2.2.0, 2022-06-13

13 Jun 19:53
2.2.0
f0cd829

What's Changed

The 2.2.0 release improves text extraction (#969 - again by @pubpub-zz 🙏):

  • Improvements around /Encoding / /ToUnicode
  • Extraction of CMaps improved
  • Fallback when the font definition is missing
  • Support for /Identity-H and /Identity-V: utf-16-be
  • Support for /GB-EUC-H / /GB-EUC-V / /GBpc-EUC-H / /GBpc-EUC-V (beta release for evaluation)
  • Arabic (for evaluation)
  • Whitespace extraction improvements

Those changes should mainly improve the text extraction for non-ASCII alphabets,
e.g. Russian / Chinese / Japanese / Korean / Arabic.
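
The "utf-16-be" note for /Identity-H and /Identity-V can be pictured as reading character codes two bytes at a time, big-endian. The sketch below shows only that decoding step, under the simplifying assumption that the two-byte codes coincide with Unicode code points. `decode_two_byte_codes` is a hypothetical helper and not part of PyPDF2.

```python
def decode_two_byte_codes(raw: bytes) -> str:
    """Decode big-endian 2-byte codes as UTF-16-BE, replacing invalid input."""
    return raw.decode("utf-16-be", errors="replace")


# Hex for the Cyrillic word "Привет" followed by "!".
sample = bytes.fromhex("041f044004380432043504420021")
text = decode_two_byte_codes(sample)
```

Decoding the same bytes with a single-byte encoding would produce unreadable output, which is why non-ASCII alphabets benefit most from this change.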

Full Changelog: 2.1.1...2.2.0