BUG: PDF size increases because of too high float writing precision #2213

Merged: 1 commit, Sep 24, 2023


pubpub-zz
Collaborator

Closes #1910.
Addresses a regression from #2203.

pubpub-zz marked this pull request as ready for review on September 24, 2023, 16:21

codecov bot commented Sep 24, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (91b6dcd) 94.38% compared to head (d076f76) 94.38%.
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2213   +/-   ##
=======================================
  Coverage   94.38%   94.38%           
=======================================
  Files          43       43           
  Lines        7588     7589    +1     
  Branches     1497     1497           
=======================================
+ Hits         7162     7163    +1     
  Misses        262      262           
  Partials      164      164           
Files Changed             Coverage Δ
pypdf/generic/_base.py    100.00% <100.00%> (ø)


@MartinThoma
Member

I just made a little test:

import pikepdf
from os import stat
import pypdf
import pypdf.generic._base

INPUT_PDF = "pypdf/sample-files/009-pdflatex-geotopo/GeoTopo.pdf"
OUTPUT_PDF = "output_example.pdf"

# Lower the number of digits pypdf uses when writing floats (this PR uses 8)
pypdf.generic._base.FLOAT_WRITE_PRECISION = 1


def test_filesize(input_pdf, output_pdf):
    # pypdf: clone the document and recompress every page's content streams
    reader = pypdf.PdfReader(input_pdf)
    writer = pypdf.PdfWriter(clone_from=reader)

    for page in writer.pages:
        page.compress_content_streams(level=9)

    with open(output_pdf, "wb") as f:
        writer.write(f)

    orig_size = stat(input_pdf).st_size / 1024
    pypdf_size = stat(output_pdf).st_size / 1024

    # pikepdf: same input, Flate level 9, object streams generated
    pikepdf.settings.set_flate_compression_level(9)

    with pikepdf.Pdf.open(input_pdf,
                          allow_overwriting_input=True,
                          suppress_warnings=True) as pdf:
        pdf.save(output_pdf,
                 object_stream_mode=pikepdf.ObjectStreamMode.generate,
                 compress_streams=True,
                 stream_decode_level=pikepdf.StreamDecodeLevel.specialized)

    pikepdf_size = stat(output_pdf).st_size / 1024

    print(f'{"input file:":<15}{input_pdf}')
    print(f'{"original size:":<15}{orig_size:.4f} KB')
    print(f'{"pypdf size:":<15}{pypdf_size:.4f} KB')
    print(f'{"pikepdf size:":<15}{pikepdf_size:.4f} KB')


test_filesize(INPUT_PDF, OUTPUT_PDF)

The output is:

input file:    pypdf/sample-files/009-pdflatex-geotopo/GeoTopo.pdf
original size: 5196.4180 KB
pypdf size:    5574.2656 KB
pikepdf size:  5185.7314 KB

For comparison:

* pypdf before this PR                       : 5702.3506 KB
* pypdf with a precision of 8 (as in this PR): 5594.0645 KB
* Smallpdf                                   : 1793.479 KB

Most interesting: even with a precision of 1, I wasn't able to spot a visual difference in the output.
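Putting the reported sizes side by side makes the trade-off easier to judge. The short sketch below only does arithmetic on the numbers listed in the comment above (in KB, as printed by the script); it does not rerun any measurement, and `reduction` is a helper introduced here for illustration.

```python
# Output sizes in KB, copied from the comment above
BEFORE_PR = 5702.3506    # pypdf before this PR
PRECISION_8 = 5594.0645  # FLOAT_WRITE_PRECISION = 8 (value chosen in this PR)
PRECISION_1 = 5574.2656  # FLOAT_WRITE_PRECISION = 1 (setting used in the test)
PIKEPDF = 5185.7314      # pikepdf with Flate level 9 and object streams


def reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, in percent."""
    return 100.0 * (before - after) / before


for label, size in [("precision 8", PRECISION_8),
                    ("precision 1", PRECISION_1),
                    ("pikepdf", PIKEPDF)]:
    print(f"{label:<12}{reduction(BEFORE_PR, size):6.2f} % smaller than before the PR")

# Extra saving from dropping the precision from 8 digits to 1 digit
print(f"8 -> 1 digits saves a further {PRECISION_8 - PRECISION_1:.1f} KB")
```

With these figures, precision 8 already removes about 1.9 % of the pre-PR size, while going all the way down to 1 digit saves only about 20 KB more (roughly 2.2 % in total).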

@pubpub-zz
Collaborator Author

From my analysis, you should not use a value lower than 5: some color spaces use FloatObject values, and rounding them that coarsely changes the colors, even though the change may not be visible in most PDF viewers.
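To make the rounding concern concrete, here is a minimal sketch of precision-limited float formatting. It assumes the precision is interpreted as a count of significant digits; `format_pdf_float` is a hypothetical helper written for illustration, not pypdf's FloatObject code.

```python
from math import floor, log10


def format_pdf_float(value: float, precision: int = 8) -> str:
    """Hypothetical helper: format `value` with about `precision` significant digits.

    Assumption: this mirrors the idea behind FLOAT_WRITE_PRECISION, not
    pypdf's actual FloatObject implementation.
    """
    if value == 0:
        return "0"
    # Digits before the decimal point; zero or negative when abs(value) < 1
    magnitude = floor(log10(abs(value))) + 1
    decimals = max(0, precision - magnitude)
    text = f"{value:.{decimals}f}"
    # Strip trailing zeros only from a fractional part ("612.000" -> "612", "1000" stays)
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return text


# A color component and a longer float at different precisions
for digits in (8, 5, 1):
    print(digits, format_pdf_float(0.87, digits), format_pdf_float(0.123456789, digits))
```

Under this assumption, a color component such as 0.87 is written unchanged at 5 or 8 digits but becomes 0.9 at 1 digit, which is the kind of color shift described above.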

MartinThoma changed the title from "BUG : pdf size increases because of float writing precision" to "BUG: PDF size increases because of too high float writing precision" on Sep 24, 2023
MartinThoma merged commit e3f60c1 into py-pdf:main on Sep 24, 2023
14 checks passed
@@ -379,6 +379,9 @@ def readFromStream(
return IndirectObject.read_from_stream(stream, pdf)


FLOAT_WRITE_PRECISION = 8 # shall be min 5 digits max, allow user adj
Collaborator


Part of this code comment seems to have gotten lost.
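The `FLOAT_WRITE_PRECISION` constant in the diff is meant to be user-adjustable, with at least 5 digits. One hedged way to apply that from user code is a context manager that temporarily overrides the module attribute and refuses values below 5. `float_write_precision` and `MIN_FLOAT_WRITE_PRECISION` are hypothetical names, not part of pypdf, and the sketch assumes pypdf reads `pypdf.generic._base.FLOAT_WRITE_PRECISION` each time it writes a float (Martin's test above, which changed the output size by setting it, suggests it does).

```python
from contextlib import contextmanager
from types import ModuleType
from typing import Iterator

# Hypothetical floor, taken from the "min 5 digits" code comment and the advice above
MIN_FLOAT_WRITE_PRECISION = 5


@contextmanager
def float_write_precision(module: ModuleType, digits: int) -> Iterator[None]:
    """Temporarily set module.FLOAT_WRITE_PRECISION, rejecting values below the floor."""
    if digits < MIN_FLOAT_WRITE_PRECISION:
        raise ValueError(
            f"FLOAT_WRITE_PRECISION={digits} is below {MIN_FLOAT_WRITE_PRECISION}; "
            "color values may be rounded too coarsely"
        )
    previous = module.FLOAT_WRITE_PRECISION
    module.FLOAT_WRITE_PRECISION = digits
    try:
        yield
    finally:
        # Restore the original value even if writing fails
        module.FLOAT_WRITE_PRECISION = previous
```

For example, `with float_write_precision(pypdf.generic._base, 6): writer.write(f)` would write with 6 digits and restore the previous value afterwards.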

MartinThoma added a commit that referenced this pull request Sep 24, 2023
## What's new

### Bug Fixes (BUG)
-  PDF size increases because of too high float writing precision (#2213) by @pubpub-zz
-  Fix test_watermarking_reportlab_rendering() (#2203) by @LucasCimon

### Documentation (DOC)
-  Fix typos and add a paragraph to ViewerPreferences docs (#2199) by @marcstober
-  How to install pypi from any branch (#2209) by @pubpub-zz
-  Update copyright footer in docs (#2207) by @marcstober

### Developer Experience (DEV)
-  Let dependabot update Github Actions by @MartinThoma

### Maintenance (MAINT)
-  Update .pre-commit-config.yaml by @MartinThoma

[Full Changelog](3.16.1...3.16.2)
Development

Successfully merging this pull request may close these issues.

ENH: Compression
3 participants