Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Make reader.get_fields also return dropdowns with options #1114

Merged
merged 10 commits into from
Jul 21, 2022

Conversation

KourFrost
Copy link
Contributor

@KourFrost KourFrost commented Jul 15, 2022

Adding /Opt to the attributes list get_fields() will now add /Opt if it has it. For Combo Box /Ch it will list all of the available options in the drop down menu.

Closes #391

PyPDF2/constants.py Outdated Show resolved Hide resolved
@MartinThoma
Copy link
Member

I think I need to check where / how we use that dictionary to understand why you've added it. Could you point me to the place in code where adding the constant improves PyPDF2?

Co-authored-by: Martin Thoma <[email protected]>
@KourFrost
Copy link
Contributor Author

KourFrost commented Jul 15, 2022

https://pypdf2.readthedocs.io/en/latest/_modules/PyPDF2/_reader.html#PdfReader.get_fields

reader.get_fields() uses that constants list to add information to the dictionary and only returns attributes from that list.
field_attributes = FieldDictionaryAttributes.attributes_dict()

By adding /Opt to that list its now included, and with my limited testing only shows up if the field has Opt attribute. It didn't break any of your other tests in pytest, so I figured that would be a good solution.

Another solution could be adding like an overloaded function to get_fields that takes an additional list of attributes, Then it wouldn't need to update the constant attributes list.

If one would want to see the items in a drop down/combo box you would need something like this:

for annot in reader.pages[0]["/Annots"]:
    obj = annot.get_object()
    if "/T" in obj.keys():
        if obj["/T"] == objectTitle:
            if "/Opt" in obj.keys():
                print(obj["/Opt"])

@MartinThoma
Copy link
Member

Thank you for explaining it 🤗

Would you mind doing the following?

  1. constants.py: Add a attributes and a attributes_dict method to CheckboxRadioButtonAttributes similar to FieldDictionaryAttributes. Yes, it would be simpler to add this single attribute directly to FieldDictionaryAttributes, but that would mean we mix two different things. The constants.py should be very close to various tables of the PDF specs.
  2. constants.py: In the very bottom, there is a PDF_KEYS. Please add the CheckboxRadioButtonAttributes there. It ensures in the tests that there are no obvious typos.
  3. _reader.py: Please change field_attributes = FieldDictionaryAttributes.attributes_dict() to field_attributes = FieldDictionaryAttributes.attributes_dict();field_attributes.update(CheckboxRadioButtonAttributes.attributes_dict())
  4. _reader.py: Please change for attr in FieldDictionaryAttributes.attributes(): in a similar way so that we iterate over both
  5. Apply black . to format the code. You might need to pip install black first.

I think that should do the trick while maintaining that constants.py represents various tables from the PDF specification. What do you think?

@KourFrost
Copy link
Contributor Author

I like that as a solution. I'll get that implemented.

@codecov
Copy link

codecov bot commented Jul 19, 2022

Codecov Report

Merging #1114 (6eeba04) into main (e1f9772) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1114      +/-   ##
==========================================
+ Coverage   91.94%   91.96%   +0.02%     
==========================================
  Files          24       24              
  Lines        4681     4694      +13     
  Branches      967      968       +1     
==========================================
+ Hits         4304     4317      +13     
  Misses        231      231              
  Partials      146      146              
Impacted Files Coverage Δ
PyPDF2/_reader.py 90.92% <100.00%> (+0.03%) ⬆️
PyPDF2/constants.py 100.00% <100.00%> (ø)
PyPDF2/generic.py 91.74% <100.00%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e1f9772...6eeba04. Read the comment docs.

@KourFrost
Copy link
Contributor Author

It didn't turn out to be as simple as we had hoped. Rewriting get_fields also required a similar change in _write_field and the field object it self. There might be some refactoring that we could do as get_fields loops through the same attributes as _write_field. _write_field is only called in _build_field. _build_field is only called by get_fields.

Also an issue with fields as a definition. Increasing the scope of what attributes define a field ended being the final trick. Fields objects define themselves off of the FieldDictionaryAttributes constants.

@MartinThoma
Copy link
Member

Hey, just a short heads up: it's currently super hot in Germany. I have about 35°C in my apartment during the day and not below 28°C at night. This makes me pretty unconcentrated, meaning I can't look at as many things as I wished.

TL;DR : the review might take a few days.

@KourFrost
Copy link
Contributor Author

its all good, stay safe man!

Copy link
Member

@MartinThoma MartinThoma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution 🤗

PyPDF2/_reader.py Outdated Show resolved Hide resolved
@MartinThoma MartinThoma changed the title Forms dropdown options BUG: Make reader.get_fields also return dropdown options Jul 21, 2022
@MartinThoma MartinThoma changed the title BUG: Make reader.get_fields also return dropdown options BUG: Make reader.get_fields also return dropdowns with options Jul 21, 2022
@MartinThoma MartinThoma merged commit 7cba98a into py-pdf:main Jul 21, 2022
@MartinThoma
Copy link
Member

Very nice work @KourFrost ! I will make a release of PyPDF2 to PyPI today that contains your contribution!

If you want, I can also add you to the https://pypdf2.readthedocs.io/en/latest/meta/CONTRIBUTORS.html list - Just send me the line I should add for you :-)

MartinThoma added a commit that referenced this pull request Jul 21, 2022
New Features (ENH):
-  Add `outline_count` property (#1129)

Bug Fixes (BUG):
-  Make reader.get_fields also return dropdowns with options (#1114)
-  Add deprecated EncodedStreamObject functions back until PyPDF2==3.0.0 (#1139)

Robustness (ROB):
-  Cope with missing /W entry (#1136)
-  Cope with invalid parent xref (#1133)

Documentation (DOC):
-  Contributors file (#1132)
-  Fix type in signature of PdfWriter.add_uri (#1131)

Developer Experience (DEV):
-  Add .git-blame-ignore-revs (#1141)

Code Style (STY):
-  Fixing typos (#1137)
-  Re-use code via get_outlines_property in tests (#1130)

Full Changelog: 2.6.0...2.7.0
@KourFrost
Copy link
Contributor Author

Thank you! This is my first real contribution to an open source project, so I'm happy to help.

sure add me to the contributors list. just "KourFrost" linked to my github
https:/KourFrost

@KourFrost KourFrost deleted the forms_dropdown_options branch July 21, 2022 23:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PdfReader.get_form_text_fields() not returning dropdown fields
2 participants