Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix559 Change reference/access of individual pack in the MultiPack from (array) index based to pack_id based (to facilitate delete, insert), with version-enabled backward compatibility #576

Merged
merged 43 commits into from
Mar 4, 2022

Conversation

J007X
Copy link
Collaborator

@J007X J007X commented Dec 22, 2021

This PR fixes #559. (as the main issue) , and also fixes #595. of a dictionary key (changed form it to str by the deserializer) issue currently only affecting this change (thus needs to be changed/delivered together)

Description of changes

In order to change from using array index number to pack_id (in order to safe delete pack from multi-pack without affecting the reference to other packs), following places are changed:
"_resolve_pointer" of MultiEntry utilizing latest version of pack to perform backward_compatibility check (and resolve the pointer in the old way if the pack is before/oder than the BACKWARD_COMPATIBLE_VER. in add_member() method as well as in the set_parent and set_child of class MultipackLink, pack_id is used instead of pack_index.
(Also includes recent changes in version for pack in base_pack.py and version.py)

The (dependent) fixing of issue595 that correct the dictionary key type in method "deserialize" (from line 647) and "from_string" (line 687) of multi_pack.py, basically change the keys of dictionary _inverse_pack_ref from str back to int

Possible influences of this PR.

Old version of packs such as those being serialized to disk, when loaded to this new version, need to be version checked and perform different ways to resolve the pointer according (old version using index, newer version using pack_id)

Test Conducted

Some adding/deleting of multipack is tested.
Also deserialize related test case "test_multipack_deserialized_dictionary_recover"
is added for the dictionary key type correction (for #595) in "entry_data_structures_test.py"

@hunterhector
Copy link
Member

hunterhector commented Jan 10, 2022

please provide a description title to the PR

forte/data/ontology/core.py Outdated Show resolved Hide resolved
forte/data/ontology/core.py Outdated Show resolved Hide resolved
tests/forte/data/multi_pack_test.py Show resolved Hide resolved
forte/data/ontology/core.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Jan 13, 2022

Codecov Report

Merging #576 (11b268f) into master (f974a38) will increase coverage by 0.05%.
The diff coverage is 89.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #576      +/-   ##
==========================================
+ Coverage   80.90%   80.95%   +0.05%     
==========================================
  Files         240      240              
  Lines       17362    17430      +68     
==========================================
+ Hits        14047    14111      +64     
- Misses       3315     3319       +4     
Impacted Files Coverage Δ
forte/data/base_pack.py 77.59% <75.00%> (-0.19%) ⬇️
forte/data/multi_pack.py 79.12% <77.77%> (+1.11%) ⬆️
forte/data/ontology/core.py 83.93% <81.81%> (-0.13%) ⬇️
forte/data/ontology/top.py 85.47% <100.00%> (ø)
forte/version.py 100.00% <100.00%> (ø)
tests/forte/data/entry_data_structures_test.py 99.48% <100.00%> (+0.05%) ⬆️
tests/forte/data/multi_pack_test.py 96.90% <100.00%> (+0.22%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f974a38...11b268f. Read the comment docs.

@J007X J007X changed the title Fix559 Fix559 Change reference/access of individual pack in the MultiPack from (array) index based to pack_id based (to facilitate delete, insert), with version-enabled backward compatibility Jan 13, 2022
@J007X
Copy link
Collaborator Author

J007X commented Jan 13, 2022 via email

@J007X
Copy link
Collaborator Author

J007X commented Jan 13, 2022 via email

@hunterhector
Copy link
Member

hunterhector commented Jan 13, 2022

Hi Hector, Also about the current fix in multi_pack.py 282-292: try: return self._inverse_pack_ref[pack_id] except KeyError: # sometimes after deserialization the dict have string as key s_dict: dict = self._inverse_pack_ref s_key = str(pack_id) try: return s_dict[s_key] except KeyError as ke: # really not found raise ProcessExecutionException( f"Pack {pack_id} is not in this multi-pack." ) from ke this is becasue after multi-pack deserialization the _inverse​pack_ref dictionary becomes [str, int] instead of [int, int] . If you send in an int to lookup it will generate keyError. So you might want to have some one to have a look to see why it changes. (this can be debugged from entry_data_structures_test.py line 174) Thanks, James James Xiao Architect T: +971 2 8113163 E: @.*** www.mbzuai.ac.ae<http://www.mbzuai.ac.ae> Mohamed bin Zayed University of Artificial Intelligence Masdar City, Building 1B, 3rd Floor Abu Dhabi, UAE [cid:MBZUAI_Logo_4d283c4f-1a0d-4aed-8a25-b17ff8b431f8.png] The world’s first specialized research-based AI university

________________________________ 发件人: James Xiao @.> 发送时间: 2022年1月13日 18:09 收件人: asyml/forte @.> 主题: 回复: [asyml/forte] Fix559 (PR #576) Thanks Hector, I've just changed to "Fix559 Change reference/access of individual pack in the MultiPack from (array) index based to pack_id based (to facilitate delete, insert), with version-enabled backward compatibility" Also about today's check in: I've made some changes to 5 places however this might also need a MultiPack change related to deserialization (500/547 related) by Bater tomorrow, so the current commit version is not the fully sync'd / final version. Thanks, James
________________________________ 发件人: Hector @.> 发送时间: 2022年1月11日 3:56 收件人: asyml/forte @.> 抄送: James Xiao @.>; Author @.> 主题: Re: [asyml/forte] Fix559 (PR #576) please provide a description title to the PR — Reply to this email directly, view it on GitHub<#576 (comment)>, or unsubscribehttps:/notifications/unsubscribe-auth/AWD56G4NRFHDGYAPIJSJEEDUVNW3JANCNFSM5KR7VO7Q. Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub. You are receiving this because you authored the thread.Message ID: @.***>

Hmm. So I think this is related to the serialization. In fact, JSON requires all keys to be a string. So if we do a Json dump we will get the string. This should be a bug that we overlook.

See here for the discussion: https://stackoverflow.com/questions/1450957/pythons-json-module-converts-int-dictionary-keys-to-strings

Btw, would you mind creating an issue about this finding?

@J007X
Copy link
Collaborator Author

J007X commented Jan 13, 2022 via email

J007X and others added 6 commits February 2, 2022 16:23
forte/data/multi_pack.py Outdated Show resolved Hide resolved
forte/data/ontology/core.py Outdated Show resolved Hide resolved
forte/data/ontology/core.py Outdated Show resolved Hide resolved
@hunterhector
Copy link
Member

Please understand the usage of the fixes keyword of github: https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue

forte/data/multi_pack.py Show resolved Hide resolved
forte/version.py Outdated Show resolved Hide resolved
forte/data/multi_pack.py Outdated Show resolved Hide resolved
forte/data/ontology/core.py Outdated Show resolved Hide resolved
tests/forte/data/multi_pack_test.py Show resolved Hide resolved
@hunterhector hunterhector merged commit ac5b58f into asyml:master Mar 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment