Single Ontology Data Augmentation #717

Merged: 69 commits, Apr 8, 2022

Conversation

Pushkar-Bhuse
Collaborator

Description of changes

The augmentation framework introduced in #621 and #685 provides a way to perform augmentation on multiple ontologies. This PR adds a method that simplifies working with BaseDataAugmentationOp by restricting augmentation to one ontology at a time.

Possible influences of this PR.

This PR will have two uses:

  1. It will provide a data augmentation (DA) method that is simpler to implement, at the cost of no longer being able to augment multiple ontologies in a single Op.
  2. It will provide a platform to port all existing DA algorithms to the new Op format.
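
The single-ontology idea can be illustrated with a minimal, self-contained sketch. It does not import Forte and is not the code merged in this PR: `Span`, `SingleTypeAugmentOp`, `augment_one`, and `DictionaryOp` are hypothetical names chosen for illustration. The assumption is that a base class handles walking the spans of one annotation type and splicing replacements back into the text, so that each concrete DA algorithm only has to decide what to do with a single span.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Span:
    """Hypothetical stand-in for an annotation: a typed character span."""
    kind: str
    begin: int
    end: int


class SingleTypeAugmentOp(ABC):
    """Hypothetical base class that augments spans of exactly one type."""

    def __init__(self, target_kind: str):
        self.target_kind = target_kind

    @abstractmethod
    def augment_one(self, text: str) -> Tuple[bool, str]:
        """Return (changed, replacement) for the text of one span."""

    def augment(self, text: str, spans: List[Span]) -> str:
        pieces: List[str] = []
        cursor = 0
        for span in sorted(spans, key=lambda s: s.begin):
            # Skip spans of other types and spans overlapping a replaced one.
            if span.kind != self.target_kind or span.begin < cursor:
                continue
            changed, replacement = self.augment_one(text[span.begin:span.end])
            if not changed:
                continue
            # Copy the untouched text before the span, then the replacement.
            pieces.append(text[cursor:span.begin])
            pieces.append(replacement)
            cursor = span.end
        pieces.append(text[cursor:])
        return "".join(pieces)


class DictionaryOp(SingleTypeAugmentOp):
    """Hypothetical algorithm: replace a span using a lookup table."""

    def __init__(self, target_kind: str, table: Dict[str, str]):
        super().__init__(target_kind)
        self.table = table

    def augment_one(self, text: str) -> Tuple[bool, str]:
        if text in self.table:
            return True, self.table[text]
        return False, text


text = "the quick fox"
spans = [
    Span("Sentence", 0, 13),
    Span("Token", 0, 3),
    Span("Token", 4, 9),
    Span("Token", 10, 13),
]
op = DictionaryOp("Token", {"quick": "fast"})
augmented = op.augment(text, spans)
print(augmented)  # the fast fox
```

In this sketch the subclass contains only the replacement rule, which is the kind of simplification the first point describes. The trade-off named in that point also shows up: one op instance touches only the `Token` spans and ignores the `Sentence` span.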

Test Conducted

All existing DA implementations in Forte, along with their tests, were updated to work with the new Op structure and the format required by SingleTokenAugmentationOp.

@Pushkar-Bhuse Pushkar-Bhuse added the data_aug Features on data augmentation label Mar 31, 2022

codecov bot commented Mar 31, 2022

Codecov Report

Merging #717 (98de210) into master (f73ef0b) will decrease coverage by 0.02%.
The diff coverage is 90.99%.

@@            Coverage Diff             @@
##           master     #717      +/-   ##
==========================================
- Coverage   81.36%   81.34%   -0.03%     
==========================================
  Files         242      243       +1     
  Lines       18291    18338      +47     
==========================================
+ Hits        14883    14917      +34     
- Misses       3408     3421      +13     
Impacted Files Coverage Δ
.../algorithms/embedding_similarity_replacement_op.py 96.00% <ø> (ø)
..._augment/algorithms/distribution_replacement_op.py 75.67% <80.00%> (ø)
...orte/processors/data_augment/algorithms/eda_ops.py 87.17% <87.17%> (ø)
...rs/data_augment/algorithms/single_annotation_op.py 88.00% <88.00%> (ø)
...ssors/data_augment/algorithms/word_splitting_op.py 88.63% <88.63%> (ø)
...ors/data_augment/algorithms/back_translation_op.py 83.33% <100.00%> (+1.75%) ⬆️
...ssors/data_augment/algorithms/character_flip_op.py 90.62% <100.00%> (+3.52%) ⬆️
...ta_augment/algorithms/dictionary_replacement_op.py 92.00% <100.00%> (+1.52%) ⬆️
...ors/data_augment/algorithms/typo_replacement_op.py 84.21% <100.00%> (+1.35%) ⬆️
...ment/algorithms/back_translation_augmenter_test.py 95.45% <100.00%> (+0.21%) ⬆️
... and 8 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Member

@hunterhector hunterhector left a comment

We could improve the documentation so that it is more helpful to developers:

https://asyml-forte--717.org.readthedocs.build/en/717/code/data_aug.html

Check the generated documentation and see whether you find it useful from a developer's perspective.

forte/processors/data_augment/algorithms/eda_ops.py (outdated, resolved)
forte/processors/data_augment/algorithms/eda_ops.py (outdated, resolved)
forte/processors/data_augment/algorithms/eda_ops.py (outdated, resolved)
docs/code/data_aug.rst (outdated, resolved)
@hunterhector hunterhector merged commit dff572f into asyml:master Apr 8, 2022