Amount of augmentation should be sampled #228

cwenner · 2021-07-04T18:31:27Z

I think most users would expect aug_word_p and aug_p to act as independent sampling probabilities for words and for their characters. Instead, these parameters specify the fraction of words and characters to augment, rounded down. This leads to some odd behavior; for example, there is no way to ensure that short and long texts are distorted to a similar degree.

Concrete example: we may want to simulate realistic spelling mistakes. For that, we probably want aug_word_p * aug_p to be at most a few percent. To get that, we have to set aug_char_min=0 or aug_word_min=0, or not use the flow helpers. However, either of the first two options means that short sentences or words will never be augmented, because their counts are always rounded down to 0.
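To make the rounding concern concrete, here is a small sketch of a count rule of the form max(floor(p * n), minimum). This is a simplified model assumed for illustration, not nlpaug's actual code, and `floor_count` is a hypothetical helper. With a target rate of about 3% and the minimum disabled, the 3- and 10-character inputs get no changes at all; with minimum=1, every input gets at least one change, which over-distorts short words.

```python
import math


def floor_count(p: float, n: int, minimum: int) -> int:
    """Hypothetical count rule: round p * n down, then apply a minimum."""
    return max(math.floor(p * n), minimum)


# Target roughly 3% of characters, with and without a minimum of 1.
p = 0.03
for length in (3, 10, 50, 120):
    print(length, floor_count(p, length, minimum=0), floor_count(p, length, minimum=1))
```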

What do you think about changing these values to be independent samples (while respecting the minimums)?
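One possible reading of this proposal, sketched under assumptions: each position (word or character) is selected independently with probability p, and if fewer than the minimum are selected, random extra positions are added. `sample_positions` is hypothetical and not part of nlpaug.

```python
import random
from typing import List


def sample_positions(n: int, p: float, minimum: int, rng: random.Random) -> List[int]:
    """Select each of n positions independently with probability p,
    then top up with random unselected positions until `minimum` is met."""
    chosen = [i for i in range(n) if rng.random() < p]
    if len(chosen) < minimum:
        remaining = [i for i in range(n) if i not in chosen]
        extra = rng.sample(remaining, min(minimum - len(chosen), len(remaining)))
        chosen = sorted(chosen + extra)
    return chosen


rng = random.Random(0)
# Averaged over many draws, the selected fraction approaches p for any length,
# so short and long inputs are distorted at the same expected rate.
trials = 10_000
for n in (4, 40):
    total = sum(len(sample_positions(n, 0.03, 0, rng)) for _ in range(trials))
    print(n, total / (trials * n))
```

A nonzero minimum still raises the effective rate for very short inputs, so there remains a tradeoff between honoring the minimums and keeping the expected rate equal across lengths.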

The text was updated successfully, but these errors were encountered:

makcedward · 2021-11-24T00:19:46Z

I guess you are referring to KeywordAug or RandomCharAug.

Both augmenters provide aug_char_p and aug_word_p, and the two work independently: aug_word_p controls how many words are drawn from a sentence, and aug_char_p then controls how many characters are drawn from each selected word.
Example---------
Input: "I eat apple."
Parameters: aug_word_p = 0.3, aug_char_p = 0.5
One of the words will be drawn. Let's assume "apple" is picked.
Within "apple", 2 characters are changed (0.5 * 5 = 2.5, rounded down to 2), so it can become "appkW".
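A minimal sketch of the two-stage procedure described above, assuming counts are computed as int(p * n) (round down) with a minimum of 1, that punctuation counts as a token (so "I eat apple." has 4 tokens and 0.3 * 4 rounds down to 1 word), and that replacements are random ASCII letters. `two_stage_augment` and `aug_count` are hypothetical illustrations, not nlpaug's RandomCharAug.

```python
import random
import re
import string


def aug_count(p: float, n: int, minimum: int) -> int:
    """Round p * n down, then enforce a minimum (assumed rule)."""
    return max(int(p * n), minimum)


def two_stage_augment(text, aug_word_p, aug_char_p, rng, word_min=1, char_min=1):
    """Hypothetical two-stage augmenter: draw words, then characters within each word."""
    tokens = re.findall(r"\w+|[^\w\s]", text)  # "I eat apple." -> ["I", "eat", "apple", "."]
    n_words = min(aug_count(aug_word_p, len(tokens), word_min), len(tokens))
    for wi in rng.sample(range(len(tokens)), n_words):
        chars = list(tokens[wi])
        n_chars = min(aug_count(aug_char_p, len(chars), char_min), len(chars))
        for ci in rng.sample(range(len(chars)), n_chars):
            # Pick a replacement letter that differs from the original character.
            chars[ci] = rng.choice([c for c in string.ascii_letters if c != chars[ci]])
        tokens[wi] = "".join(chars)
    return tokens


rng = random.Random(42)
print(two_stage_augment("I eat apple.", aug_word_p=0.3, aug_char_p=0.5, rng=rng))
```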

If you use the Sometimes pipeline (one of the Flow classes), aug_p refers to the percentage of sub-pipelines to execute.

Example---------
Input: [KeywordAug, RandomCharAug, RandomWordAug, RandomWordAug]

naf.Sometimes(
    [KeywordAug, RandomCharAug, RandomWordAug, RandomWordAug]
)

If aug_p is 0.3 in Sometimes, only 1 sub-pipeline will be executed (0.3 * 4 = 1.2, rounded down to 1). Which sub-pipeline is selected differs from one execution to the next.
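The selection step described above can be modeled roughly as follows. This is a simplified, hypothetical stand-in for naf.Sometimes: plain string functions replace nlpaug augmenters, and it assumes the count is int(aug_p * len(pipelines)), that sub-pipelines are drawn uniformly without replacement, and that the chosen ones run in their original order.

```python
import random
from typing import Callable, List

Step = Callable[[str], str]


def sometimes(text: str, pipelines: List[Step], aug_p: float, rng: random.Random) -> str:
    """Run int(aug_p * len(pipelines)) randomly chosen steps (round-down rule)."""
    k = int(aug_p * len(pipelines))  # 0.3 * 4 = 1.2 -> 1
    chosen = sorted(rng.sample(range(len(pipelines)), k))  # keep original order
    for i in chosen:
        text = pipelines[i](text)
    return text


# Placeholder steps that just tag the text, mirroring the four-element example.
steps = [
    lambda s: s + " [Keyword]",
    lambda s: s + " [RandomChar]",
    lambda s: s + " [RandomWord]",
    lambda s: s + " [RandomWord]",
]
rng = random.Random(1)
for _ in range(3):
    print(sometimes("I eat apple.", steps, aug_p=0.3, rng=rng))
```

Each call applies exactly one tag, but which one varies between calls.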

I agree that rounding down may not be a good approach. I will change it to round up.
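For comparison, a sketch of the two rounding rules applied to the numbers used in this thread. `count_down` and `count_up` are hypothetical helpers; whether the actual change also keeps the minimum and maximum checks is not stated here.

```python
import math


def count_down(p: float, n: int) -> int:
    """Round-down rule: fraction p of n items, truncated."""
    return math.floor(p * n)


def count_up(p: float, n: int) -> int:
    """Round-up rule: any nonzero fraction yields at least one item."""
    return math.ceil(p * n)


examples = {
    "chars in 'apple', p=0.5": (0.5, 5),
    "Sometimes steps, p=0.3": (0.3, 4),
    "short word, p=0.03": (0.03, 4),
}
for label, (p, n) in examples.items():
    print(f"{label}: down={count_down(p, n)}, up={count_up(p, n)}")
```

One caveat with rounding up on floats: 0.07 * 100 evaluates to 7.000000000000001 in double precision, so a bare math.ceil returns 8. Rounding the product to a few decimal places before taking the ceiling avoids that.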

cwenner changed the title ~~Amount of augmentation should be random~~ Amount of augmentation should be sampled Jul 4, 2021

makcedward added the enhancement New feature or request label Jul 15, 2021

makcedward added a commit that referenced this issue Nov 24, 2021

[#228] Change aug random behavior from round down to round up

7113b00

makcedward closed this as completed Dec 22, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Amount of augmentation should be sampled #228

Amount of augmentation should be sampled #228

cwenner commented Jul 4, 2021 •

edited

Loading

makcedward commented Nov 24, 2021

Amount of augmentation should be sampled #228

Amount of augmentation should be sampled #228

Comments

cwenner commented Jul 4, 2021 • edited Loading

makcedward commented Nov 24, 2021

cwenner commented Jul 4, 2021 •

edited

Loading