Augment a batch of texts with Contextual Word Embeddings Augmenters #146

AliKarimi74 · 2020-09-12T23:21:18Z

Hi,

Thank you for this excellent library.
Is it possible to augment a batch of texts with contextual word augmenters? I am trying to use this type of augmenter during training, and augmenting examples one by one is frustrating. I would appreciate any suggestions.

Thanks!

makcedward · 2020-09-13T05:35:05Z

There are 2 possible cases. The first is a single input with multiple outputs. The second is a list of inputs, where the output list is the same size as the input list.

For the first case, you can call the augment function with n set to the number of outputs you want:
augmenter.augment(text, n=2)

For the second case, you can call augments with a list as input:

import nlpaug.augmenter.word as naw

texts = [
    'The quick brown fox jumps over the lazy dog .',
    'It is proved that augmentation is one of the anchor to success of computer vision model.'
]

# Insert new words predicted by BERT into each text
aug = naw.ContextualWordEmbsAug(
    model_path='bert-base-uncased', action="insert")
augmented_texts = aug.augments(texts)

I just added some sample code. You may want to take a look.

AliKarimi74 · 2020-09-13T08:13:28Z

Thanks for your response.

My scenario is the second one. I took a look at the code, and it seems that, when a GPU is available, the augments method iterates over the list of samples and runs the same augment method on each one. So I think that if I want to augment a dataset, the execution time won't change significantly. Am I right?
For contextual augmenters (and maybe back-translation), is it possible to augment a batch of examples in a single forward pass?

makcedward · 2020-09-17T03:13:28Z

I will enhance the library to support a single forward pass over multiple inputs per augmentation step, which will definitely speed up the multiple-input case. In my testing, it is around 3x faster when augmenting 3 inputs.

However, each input still has to be augmented one word at a time. For example, say the input is "A B C D E" and we want to augment B and D. To prevent grammatical mistakes, B is augmented first and then D. In other words, there are 2 forward passes, one for B and one for D. When augmenting two words (B and D), the process looks like this:
Original: A (B) C (D) E
First Augmentation: A (X) C (D) E
Second Augmentation: A (X) C (Y) E
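
The sequential process above can be sketched with a stand-in for the model. This is only an illustration under stated assumptions, not nlpaug's implementation: `fake_predict` is a hypothetical placeholder for one masked-language-model forward pass, and `augment_sequentially` is a made-up helper name.

```python
from typing import Callable, List, Tuple

MASK = "[MASK]"


def fake_predict(tokens: List[str]) -> str:
    """Hypothetical stand-in for one forward pass of a masked language model.

    A real model would pick a token from the surrounding context; this stub
    just returns a deterministic placeholder based on the mask position.
    """
    return f"new{tokens.index(MASK)}"


def augment_sequentially(
    tokens: List[str],
    targets: List[int],
    predict: Callable[[List[str]], str] = fake_predict,
) -> Tuple[List[str], int]:
    """Replace each target position in turn, one forward pass per position."""
    out = list(tokens)
    passes = 0
    for pos in targets:
        # Mask only the current target; earlier replacements stay visible.
        masked = out[:pos] + [MASK] + out[pos + 1:]
        out[pos] = predict(masked)
        passes += 1
    return out, passes


result, passes = augment_sequentially(["A", "B", "C", "D", "E"], [1, 3])
print(result, passes)
```

Because the second pass is fed the output of the first, the replacement for D is chosen with the new B already in place, which is why the two words cannot share one pass.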

For multiple inputs, the process looks like this:
Original: [{A1 (B1) C1 (D1) E1}, {(A2) B2 C2 (D2) E2}]
First Augmentation: [{A1 (X1) C1 (D1) E1}, {(X2) B2 C2 (D2) E2}]
Second Augmentation: [{A1 (X1) C1 (Y1) E1}, {(X2) B2 C2 (Y2) E2}]
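
The multiple-input case can be sketched the same way. This is a rough illustration, not the library's code: `fake_predict_batch` is a hypothetical stand-in for a single forward pass over a whole batch, and `augment_batch` is a made-up helper. The point is that the number of forward passes follows the largest number of target words in any one input, not the total across inputs.

```python
from typing import Callable, List, Tuple

MASK = "[MASK]"


def fake_predict_batch(batch: List[List[str]]) -> List[str]:
    """Hypothetical stand-in for ONE forward pass over every masked input."""
    return [f"new{tokens.index(MASK)}" for tokens in batch]


def augment_batch(
    texts: List[List[str]],
    targets: List[List[int]],
    predict_batch: Callable[[List[List[str]]], List[str]] = fake_predict_batch,
) -> Tuple[List[List[str]], int]:
    """Augment the k-th target of every input together, step by step."""
    outs = [list(t) for t in texts]
    passes = 0
    rounds = max((len(t) for t in targets), default=0)
    for step in range(rounds):
        # Only inputs that still have a target left take part in this step.
        active = [i for i, t in enumerate(targets) if step < len(t)]
        masked = []
        for i in active:
            pos = targets[i][step]
            masked.append(outs[i][:pos] + [MASK] + outs[i][pos + 1:])
        predictions = predict_batch(masked)  # one pass for the whole step
        passes += 1
        for i, token in zip(active, predictions):
            outs[i][targets[i][step]] = token
    return outs, passes


texts = [["A1", "B1", "C1", "D1", "E1"], ["A2", "B2", "C2", "D2", "E2"]]
targets = [[1, 3], [0, 3]]  # (B1, D1) in the first input, (A2, D2) in the second
outs, passes = augment_batch(texts, targets)
print(outs, passes)
```

Here four words are replaced in two passes; processing the inputs one by one would take four.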


makcedward · 2020-09-25T05:42:23Z

Fixed in version 1.0.0.

rajat-tech-002 · 2021-11-01T08:40:51Z

@makcedward, is it also fixed in the latest version?

makcedward · 2021-11-21T00:48:40Z

@rajat-tech-002
Yes. It is supported from v1.0.0.

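You can set batch_size and pass a list of texts to augment:

from nlpaug.augmenter.word import ContextualWordEmbsAug

aug = ContextualWordEmbsAug(batch_size=32) # default is 32
aug_texts = aug.augment(texts)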
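
To give a feel for what a batch size of 32 means for a longer list, here is a small sketch. It assumes the inputs are split into consecutive chunks of at most batch_size items; `chunked` is a hypothetical helper written for this illustration and is not part of nlpaug.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size items (illustrative helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


texts = [f"text {i}" for i in range(70)]
batch_lengths = [len(batch) for batch in chunked(texts, batch_size=32)]
print(batch_lengths)  # [32, 32, 6]
```

A larger batch_size means fewer chunks but more memory per forward pass, so the best value depends on the model and the GPU.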

AliKarimi74 changed the title ~~Augment a batch of text with Contextual Word Embeddings Augmenters~~ Augment a batch of texts with Contextual Word Embeddings Augmenters Sep 12, 2020

makcedward added enhancement New feature or request and removed enhancement New feature or request labels Sep 13, 2020

makcedward added the enhancement New feature or request label Sep 17, 2020

makcedward added a commit that referenced this issue Sep 19, 2020

Support single forward data input for deep learning models #146

ba404a6

makcedward added a commit that referenced this issue Sep 19, 2020

Support single forward data input for ContextualWordEmbsForSentenceAug …

92b8030

…#146

makcedward added a commit that referenced this issue Sep 19, 2020

Support single forward data input for AbstSummAug #146

13fab78

makcedward added a commit that referenced this issue Sep 19, 2020

Support single forward data input for BackTranslationAug #146

818ae70

makcedward closed this as completed Sep 25, 2020

kgarg8 mentioned this issue Dec 30, 2021

*** ValueError: expected sequence of length 43 at dim 1 (got 56) when using batch_size with ContextualWordEmbsForSentenceAug #266

Open
