
Augment a batch of texts with Contextual Word Embeddings Augmenters #146

Closed
AliKarimi74 opened this issue Sep 12, 2020 · 6 comments
Labels
enhancement New feature or request

Comments

@AliKarimi74

Hi,

Thank you for this excellent library.
Is it possible to augment a batch of texts with contextual word augmenters? I am trying to use this type of augmenter during training, and augmenting examples one by one is frustrating. I would appreciate any suggestions.

Thanks!

@AliKarimi74 AliKarimi74 changed the title Augment a batch of text with Contextual Word Embeddings Augmenters Augment a batch of texts with Contextual Word Embeddings Augmenters Sep 12, 2020
@makcedward
Owner

makcedward commented Sep 13, 2020

There are two possible scenarios. The first is one input with multiple outputs. The second is a list of inputs, where the output list is the same size as the input list.

For the first case, you can pass `n` to the `augment` function:

```python
augmenter.augment(text, n=2)
```

For the second case, you can use `augments` when the input is a list:

```python
import nlpaug.augmenter.word as naw

texts = [
    'The quick brown fox jumps over the lazy dog .',
    'It is proved that augmentation is one of the anchor to success of computer vision model.'
]

aug = naw.ContextualWordEmbsAug(
    model_path='bert-base-uncased', action="insert")
augmented_texts = aug.augments(texts)
```

I just added sample code; you may take a look.

@makcedward makcedward added enhancement New feature or request and removed enhancement New feature or request labels Sep 13, 2020
@AliKarimi74
Author

Thanks for your response.

My scenario is the second one. I took a look at the code, and it seems that the `augments` method iterates over the list of samples and runs `augment` on each one, even when a GPU is available. So, if I want to augment a whole dataset, I think the execution time doesn't change significantly. Am I right?
For contextual augmenters (and maybe back-translation), is it possible to augment a batch of examples in a single forward pass?

@makcedward makcedward added the enhancement New feature or request label Sep 17, 2020
@makcedward
Owner

makcedward commented Sep 17, 2020

I will enhance it to support a single forward pass over multiple inputs per augmentation step, which should speed up multiple-input cases. In my testing, augmenting 3 inputs was around 3x faster.

However, the words within each input are still augmented one at a time. For example, given the input "A B C D E", suppose we want to augment B and D. To prevent grammatical mistakes, it augments B first and then D. In other words, there are two forward passes, one for B and one for D. Augmenting the two words (B and D) works like this:
Original: A (B) C (D) E
First Augmentation: A (X) C (D) E
Second Augmentation: A (X) C (Y) E
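The sequential process above can be sketched as follows. `predict_word` is a hypothetical stand-in for one masked-LM forward pass (not an nlpaug API); the point is that each pass runs on the token list that already contains the earlier replacements:

```python
def predict_word(tokens, position):
    """Hypothetical stand-in for one forward pass of a masked language model.

    A real model would predict a replacement from the surrounding context;
    here a fixed mapping keeps the output deterministic.
    """
    replacements = {"B": "X", "D": "Y"}
    return replacements.get(tokens[position], tokens[position])


def augment_sequentially(tokens, positions):
    tokens = list(tokens)  # work on a copy of the input
    steps = []
    for pos in positions:
        # Each target word costs one forward pass, and that pass sees the
        # replacements made in the previous steps.
        tokens[pos] = predict_word(tokens, pos)
        steps.append(" ".join(tokens))
    return steps


for step in augment_sequentially("A B C D E".split(), [1, 3]):
    print(step)
# A X C D E
# A X C Y E
```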

For multiple inputs, the process looks like this:
Original: [{A1 (B1) C1 (D1) E1}, {(A2) B2 C2 (D2) E2}]
First Augmentation: [{A1 (X1) C1 (D1) E1}, {(X2) B2 C2 (D2) E2}]
Second Augmentation: [{A1 (X1) C1 (Y1) E1}, {(X2) B2 C2 (Y2) E2}]

@makcedward
Owner

Fixed in version 1.0.0.

@rajat-tech-002

@makcedward is it also fixed in the latest version?

@makcedward
Owner

@rajat-tech-002
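Yes, it is supported from v1.0.0.

You can set the batch size and pass a list of texts to `augment`:

```python
import nlpaug.augmenter.word as naw

aug = naw.ContextualWordEmbsAug(batch_size=32)  # default is 32
aug_texts = aug.augment(texts)
```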
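Conceptually, a batch size splits the input list into consecutive chunks that are each processed together. A minimal sketch of that chunking, using a hypothetical `chunk` helper (this is an illustration, not nlpaug's internal code):

```python
def chunk(items, batch_size=32):
    """Split items into consecutive chunks of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


texts = [f"sentence {i}" for i in range(70)]
print([len(c) for c in chunk(texts)])  # [32, 32, 6]
```

A larger batch size means fewer forward passes, at the cost of more GPU memory per pass.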


3 participants