Contextual Word Embeddings Augmenter (BERT) error #161

Closed
rajae-Bens opened this issue Oct 7, 2020 · 7 comments

Comments

@rajae-Bens

Hi,

I am getting this error:
NameError: name 'AutoTokenizer' is not defined

when running this code:

import nlpaug.augmenter.word as naw

# Augment French by BERT
aug = naw.ContextualWordEmbsAug(model_path='bert-base-multilingual-uncased', aug_p=0.1)
text = "Bonjour, J'aimerais une attestation de l'employeur certifiant que je suis en CDI."
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

I installed transformers and imported AutoTokenizer in my own code, but I still get the same error.
Any ideas, please?
Thank you.

@makcedward
Owner

Could you share your Python, transformers, and nlpaug versions?
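One way to gather this in a notebook is sketched below. The helper name collect_versions is only illustrative, not part of nlpaug. It imports each package and reports its version, and if an import fails it reports the exception, which can help here because a failing import of transformers or torch may otherwise go unnoticed.

```python
import importlib
import platform


def collect_versions(packages=("nlpaug", "transformers", "torch")):
    """Return {name: version or error message} for Python and each package."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            module = importlib.import_module(name)
            # Fall back to "unknown" if the package exposes no __version__.
            versions[name] = getattr(module, "__version__", "unknown")
        except Exception as exc:
            # Report the failure instead of hiding it.
            versions[name] = "import failed: {!r}".format(exc)
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print("{}: {}".format(name, version))
```

Run it in the same runtime that raises the error, since a notebook kernel may use a different environment than the one pip installed into.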

@rajae-Bens
Author

Hi,

Python 3.6.9
nlpaug-1.0.1
transformers-3.3.1

@makcedward
Owner

What about your PyTorch version? I suggest installing version 1.6.

@narayanacharya6
Contributor

Getting the same error for:
Python 3.6.9
nlpaug 1.0.1
transformers 3.4.0
torch 1.6.0+cu101

Code Snippet:

import nlpaug.augmenter.word as naw

text = 'The quick brown fox jumps over the lazy dog'
augInsert = naw.ContextualWordEmbsAug(model_path='bert-base-uncased', action="insert")

Error trace:

NameError                                 Traceback (most recent call last)
<ipython-input-11-6c148101e32b> in <module>()
      1 text = 'The quick brown fox jumps over the lazy dog'
----> 2 augInsert = naw.ContextualWordEmbsAug(model_path='bert-base-uncased', action="insert")
      3 augSubstitute = naw.ContextualWordEmbsAug(model_path='bert-base-uncased', action="substitute")
      4 augmented_text1 = augInsert.augment(text)
      5 augmented_text2 = augSubstitute.augment(text)

3 frames
/usr/local/lib/python3.6/dist-packages/nlpaug/augmenter/word/context_word_embs.py in __init__(self, model_path, action, temperature, top_k, top_p, name, aug_min, aug_max, aug_p, stopwords, device, force_reload, optimize, stopwords_regex, verbose, silence)
     97         self.model = self.get_model(
     98             model_path=model_path, device=device, force_reload=force_reload, temperature=temperature, top_k=top_k,
---> 99             top_p=top_p, optimize=optimize, silence=silence)
    100         # Override stopwords
    101         if stopwords is not None and self.model_type in ['xlnet', 'roberta']:

/usr/local/lib/python3.6/dist-packages/nlpaug/augmenter/word/context_word_embs.py in get_model(cls, model_path, device, force_reload, temperature, top_k, top_p, optimize, silence)
    423     def get_model(cls, model_path, device='cuda', force_reload=False, temperature=1.0, top_k=None, top_p=0.0,
    424                   optimize=None, silence=True):
--> 425         return init_context_word_embs_model(model_path, device, force_reload, temperature, top_k, top_p, optimize, silence)

/usr/local/lib/python3.6/dist-packages/nlpaug/augmenter/word/context_word_embs.py in init_context_word_embs_model(model_path, device, force_reload, temperature, top_k, top_p, optimize, silence)
     31         model = nml.Roberta(model_path, device=device, temperature=temperature, top_k=top_k, top_p=top_p, silence=silence)
     32     elif 'bert' in model_path.lower():
---> 33         model = nml.Bert(model_path, device=device, temperature=temperature, top_k=top_k, top_p=top_p, silence=silence)
     34     elif 'xlnet' in model_path.lower():
     35         model = nml.XlNet(model_path, device=device, temperature=temperature, top_k=top_k, top_p=top_p, optimize=optimize,

/usr/local/lib/python3.6/dist-packages/nlpaug/model/lang_models/bert.py in __init__(self, model_path, temperature, top_k, top_p, device, silence)
     30         self.model_path = model_path
     31 
---> 32         self.tokenizer = AutoTokenizer.from_pretrained(model_path)
     33         self.mask_id = self.token2id(self.MASK_TOKEN)
     34         self.pad_id = self.token2id(self.PAD_TOKEN)

NameError: name 'AutoTokenizer' is not defined

@narayanacharya6
Contributor

If it helps, doing this in the same notebook works:

from transformers import AutoModelForMaskedLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

Not sure what the underlying problem could be.
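A possible explanation, offered as an assumption rather than something confirmed in this thread: the traceback shows the NameError is raised inside nlpaug/model/lang_models/bert.py, and Python resolves names in the module where the code is defined, so importing AutoTokenizer in the notebook cannot bind it there. If the library guards its own import with try/except ImportError, a failed import would stay silent until the name is first used. The sketch below reproduces that behaviour with a hypothetical stand-in module (fake_library and package_that_is_missing_here are made up for illustration).

```python
import types

# Hypothetical stand-in for a library module whose optional import is guarded.
LIBRARY_SOURCE = '''
try:
    from package_that_is_missing_here import AutoTokenizer
except ImportError:
    pass  # failure is swallowed, so AutoTokenizer is never bound in this module


def load_tokenizer(model_path):
    return AutoTokenizer.from_pretrained(model_path)
'''

fake_library = types.ModuleType("fake_library")
exec(LIBRARY_SOURCE, fake_library.__dict__)


class AutoTokenizer:
    """Defined in the caller's namespace, like an import in a notebook cell."""

    @staticmethod
    def from_pretrained(model_path):
        return "tokenizer for " + model_path


def try_load(model_path):
    """Call the library function and return its result or the NameError text."""
    try:
        return fake_library.load_tokenizer(model_path)
    except NameError as exc:
        return "NameError: {}".format(exc)


# The caller's AutoTokenizer is not visible inside fake_library.
print(try_load("bert-base-uncased"))
# prints: NameError: name 'AutoTokenizer' is not defined
```

If this is what happens, the useful check is whether `from transformers import AutoModelForMaskedLM, AutoTokenizer` succeeded in the runtime at the moment nlpaug was first imported, for example before a package was installed or before the runtime was restarted.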

@makcedward
Owner

@narayanacharya6
I cannot reproduce this issue. I changed the library's import statements; please check whether that helps. The change has not been released to pip yet and is only available on the master branch.

@narayanacharya6
Contributor

Probably something off with Google Colab. I tried to run it locally on my machine and it works just fine.
I think this issue can be closed.
