ValueError: Sample larger than population or is negative #38

Closed
shashankbansal6 opened this issue Sep 3, 2019 · 8 comments

Comments

@shashankbansal6

Hi,

I have a small dataset that I am trying to augment. For some of the questions, I am getting the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-337-336aea02b7a2> in <module>
      2 print(len(text))
      3 aug = naw.BertAug(action="insert")
----> 4 augmented_text = aug.augment(text)
      5 print("Original:")
      6 print(text)

~/anaconda3/lib/python3.7/site-packages/nlpaug/base_augmenter.py in augment(self, data)
     69 
     70         if self.action == Action.INSERT:
---> 71             return self.insert(data)
     72         elif self.action == Action.SUBSTITUTE:
     73             return self.substitute(data)

~/anaconda3/lib/python3.7/site-packages/nlpaug/augmenter/word/bert.py in insert(self, data)
     85         for aug_idx in aug_idxes:
     86             results.insert(aug_idx, nml.Bert.MASK)
---> 87             new_word = self.sample(self.model.predict(results, nml.Bert.MASK, self.aug_n), 1)[0]
     88             results[aug_idx] = new_word
     89 

~/anaconda3/lib/python3.7/site-packages/nlpaug/base_augmenter.py in sample(cls, x, num)
    109     @classmethod
    110     def sample(cls, x, num):
--> 111         return random.sample(x, num)
    112 
    113     def generate_aug_cnt(self, size, aug_p=None):

~/anaconda3/lib/python3.7/random.py in sample(self, population, k)
    319         n = len(population)
    320         if not 0 <= k <= n:
--> 321             raise ValueError("Sample larger than population or is negative")
    322         result = [None] * k
    323         setsize = 21        # size of a small set minus size of an empty list

ValueError: Sample larger than population or is negative

After some research, I came across this Stack Overflow question: https://stackoverflow.com/questions/20861497/sample-larger-than-population-in-random-sample-python
However, I am still not sure what exactly the issue is. It works sometimes, but other times it returns this error. Is it something to do with my questions? Is there a specific format I need to follow for the questions?

Any help would be much appreciated.
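The exception itself comes from Python's standard random.sample, which raises ValueError whenever the requested sample size k is larger than the population. In the traceback, nlpaug calls sample(..., 1), so the call can only fail when the model's predict step returns an empty candidate list. A minimal sketch of that failure mode in isolation (try_sample is a hypothetical helper, not part of nlpaug):

```python
import random
from typing import List, Sequence, Union


def try_sample(population: Sequence[str], k: int) -> Union[List[str], str]:
    """Return random.sample(population, k), or the error message if it raises."""
    try:
        return random.sample(list(population), k)
    except ValueError as exc:
        return str(exc)


# One item from a non-empty candidate list: works.
print(try_sample(["will", "would"], 1))
# One item from an empty candidate list: fails the same way as the traceback.
print(try_sample([], 1))  # prints: Sample larger than population or is negative
```

Assuming the insert positions are chosen at random, only some runs would hit a position for which predict returns nothing, which would explain why the error is intermittent.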

@makcedward
Owner

It happens when the number of possible outputs (the candidates returned by the predict function) is smaller than top_n (the number of best candidates to select).

Can you share your input (i.e. the value of text)?
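One way to make the sampling step tolerant of short candidate lists is to clamp k to the number of available candidates and let the caller skip the insertion when nothing comes back. A minimal sketch, assuming the surrounding code can simply skip that position; safe_sample and pick_insert_word are hypothetical names, and this is not necessarily how nlpaug itself handles it:

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def safe_sample(population: Sequence[T], k: int) -> List[T]:
    """Sample up to k items without replacement.

    Unlike random.sample, this does not raise when k exceeds len(population);
    it returns every available item, in random order.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return random.sample(list(population), min(k, len(population)))


def pick_insert_word(candidates: Sequence[str]) -> Optional[str]:
    """Pick one predicted word, or None if the model proposed nothing."""
    picked = safe_sample(candidates, 1)
    return picked[0] if picked else None


print(pick_insert_word([]))       # None
print(pick_insert_word(["was"]))  # was
```

Returning None instead of raising pushes the decision (skip the insert, retry elsewhere) to the caller, which is where the insert loop lives.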

@shashankbansal6
Author

shashankbansal6 commented Sep 3, 2019

One of the questions was: text = "If I enroll in the ESPP, when will my offering begin and the price set?"
I believe this only happens with action="insert".

You should be able to reproduce the issue with the following code:

import nlpaug.augmenter.word as naw

text = "If I enroll in the ESPP, when will my offering begin and the price set?"
# Run the augmentation several times; the error only shows up on some runs
for i in range(3):
    aug = naw.BertAug(action="insert")
    augmented_text = aug.augment(text)
    print("Original:")
    print(text)
    print("Augmented Text:")
    print(augmented_text)

@jxy-001

jxy-001 commented Sep 4, 2019

I also ran into this error. Have you solved it?

@makcedward
Owner

> I also ran into this error. Have you solved it?

Fixed and merged into the master branch (not yet available through pip install).
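Until a fixed release is available from pip, one caller-side stopgap is to catch the ValueError, retry a few times, and fall back to the unchanged text. A minimal sketch; augment_or_original is a hypothetical helper, and it only assumes the augmenter exposes the augment(data) method seen in the traceback:

```python
from typing import Protocol


class Augmenter(Protocol):
    """Anything with an augment(data) method, like the augmenter in the traceback."""

    def augment(self, data: str) -> str: ...


def augment_or_original(aug: Augmenter, text: str, retries: int = 3) -> str:
    """Try augmenting up to `retries` times; fall back to the unchanged text.

    Retrying may help because the failure is intermittent, but it is not
    guaranteed to succeed.
    """
    for _ in range(retries):
        try:
            return aug.augment(text)
        except ValueError:
            # Raised by random.sample when the candidate list is too short.
            continue
    return text


class AlwaysFails:
    """Stand-in augmenter that reproduces the error on every call."""

    def augment(self, data: str) -> str:
        raise ValueError("Sample larger than population or is negative")


print(augment_or_original(AlwaysFails(), "hello"))  # hello
```

Catching ValueError this broadly can also hide unrelated errors, so it is best treated as a temporary workaround rather than a fix.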

@jxy-001

jxy-001 commented Sep 4, 2019

I did not use pip; I put your folders directly into my project.

@makcedward
Owner

> I did not use pip; I put your folders directly into my project.

What result do you get with the latest build?

@jxy-001

jxy-001 commented Sep 4, 2019

ValueError                                Traceback (most recent call last)
in
      1 if par_eda == 1: # use eda to operate sentences when par_eda is true
      2     for i in range(len(dat_plus['title_text'])):
----> 3         dat_plus['title_text'][i] = copy.deepcopy(eda_text(dat_plus['title_text'][i]))
      4         dat_plus['title_text'][i] = "".join(dat_plus['title_text'][i])

in eda_text(text)
     23     if len(zz) <= 500:
     24         #print(len(zz))
---> 25         tmp_text = aug_text(tmp_text)
     26     # conbine prior 3 sentences and rest sentences
     27     for j in range(len(text)-3):

in aug_text(text)
      1 def aug_text(text):
----> 2     text = aug.augment(text)
      3     return(text)

/home/user5/Desktop/bert_fot_new/nlpaug/base_augmenter.py in augment(self, data)
     69
     70         if self.action == Action.INSERT:
---> 71             return self.insert(data)
     72         elif self.action == Action.SUBSTITUTE:
     73             return self.substitute(data)

/home/user5/Desktop/bert_fot_new/nlpaug/augmenter/word/bert.py in insert(self, data)
     88         for aug_idx in aug_idxes:
     89             results.insert(aug_idx, nml.BertDeprecated.MASK)
---> 90             new_word = self.sample(self.model.predict(results, nml.BertDeprecated.MASK, self.aug_n), 1)[0]
     91             results[aug_idx] = new_word
     92

/home/user5/Desktop/bert_fot_new/nlpaug/base_augmenter.py in sample(cls, x, num)
    109     @classmethod
    110     def sample(cls, x, num):
--> 111         return random.sample(x, num)
    112
    113     def generate_aug_cnt(self, size, aug_p=None):

/usr/local/anaconda3/lib/python3.7/random.py in sample(self, population, k)
    319         n = len(population)
    320         if not 0 <= k <= n:
--> 321             raise ValueError("Sample larger than population or is negative")
    322         result = [None] * k
    323         setsize = 21        # size of a small set minus size of an empty list

ValueError: Sample larger than population or is negative

@jxy-001

jxy-001 commented Sep 4, 2019

I think the call to sample should be changed to choice.
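Assuming "choice" here means Python's random.choice: it returns a single element directly, which matches the sample(x, 1)[0] pattern in the traceback, but on an empty sequence it raises IndexError instead of ValueError. Swapping one for the other would therefore change the exception type rather than remove the failure, so an explicit emptiness check is still needed. A minimal sketch (choose_or_none and error_type_on_empty are hypothetical helpers):

```python
import random
from typing import Optional, Sequence


def choose_or_none(candidates: Sequence[str]) -> Optional[str]:
    """Equivalent of sample(candidates, 1)[0] that tolerates an empty list."""
    # random.choice([]) raises IndexError, so guard against empty input first.
    if not candidates:
        return None
    return random.choice(candidates)


def error_type_on_empty() -> str:
    """Report which exception random.choice raises for an empty sequence."""
    try:
        random.choice([])
    except IndexError as exc:
        return type(exc).__name__
    return "no error"


print(error_type_on_empty())  # IndexError
print(choose_or_none([]))     # None
```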


3 participants