'list index out of range' error for some batches when using minibatch #2946
Comments
Okay, I found out it's because of the hyphen-separated words like low-level and anti-war.
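To make the mismatch concrete, here is a minimal sketch of a check that flags examples whose token count differs from their tag count. `approx_tokenize` is a hypothetical stand-in that only mimics two behaviours (splitting hyphens between letters and splitting trailing punctuation); it is not spaCy's actual tokenizer, and `find_mismatches` is likewise an illustrative helper name. The text and tags are taken from the failing and passing batches in this issue.

```python
import re

# Assumption: the tokenizer splits a hyphen between letters into its own token.
HYPHEN_INFIX = re.compile(r"(?<=[A-Za-z])(-)(?=[A-Za-z])")


def approx_tokenize(text):
    """Rough stand-in tokenizer (NOT spaCy's rules): splits on whitespace,
    peels off trailing punctuation, and splits intra-word hyphens."""
    tokens = []
    for chunk in text.split():
        trailing = []
        while len(chunk) > 1 and chunk[-1] in ".,!?":
            trailing.insert(0, chunk[-1])
            chunk = chunk[:-1]
        tokens.extend(HYPHEN_INFIX.split(chunk))
        tokens.extend(trailing)
    return tokens


def find_mismatches(texts, annotations, tokenize=approx_tokenize):
    """Return (index, n_tokens, n_tags) for every example where the counts differ."""
    mismatches = []
    for i, (text, annot) in enumerate(zip(texts, annotations)):
        n_tokens = len(tokenize(text))
        n_tags = len(annot["tags"])
        if n_tokens != n_tags:
            mismatches.append((i, n_tokens, n_tags))
    return mismatches


texts = (
    "The International Atomic Energy Agency is to hold second day of talks "
    "in Vienna Wednesday on how to respond to Iran 's resumption of "
    "low-level uranium conversion.",
    "Iran this week restarted parts of the conversion process at its "
    "Isfahan nuclear plant.",
)
annotations = (
    {"tags": ["DT", "NNP", "NNP", "NNP", "NNP", "VBZ", "TO", "VB", "JJ", "NN",
              "IN", "NNS", "IN", "NNP", "NNP", "IN", "WRB", "TO", "VB", "TO",
              "NNP", "POS", "NN", "IN", "JJ", "NN", "NN", "."]},
    {"tags": ["NNP", "DT", "NN", "VBD", "NNS", "IN", "DT", "NN", "NN", "IN",
              "PRP$", "NNP", "JJ", "NN", "."]},
)

print(find_mismatches(texts, annotations))  # [(0, 30, 28)]
```

With a real pipeline, the same check should work by passing something like `tokenize=lambda t: [tok.text for tok in nlp.make_doc(t)]`, assuming `nlp` is the loaded spaCy model being updated.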
One option would be to change the tokenization by customising the tokenizer. The tokenization rules will be serialized with your model, so your rules will be included when you save out the trained/updated model.

Alternatively, you could also adjust your data and update the tags. Since there's a clear pattern here, you should probably be able to do this programmatically (split text with spaCy, find hyphenated tokens, check your tags at position […]

Finally, when updating the model, you can also pass in […]
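The data-adjustment route could be sketched roughly as follows. This assumes the tags are aligned one-to-one with a pre-split word list, and that each piece of a hyphenated word should inherit the original tag while the hyphen itself gets `HYPH`. That tag assignment is an illustrative assumption, not something prescribed in this thread, and `expand_hyphen_tags` is a hypothetical helper name.

```python
import re

# Assumption: only hyphens sitting between two letters are split out.
HYPHEN_INFIX = re.compile(r"(?<=[A-Za-z])(-)(?=[A-Za-z])")


def expand_hyphen_tags(words, tags, hyphen_tag="HYPH"):
    """Split hyphenated words and expand the tag list to match.

    Each word part keeps the original tag; the hyphen gets `hyphen_tag`.
    Both choices are assumptions for illustration and may need adjusting.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    new_words, new_tags = [], []
    for word, tag in zip(words, tags):
        for part in HYPHEN_INFIX.split(word):
            new_words.append(part)
            new_tags.append(hyphen_tag if part == "-" else tag)
    return new_words, new_tags


words = ["resumption", "of", "low-level", "uranium", "conversion", "."]
tags = ["NN", "IN", "JJ", "NN", "NN", "."]

new_words, new_tags = expand_hyphen_tags(words, tags)
print(new_words)
print(new_tags)
```

After this step the word and tag lists have the same length again, so a tokenizer that splits `low-level` into three tokens would line up with the annotations.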
How to reproduce the behaviour
Error looks like this:
error is: list index out of range
error text
("The protest comes on the eve of the annual conference of Britain 's ruling Labor Party in the southern English seaside resort of Brighton.", "The International Atomic Energy Agency is to hold second day of talks in Vienna Wednesday on how to respond to Iran 's resumption of low-level uranium conversion.", 'Thousands of demonstrators have marched through London to protest the war in Iraq and demand the withdrawal of British troops from that country.', "The party is divided over Britain 's participation in the Iraq conflict and the continued deployment of 8,500 British troops in that country.")
error annotations
({'tags': ['DT', 'NN', 'VBZ', 'IN', 'DT', 'NN', 'IN', 'DT', 'JJ', 'NN', 'IN', 'NNP', 'POS', 'VBG', 'NNP', 'NNP', 'IN', 'DT', 'JJ', 'JJ', 'NN', 'NN', 'IN', 'NNP', '.']}, {'tags': ['DT', 'NNP', 'NNP', 'NNP', 'NNP', 'VBZ', 'TO', 'VB', 'JJ', 'NN', 'IN', 'NNS', 'IN', 'NNP', 'NNP', 'IN', 'WRB', 'TO', 'VB', 'TO', 'NNP', 'POS', 'NN', 'IN', 'JJ', 'NN', 'NN', '.']}, {'tags': ['NNS', 'IN', 'NNS', 'VBP', 'VBN', 'IN', 'NNP', 'TO', 'VB', 'DT', 'NN', 'IN', 'NNP', 'CC', 'VB', 'DT', 'NN', 'IN', 'JJ', 'NNS', 'IN', 'DT', 'NN', '.']}, {'tags': ['DT', 'NN', 'VBZ', 'VBN', 'IN', 'NNP', 'POS', 'NN', 'IN', 'DT', 'NNP', 'NN', 'CC', 'DT', 'JJ', 'NN', 'IN', 'CD', 'JJ', 'NNS', 'IN', 'DT', 'NN', '.']})
error is: list index out of range
error text
('They marched from the Houses of Parliament to a rally in Hyde Park.', 'Families of soldiers killed in the conflict joined the protesters who carried banners with such slogans as" Bush Number One Terrorist" and" Stop the Bombings."', 'Police put the number of marchers at 10,000 while organizers claimed it was 1,00,000.', 'The London march came ahead of anti-war protests today in other cities, including Rome, Paris, and Madrid.')
error annotations
({'tags': ['PRP', 'VBD', 'IN', 'DT', 'NNS', 'IN', 'NN', 'TO', 'DT', 'NN', 'IN', 'NNP', 'NNP', '.']}, {'tags': ['NNS', 'IN', 'NNS', 'VBN', 'IN', 'DT', 'NN', 'VBD', 'DT', 'NNS', 'WP', 'VBD', 'NNS', 'IN', 'JJ', 'NNS', 'IN', '
', 'NNP', 'NN', 'CD', 'NN', '
', 'CC', '', 'VB', 'DT', 'NNS', '.', '
']}, {'tags': ['NNS', 'VBD', 'DT', 'NN', 'IN', 'NNS', 'IN', 'CD', 'IN', 'NNS', 'VBD', 'PRP', 'VBD', 'CD', '.']}, {'tags': ['DT', 'NNP', 'NN', 'VBD', 'RB', 'IN', 'JJ', 'NNS', 'NN', 'IN', 'JJ', 'NNS', ',', 'VBG', 'NNP', ',', 'NNP', ',', 'CC', 'NNP', '.']})
NO ERROR BATCH
no error text
('Iranian officials say they expect to get access to sealed sensitive parts of the plant Wednesday, after an IAEA surveillance system begins functioning.', 'Iran this week restarted parts of the conversion process at its Isfahan nuclear plant.')
no error annotations
({'tags': ['JJ', 'NNS', 'VBP', 'PRP', 'VBP', 'TO', 'VB', 'NN', 'TO', 'JJ', 'JJ', 'NNS', 'IN', 'DT', 'NN', 'NNP', ',', 'IN', 'DT', 'NNP', 'NN', 'NN', 'VBZ', 'VBG', '.']}, {'tags': ['NNP', 'DT', 'NN', 'VBD', 'NNS', 'IN', 'DT', 'NN', 'NN', 'IN', 'PRP$', 'NNP', 'JJ', 'NN', '.']})
Your Environment