NER training in command line, the final model sends everything back as entities. #2185
Comments
3) I also want to know whether we can use en_vectors_web_lg with the CLI train command, because when I use it via the -v flag, Python stops and crashes on the 1st iteration.
4) As the training data is in JSON format, how should the dev_data be arranged in the JSON file?

The error:

    C:\Users\karth>python -m spacy train en "C:\Users\karth\AppData\Local\Programs\Python\Python35\Lib\site-packages\law_md3" "E:\Office files\Python\Output\section.json" "E:\Office files\Python\Output\train.json" -n 50 -P -T -v "C:\Users\karth\AppData\Local\Programs\Python\Python35\Lib\site-packages\en_vectors_web_lg\en_vectors_web_lg-2.0.0"

Thanks, any help would mean a lot. (Windows 10) |
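On question 4, the dev file uses the same layout as the training file. A hedged sketch of the spaCy 2.x command-line training JSON (the field names follow the converter's output; the text, tags, and labels here are illustrative only):

```python
import json

# One document in the spaCy 2.x training JSON layout: a list of docs,
# each with "paragraphs" -> "sentences" -> "tokens". "head" values are
# relative offsets to the head token; "ner" uses BILUO tags.
train_doc = {
    "id": 0,
    "paragraphs": [{
        "raw": "London is big.",
        "sentences": [{
            "tokens": [
                {"id": 0, "orth": "London", "tag": "NNP", "head": 1,  "dep": "nsubj", "ner": "U-GPE"},
                {"id": 1, "orth": "is",     "tag": "VBZ", "head": 0,  "dep": "ROOT",  "ner": "O"},
                {"id": 2, "orth": "big",    "tag": "JJ",  "head": -1, "dep": "acomp", "ner": "O"},
                {"id": 3, "orth": ".",      "tag": ".",   "head": -2, "dep": "punct", "ner": "O"},
            ],
            "brackets": [],
        }],
    }],
}

# The file on disk is a JSON list of such documents.
corpus_json = json.dumps([train_doc], indent=2)
```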
aij-wikiner-wp3-es-dev-iob.zip

I'm pretty sure something's gone wrong with your pre-processing, and your data file is not correct. I've attached an IOB format data file that's likely to be easier to produce. Let's say you extract the file to your /tmp directory. Here's an example command that converts it:

    unzip /tmp/aij-wikiner-wp3-es-dev-iob.zip
    python -m spacy convert /tmp/aij-wikiner-wp3-es-dev-iob.iob /tmp

This should give you a file |
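Since the suspected cause is broken pre-processing, a quick sanity check on a training file is to count the NER labels: in a healthy corpus the "O" tag should dominate by a wide margin. A minimal sketch, assuming the spaCy 2.x training JSON layout described above:

```python
import json
from collections import Counter

def ner_tag_counts(docs):
    """Count NER labels in a loaded spaCy 2-style training JSON (a list
    of docs, each with paragraphs -> sentences -> tokens)."""
    counts = Counter()
    for doc in docs:
        for para in doc["paragraphs"]:
            for sent in para["sentences"]:
                for tok in sent["tokens"]:
                    counts[tok.get("ner", "-")] += 1
    return counts

# Usage (path is illustrative):
#   docs = json.load(open("/tmp/aij-wikiner-wp3-es-dev-iob.json"))
#   print(ner_tag_counts(docs).most_common())
```

If nearly every token carries an entity tag here, the model "sending everything back as an entity" is expected behaviour, not a training bug.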
The hyper-parameters are currently set via environment variables, because passing the configuration through to the arbitrary places it has to be read is difficult, and there are constantly new hyper-parameters to expose. You could try writing to the |
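The environment-variable mechanism can be sketched roughly like this (spaCy 2.x reads each setting with a helper along these lines, `env_opt` in `spacy.util`; the exact implementation may differ). In practice you override a default by exporting the variable before running the train command, e.g. `learn_rate=0.0005 python -m spacy train ...`:

```python
import os

def env_opt(name, default):
    """Minimal stand-in for an env-var hyper-parameter reader: if the
    variable is set, coerce it to the default's type; otherwise keep
    the default."""
    if name in os.environ:
        return type(default)(os.environ[name])
    return default

os.environ["learn_rate"] = "0.0005"
assert env_opt("learn_rate", 0.001) == 0.0005  # override picked up
assert env_opt("batch_to", 16) == 16           # default kept
```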
Thanks honnibal, your comment was very helpful and I managed to train the model. Now it works normally and doesn't send everything back as an entity. It was a problem in my JSON file. |
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Hi guys, and thanks for your fantastic job. I have a problem with NER training via the command line, which I will explain in the following. I train the NER with new entities on the command line exactly as explained in the spaCy documentation. I am only interested in training the NER. Both the training and the dev input are in JSON format (as they should be).
The command I use is as follows:
    python -m spacy train en ../Desktop/Spacy/Model-Train ../Desktop/Spacy/242018/gutenberg_devu.txt.json ../Desktop/242018/gutenbergu.txt.json -n 20 -P -T
The output is:
dropout_from = 0.2 by default
dropout_to = 0.2 by default
dropout_decay = 0.0 by default
batch_from = 1 by default
batch_to = 16 by default
batch_compound = 1.001 by default
max_doc_len = 5000 by default
beam_width = 1 by default
beam_density = 0.0 by default
learn_rate = 0.001 by default
optimizer_B1 = 0.9 by default
optimizer_B2 = 0.999 by default
optimizer_eps = 1e-08 by default
L2_penalty = 1e-06 by default
grad_norm_clip = 1.0 by default
parser_hidden_depth = 1 by default
parser_maxout_pieces = 2 by default
token_vector_width = 128 by default
hidden_width = 200 by default
embed_size = 7000 by default
history_feats = 0 by default
history_width = 0 by default
Itn. P.Loss N.Loss UAS NER P. NER R. NER F. Tag % Token %
0 0.000 0.000 0.000 1.913 20.379 3.498 0.000 100.000 17850.40.0
1 0.000 0.000 0.000 1.692 6.833 2.712 0.000 100.000 18645.00.0
2 0.000 0.000 0.000 2.290 17.970 4.063 0.000 100.000 17854.40.0
3 0.000 0.000 0.000 1.708 19.550 3.141 0.000 100.000 18009.60.0
4 0.000 0.000 0.000 0.902 11.493 1.672 0.000 100.000 14274.50.0
5 0.000 0.000 0.000 1.772 15.679 3.183 0.000 100.000 14633.40.0
6 0.000 0.000 0.000 2.617 42.299 4.930 0.000 100.000 15909.50.0
7 0.000 0.000 0.000 1.394 2.686 1.835 0.000 100.000 17427.70.0
8 0.000 0.000 0.000 1.308 4.502 2.027 0.000 100.000 16058.40.0
9 0.000 0.000 0.000 2.867 10.821 4.533 0.000 100.000 12611.60.0
10 0.000 0.000 0.000 0.904 4.502 1.506 0.000 100.000 17244.90.0
11 0.000 0.000 0.000 1.023 9.400 1.846 0.000 100.000 16253.10.0
12 0.000 0.000 0.000 2.470 25.592 4.506 0.000 100.000 18050.80.0
13 0.000 0.000 0.000 0.935 3.476 1.473 0.000 100.000 17420.50.0
14 0.000 0.000 0.000 1.372 6.714 2.279 0.000 100.000 16866.80.0
15 0.000 0.000 0.000 1.786 14.100 3.170 0.000 100.000 15634.50.0
16 0.000 0.000 0.000 2.234 13.981 3.852 0.000 100.000 17915.30.0
17 0.000 0.000 0.000 0.693 3.081 1.131 0.000 100.000 17383.00.0
18 0.000 0.000 0.000 1.540 3.002 2.036 0.000 100.000 18024.50.0
19 0.000 0.000 0.000 2.552 27.567 4.672 0.000 100.000 17402.80.0
Saving model...
But when I load the final model, it is terrible and sends back every token in the text file as an entity.
If someone has experienced something similar, or could just have a quick look and find out where my problem is, it would be really appreciated.
Just to add: if I run the code in "Training the named entity recognizer" from the documentation and create a model (training a new entity type or adding to an existing one), everything works totally fine. However, as mentioned several times on other tickets, that approach does not fit huge datasets and easily crashes when the number of tokens in the training data is more than a few hundred thousand. So the command line above does not crash with large training data, but it has very strange behaviour!
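The crashes on huge datasets usually come from holding the whole corpus in memory at once; the documented in-Python loop avoids this by streaming fixed-size minibatches (as `spacy.util.minibatch` does in v2.x). A minimal stand-in for that helper, shown here only to illustrate the idea:

```python
def minibatch(items, size=8):
    """Yield successive lists of at most `size` items, so only one
    batch needs to be in memory at a time."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

# In the documentation's training loop, each yielded batch of
# (text, annotations) pairs would be passed to nlp.update(...).
```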
A part of the results is like this:
NAME crisis with the
NAME positions
NAME . therefore
NAME in portfolios
NAME .
NAME portfolios are
Everything is an entity (even the punctuation)!
Additionally, if someone could explain what these abbreviations in the output mean, it would also help a lot:
P.Loss N.Loss UAS NER P. NER R. NER F. Tag % Token %
(Take a look at the output: the first three columns are all zero. Is that normal?)
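The NER columns look like precision, recall, and F-score: plugging the first row of the log above into the standard F1 formula reproduces the printed value. This is a sanity check on the column meanings, not spaCy's own code:

```python
def f_score(p, r):
    """Harmonic mean of precision and recall (F1), guarding against
    division by zero."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Row 0 of the log: NER P. = 1.913, NER R. = 20.379, NER F. = 3.498
assert abs(f_score(1.913, 20.379) - 3.498) < 0.001
```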
I run Python 2.7.10 on Mac with spaCy 2.0.9.
Thanks in advance for your help.