ner.add_label to existing model causes segmentation fault: 11 #2769

Closed
iperera opened this issue Sep 17, 2018 · 8 comments
Labels
bug Bugs and behaviour differing from documentation feat / ner Feature: Named Entity Recognizer

Comments

@iperera

iperera commented Sep 17, 2018

I was getting intermittent segmentation faults when training a new entity type, so I updated spaCy to see if that helped. Unfortunately, I now get a segfault every single time, not during training but when adding entity types.

How to reproduce the behaviour

Run the spaCy/examples/training/train_new_entity_type.py example with the existing model 'en'. A segmentation fault occurs when the new entity label is added (ner.add_label(label)).

Your Environment

  • spaCy version: 2.1.0a1
  • Platform: Darwin-17.7.0-x86_64-i386-64bit
  • Python version: 3.7.0
  • Models: en

I've attached the segfault log.

segfault.txt
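Besides a crash log, a Python-level traceback showing which call crashed can usually be captured with the standard-library `faulthandler` module, either by running the script as `python -X faulthandler train_new_entity_type.py` or by calling `faulthandler.enable()` at the top of it. The sketch below is a hypothetical helper (`run_with_faulthandler` is not part of spaCy) that runs a snippet in a fresh interpreter with faulthandler switched on and returns the exit code and stderr. It assumes a POSIX system, where a child killed by SIGSEGV reports a negative exit code, and it simulates the crash by sending SIGSEGV to the child rather than reproducing the spaCy bug.

```python
import signal
import subprocess
import sys
from typing import Tuple

# Simulated crash: the child process sends SIGSEGV to itself (POSIX only).
SIMULATED_SEGFAULT = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"


def run_with_faulthandler(code: str) -> Tuple[int, str]:
    """Run `code` in a fresh interpreter with faulthandler enabled.

    Returns the exit code (minus the signal number if the child was killed
    by a signal) and the child's stderr, which is where faulthandler writes
    the Python traceback when a fatal signal arrives.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    rc, err = run_with_faulthandler(SIMULATED_SEGFAULT)
    print("exit code:", rc, "(SIGSEGV)" if rc == -signal.SIGSEGV else "")
    print(err)
```

To debug the real problem, the payload would be the failing script itself instead of the simulated crash.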

@ines ines added bug Bugs and behaviour differing from documentation feat / ner Feature: Named Entity Recognizer more-info-needed This issue needs more information labels Sep 18, 2018
@ines
Member

ines commented Sep 18, 2018

Thanks for the report. Are you able to share the examples you used and/or the labels you're adding? And do you have a reproducible example? Segfaults like this are always tricky to debug, so the more specific examples we have, the better.

@iperera
Author

iperera commented Sep 18, 2018

The minimal reproducible example is the train_new_entity_type.py example script with the 'en' model loaded and no other changes. That script adds the 'ANIMAL' entity label. Note that this particular error only occurs with the nightly build.

The intermittent segmentation faults I mentioned happened with other data on the release build, but that issue has been reported before and is still open (#1969).

@no-response no-response bot removed the more-info-needed This issue needs more information label Sep 18, 2018
@free-variation
Contributor

@iperera do you get the segfault even when just running that example file? It ran fine for me, on a mac using Python 3.7.

@iperera
Author

iperera commented Sep 19, 2018

Only when specifying an existing model to add to. If I start with a blank model, it runs fine for me.

@nyejon

nyejon commented Nov 22, 2018

I also get a segmentation fault with the standard training code when I try to add a label to the NER with ner.add_label("FEATURE").

This is on the latest nightly build.

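import random
from pathlib import Path

import spacy
from spacy.util import minibatch, compounding

LABELS = ["FEATURE"]  # the label added in the report above
# The training examples were not included in the report; fill in
# (text, {"entities": [(start, end, label)]}) pairs here.
TRAIN_DATA = []


def main(model=None, new_model_name='animal', output_dir=None, n_iter=10):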
    """Set up the pipeline and entity recognizer, and train the new entity."""
    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank('en')  # create blank Language class
        print("Created blank 'en' model")
    # Add entity recognizer to model if it's not in the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy

    print(nlp.pipe_names)
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner)
    # otherwise, get it, so we can add labels to it
    else:
        ner = nlp.get_pipe('ner')

    print("Adding labels")
    for label in LABELS:
        print(label)
        ner.add_label(label)   # <- Segfaults here
        print(label)

    print("Beginning training")
    if model is None:
        optimizer = nlp.begin_training()
    else:
        # Note that 'begin_training' initializes the models, so it'll zero out
        # existing entity types.
        optimizer = nlp.entity.create_optimizer()

    # get names of other pipes to disable them during training
    print("Disabling pipes")
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        for itn in range(n_iter):
            random.shuffle(TRAIN_DATA)
            losses = {}
            # batch up the examples using spaCy's minibatch
            batches = minibatch(TRAIN_DATA, size=compounding(8., 64., 1.001))
            # print(f'Number of batches: {len(batches)}')
            for batch_num, batch in enumerate(batches):
                texts, annotations = zip(*batch)
                if batch_num % 1000 == 0:
                    print(f"Batch {batch_num}")
                nlp.update(texts, annotations, sgd=optimizer, drop=0.35,
                           losses=losses)
            print('Losses', losses)

    # test the trained model
    test_text = 'Do you like horses?'
    doc = nlp(test_text)
    print("Entities in '%s'" % test_text)
    for ent in doc.ents:
        print(ent.label_, ent.text)

    # save model to output directory
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
        nlp.meta['name'] = new_model_name  # rename model
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)

        # test the saved model
        print("Loading from", output_dir)
        nlp2 = spacy.load(output_dir)
        doc2 = nlp2(test_text)
        for ent in doc2.ents:
            print(ent.label_, ent.text)

if __name__ == '__main__':
    main(model='en_core_web_md', new_model_name="feature", output_dir="./new_model", n_iter=1)
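The training loop above draws its batch sizes from `compounding(8., 64., 1.001)`: each batch is 0.1% larger than the previous one, capped at 64. The stdlib-only sketch below is a simplified reimplementation written to mirror the behaviour of spaCy v2's `spacy.util.compounding` and `spacy.util.minibatch`; it is for illustration and is not spaCy's own code.

```python
import itertools
from typing import Iterable, Iterator, List, TypeVar, Union

T = TypeVar("T")


def compounding(start: float, stop: float, compound: float) -> Iterator[float]:
    """Yield start, start*compound, start*compound**2, ... clipped at stop."""

    def clip(value: float) -> float:
        # Handles both growing (start < stop) and shrinking (start > stop) schedules.
        return max(value, stop) if start > stop else min(value, stop)

    curr = float(start)
    while True:
        yield clip(curr)
        curr *= compound


def minibatch(items: Iterable[T], size: Union[int, Iterator[float]] = 8) -> Iterator[List[T]]:
    """Group items into lists whose lengths are drawn from `size`."""
    sizes = itertools.repeat(size) if isinstance(size, int) else size
    it = iter(items)
    while True:
        # Truncate the (possibly fractional) size to a whole number of items.
        batch = list(itertools.islice(it, int(next(sizes))))
        if not batch:
            break
        yield batch


if __name__ == "__main__":
    schedule = compounding(8.0, 64.0, 1.001)
    print([round(next(schedule), 3) for _ in range(3)])
    print(list(minibatch(range(10), compounding(2, 8, 2))))
```

Like spaCy's helper, this `minibatch` is a generator, so it has no length; re-enabling the commented-out `len(batches)` print in the script would raise a `TypeError`.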

@ines
Member

ines commented Nov 26, 2018

@nyejon Thanks for the example! I just tested it on the very latest state of develop and can confirm the segfault.

Here's the minimal reproducible version:

import spacy

nlp = spacy.load("en_core_web_sm")
ner = nlp.get_pipe("ner")
ner.add_label("FEATURE")
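Because a segfault kills the whole interpreter, a regression check for this bug is easier to write if the reproduction runs in a child process. The sketch below is hypothetical (the names `run_isolated`, `REPRO` and `check_add_label_regression` are illustrative, not taken from spaCy's test suite). It assumes a POSIX system and, for the actual check, that spaCy and `en_core_web_sm` are installed.

```python
import signal
import subprocess
import sys

# The minimal reproduction from this thread; it only runs inside the child.
REPRO = """
import spacy
nlp = spacy.load("en_core_web_sm")
ner = nlp.get_pipe("ner")
ner.add_label("FEATURE")
"""


def run_isolated(code: str, timeout: float = 300.0) -> int:
    """Run `code` in a fresh interpreter and return its exit code.

    On POSIX a child killed by a signal reports minus the signal number,
    so a segfault shows up as -signal.SIGSEGV instead of crashing the caller.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode


def check_add_label_regression() -> None:
    """Fail loudly if adding an NER label to a pretrained pipeline crashes."""
    rc = run_isolated(REPRO)
    assert rc != -signal.SIGSEGV, "segfault while adding an NER label"
    assert rc == 0, f"reproduction exited with code {rc}"
```

Calling `check_add_label_regression()` from a test function keeps the test runner alive even when the child crashes, and the distinct exit codes separate a segfault from an ordinary Python error such as a missing model.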

@ines ines closed this as completed Nov 26, 2018
@ines ines reopened this Nov 26, 2018
@honnibal
Member

Fixed 🎉 160b55c

@lock

lock bot commented Jan 9, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Jan 9, 2019