ner.add_label to existing model causes segmentation fault: 11 #2769

iperera · 2018-09-17T21:46:54Z

I was getting intermittent segmentation faults when training a new entity type, so I thought I'd update spaCy to see if that helped. Unfortunately, I now get a segfault every single time, except that it now happens when adding entity types rather than during training.

How to reproduce the behaviour

Follow the spaCy/examples/training/train_new_entity_type.py example with the existing model 'en'. The segmentation fault occurs when adding a new entity label (ner.add_label(label)).

Your Environment

spaCy version: 2.1.0a1
Platform: Darwin-17.7.0-x86_64-i386-64bit
Python version: 3.7.0
Models: en

I've attached the segfault log.

segfault.txt

ines · 2018-09-18T15:52:59Z

Thanks for the report. Are you able to share the examples you used and/or the labels you're adding? And do you have a reproducible example? Segfaults like this are always tricky to debug, so the more specific examples we have, the better.

iperera · 2018-09-18T16:59:34Z

The minimal reproducible example is the train_new_entity_type.py example script with the 'en' model loaded, with no other changes. That script adds the 'ANIMAL' entity tag. Note that this particular error is only with the nightly build.

The intermittent segmentation faults I mentioned earlier happened with other data on the release build; that issue has been reported before and is still open (#1969).

free-variation · 2018-09-19T22:23:30Z

@iperera do you get the segfault even when just running that example file? It ran fine for me, on a mac using Python 3.7.

iperera · 2018-09-19T22:26:06Z

Only when specifying an existing model to add to. If I start with a blank model, it runs fine for me.

nyejon · 2018-11-22T07:28:41Z

I also get a segmentation fault using the standard training code when I try to add a label to the NER with ner.add_label("FEATURE").

This is on the latest nightly build.

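import random
from pathlib import Path

import spacy
from spacy.util import minibatch, compounding

# The label from this report; TRAIN_DATA (the reporter's own training data,
# not shown) is a list of (text, {"entities": [(start, end, label)]}) tuples.
LABELS = ["FEATURE"]


def main(model=None, new_model_name='animal', output_dir=None, n_iter=10):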
    """Set up the pipeline and entity recognizer, and train the new entity."""
    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank('en')  # create blank Language class
        print("Created blank 'en' model")
    # Add entity recognizer to model if it's not in the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy

    print(nlp.pipe_names)
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner)
    # otherwise, get it, so we can add labels to it
    else:
        ner = nlp.get_pipe('ner')

    print("Adding labels")
    for label in LABELS:
        print(label)
        ner.add_label(label)   # <- Segfaults here
        print(label)

    print("Beginning training")
    if model is None:
        optimizer = nlp.begin_training()
    else:
        # Note that 'begin_training' initializes the models, so it'll zero out
        # existing entity types.
        optimizer = nlp.entity.create_optimizer()

    # get names of other pipes to disable them during training
    print("Disabling pipes")
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        for itn in range(n_iter):
            random.shuffle(TRAIN_DATA)
            losses = {}
            # batch up the examples using spaCy's minibatch
            batches = minibatch(TRAIN_DATA, size=compounding(8., 64., 1.001))
            # print(f'Number of batches: {len(batches)}')
            for batch_num, batch in enumerate(batches):
                texts, annotations = zip(*batch)
                if batch_num % 1000 == 0:
                    print(f"Batch {batch_num}")
                nlp.update(texts, annotations, sgd=optimizer, drop=0.35,
                           losses=losses)
            print('Losses', losses)

    # test the trained model
    test_text = 'Do you like horses?'
    doc = nlp(test_text)
    print("Entities in '%s'" % test_text)
    for ent in doc.ents:
        print(ent.label_, ent.text)

    # save model to output directory
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
        nlp.meta['name'] = new_model_name  # rename model
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)

        # test the saved model
        print("Loading from", output_dir)
        nlp2 = spacy.load(output_dir)
        doc2 = nlp2(test_text)
        for ent in doc2.ents:
            print(ent.label_, ent.text)

if __name__ == '__main__':
    main(model='en_core_web_md', new_model_name="feature", output_dir="./new_model", n_iter=1)
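The training loop above sizes its batches with `minibatch(TRAIN_DATA, size=compounding(8., 64., 1.001))`. The following plain-Python sketch is written for illustration and is only intended to mirror how those two spaCy v2 helpers behave (check the spaCy source for the exact implementation): `compounding` yields an endless series that starts at 8, is multiplied by 1.001 at each step, and is clipped at 64, and `minibatch` draws the next value from that series as the size of each batch.

```python
import itertools


def compounding(start, stop, compound):
    """Yield start, start*compound, start*compound**2, ... clipped at stop."""
    def clip(value):
        # Clip from below when the series shrinks, from above when it grows.
        return max(value, stop) if start > stop else min(value, stop)

    curr = float(start)
    while True:
        yield clip(curr)
        curr *= compound


def minibatch(items, size=8):
    """Split items into lists, taking each batch size from `size`."""
    # A plain int means a fixed batch size; otherwise `size` is an iterator.
    sizes = itertools.repeat(size) if isinstance(size, int) else size
    items = iter(items)
    while True:
        batch = list(itertools.islice(items, int(next(sizes))))
        if not batch:
            break
        yield batch
```

For example, `[len(b) for b in minibatch(range(20), size=compounding(2., 8., 2.))]` gives `[2, 4, 8, 6]`. With a factor of 1.001 growth is slow: the batch size only reaches 64 after roughly 2,080 batches (ln 8 / ln 1.001), so with `n_iter=1` on a modest dataset almost every batch stays close to 8 examples.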

ines · 2018-11-26T12:21:04Z

@nyejon Thanks for the example! I just tested it on the very latest state of develop and can confirm the segfault.

Here's the minimal reproducible version:

import spacy

nlp = spacy.load("en_core_web_sm")
ner = nlp.get_pipe("ner")
ner.add_label("FEATURE")
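Because a segfault takes down the whole interpreter without a Python traceback, one way to check a snippet like this against different spaCy builds without losing your session is to run it in a child process. This sketch is not from the thread and assumes a POSIX system (macOS or Linux), where a child killed by a signal gets a negative return code; signal 11 is SIGSEGV, the "segmentation fault: 11" in the title. It also passes Python's `-X faulthandler` option so the child prints a Python-level traceback when it crashes. The name `run_isolated` is hypothetical.

```python
import signal
import subprocess
import sys

# The minimal reproduction from this thread, run as a child-process script.
REPRO = """
import spacy
nlp = spacy.load("en_core_web_sm")
ner = nlp.get_pipe("ner")
ner.add_label("FEATURE")
"""


def run_isolated(code, timeout=600.0):
    """Run `code` in a fresh interpreter and report how it ended.

    Returns (outcome, stderr): outcome is "ok", "exit <n>" for a non-zero
    exit status, or the name of the signal that killed the child.
    """
    proc = subprocess.run(
        # -X faulthandler makes the child dump a Python traceback on a crash.
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return "ok", proc.stderr
    if proc.returncode > 0:
        return "exit %d" % proc.returncode, proc.stderr
    # POSIX: a negative code means the child was killed by signal -returncode.
    try:
        name = signal.Signals(-proc.returncode).name
    except ValueError:
        name = "signal %d" % -proc.returncode
    return name, proc.stderr
```

Calling `run_isolated(REPRO)` on an affected build should report `"SIGSEGV"`, with the faulthandler traceback in the returned stderr pointing at the `ner.add_label` line; on a fixed build the outcome should be `"ok"`.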

honnibal · 2018-12-10T12:46:58Z

Fixed 🎉 160b55c

lock · 2019-01-09T13:12:38Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

ines added the bug (Bugs and behaviour differing from documentation), feat / ner (Feature: Named Entity Recognizer) and more-info-needed (This issue needs more information) labels Sep 18, 2018

no-response bot removed the more-info-needed label Sep 18, 2018

ines closed this as completed Nov 26, 2018

ines reopened this Nov 26, 2018

honnibal closed this as completed Dec 10, 2018

lock bot locked as resolved and limited conversation to collaborators Jan 9, 2019
