Norwegian Bokmål sentence segmentation not working #4401
Comments
The default sentence segmentation happens via the dependency parser, which does not seem to have worked in this example. You can always add the rule-based sentencizer to the pipeline, though, which uses a simpler strategy: https://spacy.io/usage/linguistic-features#sbd-component
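To make the "simpler strategy" concrete, here is a minimal plain-Python sketch of rule-based splitting on sentence-final punctuation. It is not spaCy's implementation: `split_sentences` is a hypothetical helper written for illustration, and the real component handles tokens and configurable punctuation rather than a regular expression. In spaCy itself the component is added with `nlp.add_pipe(nlp.create_pipe("sentencizer"))` in v2, or `nlp.add_pipe("sentencizer")` in v3.

```python
import re


def split_sentences(text):
    """Hypothetical helper (not part of spaCy): split after '.', '!' or '?'
    when the punctuation mark is followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings, e.g. when the input is empty.
    return [part for part in parts if part]


sentences = split_sentences("Hei på deg. Jeg har det fint.")
for sentence in sentences:
    print(sentence)
```

A rule like this does not depend on the parser, so it still splits text that the model treats as a single sentence. The trade-off is that it will also split on abbreviations that end in a period.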
Also I'd like to point out that the name @ines : maybe to avoid confusion we should also rename the link in the docs?
The anchors are the only references I kept to not break backwards compatibility – and rewriting anchor links is a pain 😞 Also, on the topic of the
The training data only consists of individual sentences as documents, so the model seems to be determined to predict only one ROOT per text. I will work on it a bit to create some fake paragraphs so it can be retrained.
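One way to picture the "fake paragraphs" idea is to group the single-sentence documents into chunks of several sentences, so that each training text contains more than one ROOT. The sketch below only illustrates that grouping step and is not the procedure actually used for the model. `make_fake_paragraphs` and its parameters are hypothetical names, and the sentences are plain strings rather than parsed training examples.

```python
import random


def make_fake_paragraphs(sentences, min_len=2, max_len=5, seed=0):
    """Hypothetical helper: group consecutive single-sentence documents
    into paragraphs of random length between min_len and max_len.
    The final paragraph may be shorter if the sentences run out."""
    rng = random.Random(seed)  # fixed seed so the grouping is reproducible
    paragraphs = []
    start = 0
    while start < len(sentences):
        size = rng.randint(min_len, max_len)
        paragraphs.append(sentences[start:start + size])
        start += size
    return paragraphs


corpus = ["Hei på deg.", "Jeg har det fint.", "Hei på deg.", "Jeg har det fint."]
for paragraph in make_fake_paragraphs(corpus):
    print(" ".join(paragraph))
```

Varying the paragraph length keeps the model from learning a fixed number of sentences per text.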
Just released a new version of the
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Doing this:
Outputs:
Hei på deg. Jeg har det fint.
:(
There's some mention in #3082 that maybe sbd needs to be added to the pipeline? I'm not sure if it is, how I can do it, or if it should be there by default?