Norwegian Bokmål model produces incorrect lemmatization results for NOUNs #5658
Comments
Thanks for the report! I can replicate this and it looks like a bug in the lemmatizer. The 2.3.0 models include more consistent tag maps with morphological features from the UD corpora, but it looks like the presence of the morphological features triggered some older English-specific code that skips lemmatization for singular nouns, which is clearly a bug here. We'll look into a fix!
Great 👍
Okay, this should be fixed in the upcoming v2.3.1 by #5663.
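The cause described in the first comment above can be illustrated with a toy sketch. This is not spaCy's actual code and not the change made in #5663: the function names, the `english_shortcut` flag and the suffix list are all hypothetical and exist only for illustration. It shows why a shortcut that treats every singular noun as already being its lemma is reasonable for English but wrong for Bokmål, where the definite singular carries a suffix.

```python
# Toy illustration only: hypothetical names, not spaCy's implementation.

# Assumed, heavily simplified Bokmål noun suffixes (longest first).
NOUN_SUFFIXES = ("ene", "en", "et", "er")


def is_base_form(pos, morph):
    """English-style shortcut: a singular noun is assumed to be its own lemma."""
    return pos == "NOUN" and morph.get("Number") == "Sing"


def strip_noun_suffix(word):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in NOUN_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word, pos, morph, english_shortcut):
    """Return a lowercased lemma; optionally apply the English-style shortcut."""
    word = word.lower()
    if english_shortcut and is_base_form(pos, morph):
        # Morphological features say "singular", so lemmatization is skipped.
        return word
    if pos == "NOUN":
        return strip_noun_suffix(word)
    return word


features = {"Number": "Sing", "Definite": "Def"}
print(lemmatize("Formuesskatten", "NOUN", features, english_shortcut=True))
print(lemmatize("Formuesskatten", "NOUN", features, english_shortcut=False))
```

With the shortcut enabled, the definite suffix is never removed, which matches the behaviour reported in this issue. Without morphological features on the token (an empty `morph` dict), the shortcut never fires, which would be consistent with the older model not showing the problem.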
The Norwegian Bokmål model 2.3.0 produces incorrect lemmas for NOUNs.
For example, in the sentence
Formuesskatten er en skatt som utlignes på grunnlag av nettoformuen din.
the lemma of Formuesskatten is not determined correctly: the model returns the lemma Formuesskatten, whereas the correct lemma in this case is Formuesskatt.
The previous release of the Norwegian Bokmål model, 2.2.5, determines the lemma of Formuesskatten correctly.
This error affects the subsequent decomposition of compound NOUNs.
If correct:
NOUN formuesskatten --> lemma --> formuesskatt --> samset-leks + skatt
If incorrect:
NOUN formuesskatten --> lemma --> formuesskatten --> samset-leks + skatten
For now I use the older model (v2.2.5) for this kind of task.
How to reproduce the behaviour
Result:
Your Environment