[de] wrong suggestion for "Dampfschiffahrtskapitän" #1369

Closed
tiff opened this issue Jan 20, 2019 · 3 comments

@tiff
Member

tiff commented Jan 20, 2019

This isn't a common word, but I wanted to show someone what LanguageTool can do and demoed it with this word:

Dampfschiffahrtskapitän

The correct word would be "Dampfschifffahrtskapitän".

[Screenshot, 2019-01-20 at 19:41:46]

@janschreiber
Contributor

Also see #725. Daniel improved the suggestion mechanism in summer 2017 (in response to my request back then), and this was a huge step forward. The gist of the solution was to take compound suggestions from a static, finite but large list of words that "make sense" to humans. It worked out well.
But if the most likely suggestion is not in the list ("Dampfschifffahrtskapitän" with three f's in this case), a fallback algorithm builds compound suggestions on the fly, and it fails miserably most of the time. For unknown but correct or almost correct words, we often suggest utter nonsense.
For example, I got the suggestions "Aluminiumwitwenkabel, Aluminiumkatzenkabel" for "Aluminiumlitzenkabel" today. These are of course valid compounds, but what do widows or cats have to do with aluminum cables? They are semantically weird.
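
To make the two-stage behaviour described above concrete, here is a minimal sketch in Python. It is not LanguageTool's code: the names `CURATED`, `suggest` and `generate_on_the_fly` are hypothetical, the curated list is a two-word toy, and the fallback is a stub that just returns the suggestions quoted in this comment.

```python
from typing import Callable, List

# Toy stand-in for the large static list of sensible compounds (assumption).
CURATED = ["Dampfschifffahrtskapitän", "Dampfschiff"]


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def generate_on_the_fly(word: str) -> List[str]:
    """Hypothetical stub for the fallback; returns the nonsense from this thread."""
    return ["Aluminiumwitwenkabel", "Aluminiumkatzenkabel"]


def suggest(word: str, curated: List[str],
            fallback: Callable[[str], List[str]], max_dist: int = 2) -> List[str]:
    """Prefer close matches from the curated list; only then fall back."""
    scored = sorted((levenshtein(word.lower(), c.lower()), c) for c in curated)
    hits = [c for dist, c in scored if dist <= max_dist]
    return hits if hits else fallback(word)


print(suggest("Dampfschiffahrtskapitän", CURATED, generate_on_the_fly))
print(suggest("Aluminiumlitzenkabel", CURATED, generate_on_the_fly))
```

With this toy list the first call is answered from the curated list, while the second word has no close entry and drops into the fallback, which is where the odd suggestions come from.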

@danielnaber
Member

I did some analysis: "Dampf, schiff, ahrts, kapitän" is one of several possible splits, but "ahrts" doesn't get the suggestion "fahrts" (in CompoundAwareHunspellRule#getCandidates()). We use the standard suggestion algorithm there, and it's not prepared to work well with compound-internal parts that carry the infix "s". Maybe we can find a hack to improve this.
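
One way to picture the problem is the following Python sketch. It is only an illustration under stated assumptions, not the code in CompoundAwareHunspellRule: the dictionary `PARTS` is a toy that contains base forms only, the lookup accepts candidates within edit distance 1, and the function names are hypothetical. Under those assumptions "ahrts" finds nothing, while stripping the trailing "s", looking up "ahrt", and re-attaching the "s" yields "fahrts".

```python
from typing import List

# Toy dictionary of compound parts, base forms only (assumption).
PARTS = ["dampf", "schiff", "fahrt", "kapitän"]


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,
                           cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def plain_suggestions(part: str, max_dist: int = 1) -> List[str]:
    """Stand-in for a standard suggestion lookup that ignores the infix 's'."""
    return [w for w in PARTS if levenshtein(part, w) <= max_dist]


def infix_aware_suggestions(part: str, max_dist: int = 1) -> List[str]:
    """If nothing matches and the part ends in 's', retry without the 's'
    and re-attach it to each candidate."""
    found = plain_suggestions(part, max_dist)
    if not found and part.endswith("s") and len(part) > 1:
        found = [w + "s" for w in plain_suggestions(part[:-1], max_dist)]
    return found


split = ["Dampf", "schiff", "ahrts", "kapitän"]
fixed = infix_aware_suggestions(split[2])
print(plain_suggestions(split[2]))   # nothing within distance 1
print(fixed)
print("".join(split[:2] + fixed[:1] + split[3:]))
```

Rejoining the split with the repaired part gives the three-f spelling. A real implementation would also have to decide which parts may legitimately take the infix "s", which this sketch ignores.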

danielnaber added a commit that referenced this issue Jan 21, 2019
…ases that have a compound part ending in "s" like "fahrts" (#1369)
@danielnaber
Member

Fixed with a rather specific hack (but not just hard-coding this word).
