Turkish problem #56

lchachurski · 2015-02-12T18:52:45Z

Hey,

Seems like some Turkish letters fall into German:

Türkiye Kupasi => tuerkiye-kupasi

While it should be turkiye-kupasi. Same problem with ö which should fall into o.

Not sure, but it seems Turkish should go to a ruleset.

mac2000 · 2015-02-12T19:21:51Z

We need someone who knows German here probably, in my opinion ö should be transliterated to o, so with ü. So probably changes probably made here...

lchachurski · 2015-02-12T20:13:47Z

Seems like German transliteration are fine according to this thread http://forum.grabaperch.com/forum/08-29-2014-encoding-of-german-umlaute-in-slugsfilenames

mac2000 · 2015-02-12T20:19:10Z

At least right now you can add\override transliteration rules to fit your needs without waiting for new release like so:

$slugify->addRule('ö', 'o');
$slugify->addRule('ü', 'u');

actually I would prefer create new class that extends slugify and implements its interface and add this inside its constructor - so you can change/add custom rules whenever you need, and you always free to change slugify to anything else

florianeckerstorfer · 2015-02-12T20:29:45Z

I'm speaking German and it's correct to transliterated ü to ue. We could probably add a Turkish ruleset.

🐯

On 12.02.2015, at 21:19, Marchenko Alexandr [email protected] wrote:

At least right now you can add\override transliteration rules to fit your needs without waiting for new release like so:

$slugify->addRule('ö', 'o');
$slugify->addRule('ü', 'u');
actually I would prefer create new class that extends slugify and implements its interface and add this inside its constructor - so you can change/add custom rules whenever you need, and you always free to change slugify to anything else

—
Reply to this email directly or view it on GitHub.

joppuyo · 2015-05-06T14:48:49Z

Rules for Ä and Ö are wrong in Finnish, too. They should be fall back to A and O. I'd imagine same thing with Swedish and other nordic languages. AddRule method is a good workaround for now!

mac2000 · 2015-05-06T19:03:31Z

Here is my proposal to fix this and all other such issues in future:

Add locale property to slugify and fill it with current locale while instantiating object (or if given to constructor use it)
Reuse ruleSets and add locale specific rules
While instatiating object, decide is there rule set that is usable - if so - use it
Profit. Slugify still will be as small as possible, we wont break interface and will be able to easily make it usable for everyone right out of the box without calling any additional methods

What do you guys thing about this?

It will be nice if you could provide some rules specific to languages you are know so we can make pull request with this proposal implemented

florianeckerstorfer · 2015-05-07T07:47:59Z

Ok, here are some of my thoughts on this issue:

Let's call them languages, not locales. For example, German has multiple locals: German (Austria), German (Germany), …. They may have different rules about formatting currency or dates, but the special characters (and the associated transliterations) are the same for the language. Are there other languages where this is not the case?
Using the current implementation activateRuleset() takes the rules from the rule set and sets them into the "global" rules array. Thus, if there are conflicting rules the rules from the activated set they overwrite the default rules
If we have a rule set for each language, then (in the current implementation) defining a language would only activate the rules for this language. Rules for other languages would be not used
I think the current behaviour, just pass a string and Slugify returns a slug is the most common one. For example, if I design a system where the primary language is English, but occasional there are German umlauts in a slug I don't care if they are transliterated to ue or u.
Thus, calling Slugify without a language should still apply all rules according to a default rule set order
Calling Slugify with a language should give the language the highest priority, but if a rule is not defined for this language the rules from all other languages should still be applied.

As I see it we have two possibilities. Make the smallest changes possible to make this work for a new 1.x release or rethink how we could do this and make a 2.0 release. With the latter we could also try to tackle stuff like Persian.

mac2000 · 2015-05-07T08:57:04Z

Lets even cal it ruleset instead of language or locale here is why:

User can use slugify in following scenarios:

as is - nothing special here, all just works - we never ever should break this :)
locale user set locale on server or application to desired one - slugify will determine it and apply according rulesets if any in constructor. We can substract language from locale and look for it, e.g. if system locale is ru_UA we can look for ru_UA and ru rulesets (like gettext does) and apply thous we have found - so we will cover most of future cases with strange transliteration rules.
param user specified desired rulesets as constructor param - slugify will ignore system locale and apply specified rulesets

In all cases rulesets do the work, and it is great because rulesets allow us to not copy paste every letter for every language and "tune" transliteration to what user wants

In all cases we are not touching interface so should not break any existing code so probably there is no need for v2

florianeckerstorfer · 2015-05-08T08:08:54Z

Sounds reasonable. Do you have any ideas on the actual implementation. As mentioned, currently there is one big array and if a ruleset is activated the rules of that set are copied into that big rules array. If we have a ruleset for every language we need to copy all the rules into the big rules array (and make sure that the given language/locale/ruleset is copied last (that is, it takes preference).

The other possible implementation (without making bigger changes to the code) is to iterate through the rulesets in each slugify() call and call strtr() for each.

Both will affect performance.

mac2000 · 2015-05-08T11:49:25Z

Here is what I'm thingking about:

master...mac2000:master

Replace german characters in global rules back to what they was so by default Ö will be slugified to o so we will fix Turkish, Finnish, Swedish, Ukrainian and Russian transliteration
Introduce predefined ruleset de (and so on for each special non standard case)
Introduce contructor parameter so user can specify what rulesets he want to use

Something like that i think can fix thous issues and all future ones

Actually we can even get rid of constructor param, so after instantiating slugify user will call activateRuleset('de') and all will work as expected

florianeckerstorfer · 2015-05-08T12:21:52Z

I don't think it as easy as that.

First of all I think we should define what is the "default" case. For example, for a German speaker ä -> ae is the default case, while for a Turkish speaker ä -> a is the default case. I suggest using the list of Languages used on the Internet.

However, I think we should only move the conflicting transliterations into a ruleset. For example, ß is only used in German and should be transliterated to ss. Thus, we have to split the German transliterations into two parts, one in the default rules list and the other in a German-specific ruleset. That's fine for the short term, but I will think about a long term solution for v2.

Currently I am unsure if this change would be breaking compatibility. Since the very beginning of Slugify the German transliterations have also been the default and changing the default would break existing software.

mac2000 · 2015-05-08T12:51:47Z

Oh, I see your point now - it seems that German is used much more that Turkish, Finnish etc together so German transliteration rules should take precedence.

But on the other hand - if we take German rules as default one - then we need to provide many rulesets for languages like Finnish, Turkish etc that will duplicate each other.

Ok, so I'm stuck now :)

At this moment it seems that the only way for people who do not like ä transliterated to ae is to provide custom transliteration rules - which by the way not such big deal, what I mean there is no difference will you provide custom rules, or parameter to constructor in each case - you should define somehow how slugify will transliterate your strings. So probably may be we should not fix anything here at all?

florianeckerstorfer · 2015-05-11T09:54:02Z

Maybe we could define a no-umlaut ruleset? At least temporarily, so that Turkish and Finnish people don't need to add 6 separat rules?

mac2000 · 2015-05-11T10:08:58Z

Here is what comes in my mind:

$slugify->activateRuleset('no-umlaut');

will solve problem and we definitely can do it, but I would still recommend to use:

$slugify->addRules(['ä' => 'a', 'Ö' => 'OE']);

for following reasons:

It is still one line of code
It can be more specific to fit my needs
I will not depend on ruleset that can be changed and even worth - removed so I will get exceptions

joppuyo · 2015-06-08T06:50:13Z

I honestly feel a library like this should be focused on how slugified characters look, not how they sound. Downcoded characters look correct in every language as opposed to transliterated ones. Rather than assuming that everyone who uses the library is german by default, the transliteration could perhaps be explicitly enabled on per-language basis?

Just my two cents.

florianeckerstorfer · 2015-11-07T10:41:03Z

This should be solved by #81

emnsen mentioned this issue Aug 25, 2015

Turkish and German characters Problem #74

Closed

This was referenced Oct 10, 2015

Swedish transliteration #80

Closed

Remove rules from Slugify and move them into files #81

Merged

florianeckerstorfer closed this as completed Nov 7, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Turkish problem #56

Turkish problem #56

lchachurski commented Feb 12, 2015

mac2000 commented Feb 12, 2015

lchachurski commented Feb 12, 2015

mac2000 commented Feb 12, 2015

florianeckerstorfer commented Feb 12, 2015

joppuyo commented May 6, 2015

mac2000 commented May 6, 2015

florianeckerstorfer commented May 7, 2015

mac2000 commented May 7, 2015

florianeckerstorfer commented May 8, 2015

mac2000 commented May 8, 2015

florianeckerstorfer commented May 8, 2015

mac2000 commented May 8, 2015

florianeckerstorfer commented May 11, 2015

mac2000 commented May 11, 2015

joppuyo commented Jun 8, 2015

florianeckerstorfer commented Nov 7, 2015

Turkish problem #56

Turkish problem #56

Comments

lchachurski commented Feb 12, 2015

mac2000 commented Feb 12, 2015

lchachurski commented Feb 12, 2015

mac2000 commented Feb 12, 2015

florianeckerstorfer commented Feb 12, 2015

joppuyo commented May 6, 2015

mac2000 commented May 6, 2015

florianeckerstorfer commented May 7, 2015

mac2000 commented May 7, 2015

florianeckerstorfer commented May 8, 2015

mac2000 commented May 8, 2015

florianeckerstorfer commented May 8, 2015

mac2000 commented May 8, 2015

florianeckerstorfer commented May 11, 2015

mac2000 commented May 11, 2015

joppuyo commented Jun 8, 2015

florianeckerstorfer commented Nov 7, 2015