Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Turkish problem #56

Closed
lchachurski opened this issue Feb 12, 2015 · 16 comments
Closed

Turkish problem #56

lchachurski opened this issue Feb 12, 2015 · 16 comments

Comments

@lchachurski
Copy link

Hey,

Seems like some Turkish letters fall into German:

Türkiye Kupasi => tuerkiye-kupasi

While it should be turkiye-kupasi. Same problem with ö which should fall into o.

Not sure, but it seems Turkish should go to a ruleset.

@mac2000
Copy link
Contributor

mac2000 commented Feb 12, 2015

We need someone who knows German here probably, in my opinion ö should be transliterated to o, so with ü. So probably changes probably made here...

@lchachurski
Copy link
Author

Seems like German transliteration are fine according to this thread http://forum.grabaperch.com/forum/08-29-2014-encoding-of-german-umlaute-in-slugsfilenames

@mac2000
Copy link
Contributor

mac2000 commented Feb 12, 2015

At least right now you can add\override transliteration rules to fit your needs without waiting for new release like so:

$slugify->addRule('ö', 'o');
$slugify->addRule('ü', 'u');

actually I would prefer create new class that extends slugify and implements its interface and add this inside its constructor - so you can change/add custom rules whenever you need, and you always free to change slugify to anything else

@florianeckerstorfer
Copy link
Member

I'm speaking German and it's correct to transliterated ü to ue. We could probably add a Turkish ruleset.

🐯

On 12.02.2015, at 21:19, Marchenko Alexandr [email protected] wrote:

At least right now you can add\override transliteration rules to fit your needs without waiting for new release like so:

$slugify->addRule('ö', 'o');
$slugify->addRule('ü', 'u');
actually I would prefer create new class that extends slugify and implements its interface and add this inside its constructor - so you can change/add custom rules whenever you need, and you always free to change slugify to anything else


Reply to this email directly or view it on GitHub.

@joppuyo
Copy link

joppuyo commented May 6, 2015

Rules for Ä and Ö are wrong in Finnish, too. They should be fall back to A and O. I'd imagine same thing with Swedish and other nordic languages. AddRule method is a good workaround for now!

@mac2000
Copy link
Contributor

mac2000 commented May 6, 2015

Here is my proposal to fix this and all other such issues in future:

  1. Add locale property to slugify and fill it with current locale while instantiating object (or if given to constructor use it)
  2. Reuse ruleSets and add locale specific rules
  3. While instatiating object, decide is there rule set that is usable - if so - use it
  4. Profit. Slugify still will be as small as possible, we wont break interface and will be able to easily make it usable for everyone right out of the box without calling any additional methods

What do you guys thing about this?

It will be nice if you could provide some rules specific to languages you are know so we can make pull request with this proposal implemented

@florianeckerstorfer
Copy link
Member

Ok, here are some of my thoughts on this issue:

  • Let's call them languages, not locales. For example, German has multiple locals: German (Austria), German (Germany), …. They may have different rules about formatting currency or dates, but the special characters (and the associated transliterations) are the same for the language. Are there other languages where this is not the case?
  • Using the current implementation activateRuleset() takes the rules from the rule set and sets them into the "global" rules array. Thus, if there are conflicting rules the rules from the activated set they overwrite the default rules
  • If we have a rule set for each language, then (in the current implementation) defining a language would only activate the rules for this language. Rules for other languages would be not used
  • I think the current behaviour, just pass a string and Slugify returns a slug is the most common one. For example, if I design a system where the primary language is English, but occasional there are German umlauts in a slug I don't care if they are transliterated to ue or u.
  • Thus, calling Slugify without a language should still apply all rules according to a default rule set order
  • Calling Slugify with a language should give the language the highest priority, but if a rule is not defined for this language the rules from all other languages should still be applied.

As I see it we have two possibilities. Make the smallest changes possible to make this work for a new 1.x release or rethink how we could do this and make a 2.0 release. With the latter we could also try to tackle stuff like Persian.

@mac2000
Copy link
Contributor

mac2000 commented May 7, 2015

Lets even cal it ruleset instead of language or locale here is why:

User can use slugify in following scenarios:

  1. as is - nothing special here, all just works - we never ever should break this :)
  2. locale user set locale on server or application to desired one - slugify will determine it and apply according rulesets if any in constructor. We can substract language from locale and look for it, e.g. if system locale is ru_UA we can look for ru_UA and ru rulesets (like gettext does) and apply thous we have found - so we will cover most of future cases with strange transliteration rules.
  3. param user specified desired rulesets as constructor param - slugify will ignore system locale and apply specified rulesets

In all cases rulesets do the work, and it is great because rulesets allow us to not copy paste every letter for every language and "tune" transliteration to what user wants

In all cases we are not touching interface so should not break any existing code so probably there is no need for v2

@florianeckerstorfer
Copy link
Member

Sounds reasonable. Do you have any ideas on the actual implementation. As mentioned, currently there is one big array and if a ruleset is activated the rules of that set are copied into that big rules array. If we have a ruleset for every language we need to copy all the rules into the big rules array (and make sure that the given language/locale/ruleset is copied last (that is, it takes preference).

The other possible implementation (without making bigger changes to the code) is to iterate through the rulesets in each slugify() call and call strtr() for each.

Both will affect performance.

@mac2000
Copy link
Contributor

mac2000 commented May 8, 2015

Here is what I'm thingking about:

master...mac2000:master

  1. Replace german characters in global rules back to what they was so by default Ö will be slugified to o so we will fix Turkish, Finnish, Swedish, Ukrainian and Russian transliteration
  2. Introduce predefined ruleset de (and so on for each special non standard case)
  3. Introduce contructor parameter so user can specify what rulesets he want to use

Something like that i think can fix thous issues and all future ones

Actually we can even get rid of constructor param, so after instantiating slugify user will call activateRuleset('de') and all will work as expected

@florianeckerstorfer
Copy link
Member

I don't think it as easy as that.

First of all I think we should define what is the "default" case. For example, for a German speaker ä -> ae is the default case, while for a Turkish speaker ä -> a is the default case. I suggest using the list of Languages used on the Internet.

However, I think we should only move the conflicting transliterations into a ruleset. For example, ß is only used in German and should be transliterated to ss. Thus, we have to split the German transliterations into two parts, one in the default rules list and the other in a German-specific ruleset. That's fine for the short term, but I will think about a long term solution for v2.

Currently I am unsure if this change would be breaking compatibility. Since the very beginning of Slugify the German transliterations have also been the default and changing the default would break existing software.

@mac2000
Copy link
Contributor

mac2000 commented May 8, 2015

Oh, I see your point now - it seems that German is used much more that Turkish, Finnish etc together so German transliteration rules should take precedence.

But on the other hand - if we take German rules as default one - then we need to provide many rulesets for languages like Finnish, Turkish etc that will duplicate each other.

Ok, so I'm stuck now :)

At this moment it seems that the only way for people who do not like ä transliterated to ae is to provide custom transliteration rules - which by the way not such big deal, what I mean there is no difference will you provide custom rules, or parameter to constructor in each case - you should define somehow how slugify will transliterate your strings. So probably may be we should not fix anything here at all?

@florianeckerstorfer
Copy link
Member

Maybe we could define a no-umlaut ruleset? At least temporarily, so that Turkish and Finnish people don't need to add 6 separat rules?

@mac2000
Copy link
Contributor

mac2000 commented May 11, 2015

Here is what comes in my mind:

$slugify->activateRuleset('no-umlaut');

will solve problem and we definitely can do it, but I would still recommend to use:

$slugify->addRules(['ä' => 'a', 'Ö' => 'OE']);

for following reasons:

  • It is still one line of code
  • It can be more specific to fit my needs
  • I will not depend on ruleset that can be changed and even worth - removed so I will get exceptions

@joppuyo
Copy link

joppuyo commented Jun 8, 2015

I honestly feel a library like this should be focused on how slugified characters look, not how they sound. Downcoded characters look correct in every language as opposed to transliterated ones. Rather than assuming that everyone who uses the library is german by default, the transliteration could perhaps be explicitly enabled on per-language basis?

Just my two cents.

@florianeckerstorfer
Copy link
Member

This should be solved by #81

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants