This repository has been archived by the owner on May 10, 2023. It is now read-only.

Concept to adapt to language handling #531

Closed
2 tasks done
MichaelKohler opened this issue Nov 11, 2021 · 5 comments

@MichaelKohler
Member

MichaelKohler commented Nov 11, 2021

Currently, quite a few languages are enabled on Sentence Collector that are not enabled on Common Voice itself, and most of them have never been requested. This stems from the historical decision to enable all ISO 639-1 codes in Sentence Collector. Since contributing to a language that is not enabled amounts to "contributing into the void" until it is actually enabled in Common Voice, we need to address this now. This also has an impact on #492.

To have a better understanding, let's start with the following tasks:

  • How many, and which, languages have been contributed to in SC but are not enabled on CV?
  • Would that mean that we can (finally) let CV handle the languages, and not have our own language list?
@HarikalarKutusu
Contributor

HarikalarKutusu commented Nov 12, 2021

Here are my answers:

Answer 1: I don't know the exact list. If you can provide it, I can check.
Answer 2: CV should handle these through Pontoon + all.json.

The process for adding a language to CV is:

  • Asking for it
  • Having the frontend translated 90%+
  • Having 5000 sentences
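
The three criteria above can be sketched as a simple check. This is only an illustration, assuming the thresholds named in this comment (90% translated, 5000 sentences); `LanguageStatus` and `is_ready_for_cv` are hypothetical names and not part of the Common Voice or Sentence Collector code.

```python
from dataclasses import dataclass

# Thresholds taken from the list above; the constant names are hypothetical.
MIN_TRANSLATED_PERCENT = 90
MIN_SENTENCES = 5000


@dataclass
class LanguageStatus:
    """Hypothetical summary of where a language stands in the process."""
    code: str
    requested: bool            # somebody asked for the language
    translated_percent: float  # share of the frontend that is translated
    sentence_count: int        # number of collected sentences


def is_ready_for_cv(status: LanguageStatus) -> bool:
    """Return True only if all three criteria from the list are met."""
    return (
        status.requested
        and status.translated_percent >= MIN_TRANSLATED_PERCENT
        and status.sentence_count >= MIN_SENTENCES
    )
```

For example, a requested language with 95% of the frontend translated but only 4999 sentences would not pass this check yet.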

I don't know the number in SC, but:

  • Pontoon has 151 teams (added there because people asked for that language)
  • CV has 152 locales (all.json), of which 85 are listed as "contributable" (contributable.json), so people are able to record/listen as of now. The extra one compared to Pontoon is "eo" [edit: it is "en"]...

So, for people to translate the UI and add those 5000 sentences, SC should use the largest list, which is all.json... [edit: I don't know about "eo"... => it is "en"]
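
A comparison like the one above (which locale is in all.json but has no Pontoon team) can be done as a set difference. The following is a minimal sketch, assuming all.json is a flat JSON array of locale codes; the sample data is made up for illustration and `extra_locales` is a hypothetical helper.

```python
import json


def extra_locales(all_locales_json: str, pontoon_teams: set[str]) -> list[str]:
    """Return locale codes present in the all.json content but absent from Pontoon."""
    cv_locales = set(json.loads(all_locales_json))
    return sorted(cv_locales - pontoon_teams)


# Made-up sample data, not the real lists.
sample_all_json = '["de", "en", "eo", "tr"]'
sample_pontoon_teams = {"de", "eo", "tr"}

print(extra_locales(sample_all_json, sample_pontoon_teams))  # ['en']
```

The same difference, with the arguments swapped for the languages that have contributions in SC, would answer the first task in the issue description.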

But for people to contribute, while translating, they will also translate SC strings anyway...

@MichaelKohler
Member Author

Thanks for that. From what I can see people have contributed to the following locales, which are not on Pontoon:

Possibly we should figure out if somebody wants to officially start kn (Kannada) and sa (Sanskrit). The remaining languages could probably be disabled for now until somebody wants to officially start them.

Then at this point, I think we indeed could solely rely on the info the CV repository has and we wouldn't need our own list.

Pontoon has 151 teams (added there because people asked for that language)
CV has 152 locales (all.json), from which 85 are listed as "contributable" (contributable.json) thus people are able to record/listen as of now. The extra one compared to Pontoon is "eo" ...

Are you sure it's eo? I can see it at https://pontoon.mozilla.org/eo/common-voice/

@HarikalarKutusu
Contributor

Are you sure it's eo? I can see it at https://pontoon.mozilla.org/eo/common-voice/

Oh my... I need to check my glasses. It is "en" of course :)

@MichaelKohler
Member Author

Hah, that makes sense 😁 Thanks!

@MichaelKohler
Member Author

For Sanskrit and Kannada I created https://discourse.mozilla.org/t/kannada-and-sanskrit/88555/2.

The remaining questions are IMHO answered for now, and follow-up issues have been created for them. Therefore I'm closing this.
