feat: collect and print placeholders in Po files for better translation context #1965

Merged

timofei-iatsenko
Collaborator

Description

This feature serves a similar purpose to PR #1874.

If a message contains unnamed placeholders such as {0}, print their values into PO comments so translators get an idea of what each placeholder is about.

t`Hello ${user.name} ${value}`

This will be extracted as

Before:

msgid "Hello {0} {value}"

After:

#. ph: {0} = user.name
msgid "Hello {0} {value}"

The benefit of this solution is that developers can reuse more translations, because different placeholder names will not create different translation keys.

Also, placeholder metadata is now available to formatters, so users can implement their own logic for placeholders (for example, replacing them with their values).
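
As a rough illustration of the formatter idea, the sketch below substitutes collected placeholder values back into a message ID. The `PlaceholderMap` type and the `inlinePlaceholders` function are hypothetical names used only for this example; they are not Lingui's API, and the actual shape of the metadata handed to formatters may differ.

```typescript
// Hypothetical shape: placeholder index -> source expressions recorded for it.
type PlaceholderMap = Record<string, string[]>;

// Replace each unnamed placeholder such as {0} with the first expression
// recorded for it. Named placeholders and placeholders without metadata
// are left untouched.
function inlinePlaceholders(msgid: string, placeholders: PlaceholderMap): string {
  return msgid.replace(/\{(\d+)\}/g, (match: string, index: string) => {
    const values = placeholders[index];
    return values && values.length > 0 ? `{${values[0]}}` : match;
  });
}

console.log(inlinePlaceholders("Hello {0} {value}", { "0": ["user.name"] }));
// prints "Hello {user.name} {value}"
```

Only numeric placeholders are touched here, since named ones such as {value} already carry their meaning.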

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Examples update

Fixes # (issue)

Checklist

  • I have read the CONTRIBUTING and CODE_OF_CONDUCT docs
  • I have added tests that prove my fix is effective or that my feature works
  • I have added the necessary documentation (if appropriate)

vercel bot commented Jun 24, 2024

The latest updates on your projects.

Name        Status     Updated (UTC)
js-lingui   ✅ Ready   Jun 27, 2024 10:13am

github-actions bot commented Jun 24, 2024

size-limit report 📦

Path                                      Size
./packages/core/dist/index.mjs            2.86 KB (0%)
./packages/detect-locale/dist/index.mjs   723 B (0%)
./packages/react/dist/index.mjs           1.65 KB (0%)
./packages/remote-loader/dist/index.mjs   7.26 KB (0%)

@andrii-bodnar
Contributor

Nice, I really like this feature, thanks!

What about renaming the ph: to the full word placeholders:? It's not so clear what the ph means. It would be better for translators to understand, as well as for AI (which will have it as context during pre-translation in Crowdin).

Also, can we have a test case for the strings with more than one placeholder?

@timofei-iatsenko
Collaborator Author

No problem, can do. I'm also thinking about the case where a few messages in the codebase have different placeholders. Currently it will take only the last or the first one (I didn't test), but it probably makes sense to gather an array of them and print it?

t`Hello ${user.name}`
t`Hello ${author.name}`
t`Hello ${moderator.name}`

#. ph: {0} = user.name | author.name | moderator.name
msgid "Hello {0}"
  1. How should it be formatted to get the most out of it (so AI/translators would understand it)? Note that there could be more complex expressions here, not only member expressions: ternaries, function calls, or even self-invoking expressions (not sure anyone would do that, but it's possible).
  2. Does it really need to list all of them? Obviously, the code should dedupe identical entries.
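
The gathering and deduping described above could look roughly like the following. This is a minimal sketch under assumptions: `PlaceholderMap` and `mergePlaceholders` are hypothetical names, and each occurrence of a message is assumed to yield a plain index-to-expression record.

```typescript
// Hypothetical shape: placeholder index -> distinct expressions seen so far.
type PlaceholderMap = Record<string, string[]>;

// Merge the placeholders of one more occurrence of a message into the
// accumulated map, skipping values that were already recorded.
function mergePlaceholders(
  acc: PlaceholderMap,
  next: Record<string, string>
): PlaceholderMap {
  const result: PlaceholderMap = { ...acc };
  for (const [index, value] of Object.entries(next)) {
    const existing = result[index] ?? [];
    result[index] = existing.includes(value) ? existing : [...existing, value];
  }
  return result;
}

// Four occurrences of the same message; "user.name" appears twice.
const occurrences: Record<string, string>[] = [
  { "0": "user.name" },
  { "0": "author.name" },
  { "0": "user.name" },
  { "0": "moderator.name" },
];

const merged = occurrences.reduce<PlaceholderMap>(mergePlaceholders, {});
console.log(JSON.stringify(merged));
// prints {"0":["user.name","author.name","moderator.name"]}
```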

Btw, the next improvement could be a flag to completely opt out of inline placeholder names populated from variable names, in favor of this feature, for maximum translation reusability.

@andrii-bodnar
Contributor

The following #. placeholder {0}: user.name | author.name | moderator.name looks like a good option. It would also be good to normalize the values (e.g. remove the line breaks) so that each placeholder comment is on a single line.

For a few placeholders per string, we can do something like this:

#. placeholder {0}: user.name
#. placeholder {1}: user.login

@timofei-iatsenko
Collaborator Author

I don't really like the "|" symbol, because the same symbol could appear in the placeholder value itself. There should be a clearer way to distinguish the values.

t`Hello ${name || surname}`
t`Hello ${userName}`

#. ph: {0} = name || surname | userName
msgid "Hello {0}"

The same problem shows up with line breaks in a ternary:

t`Hello ${user
    ? user.name
    : 'User' }`
t`Hello ${userName}`

It seems clearing line breaks is unavoidable, because the PO format does not allow multiline comments.

#. placeholder: {0} = user ? user.name : 'User' | userName
msgid "Hello {0}"
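
One way the line-break clearing could be done is to collapse all whitespace runs in the expression source. This is a minimal sketch; `normalizePlaceholderValue` is a hypothetical helper name, and a blunt regex like this would also collapse whitespace inside string literals, which may or may not be acceptable.

```typescript
// Collapse line breaks and runs of whitespace into single spaces so the
// expression fits on one PO comment line.
function normalizePlaceholderValue(source: string): string {
  return source.replace(/\s+/g, " ").trim();
}

const ternary = "user\n    ? user.name\n    : 'User' ";
console.log(normalizePlaceholderValue(ternary));
// prints "user ? user.name : 'User'"
```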

Maybe every entry on a new line?

#. ph: {0} = name || surname
#. ph: {0} = userName
#. ph: {1} = user.login
#. ph: {1} = user

Verbose, but it is 100% clear where one placeholder value starts and ends.
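
Printing one entry per line is straightforward to sketch. The `PlaceholderMap` type and `formatPlaceholderComments` function are hypothetical names, and the `#. ph:` prefix simply mirrors the example above rather than any final format.

```typescript
// Hypothetical shape: placeholder index -> distinct expressions recorded for it.
type PlaceholderMap = Record<string, string[]>;

// Emit one extracted-comment line per placeholder value, so no separator
// character can be confused with part of the expression itself.
function formatPlaceholderComments(placeholders: PlaceholderMap): string[] {
  const lines: string[] = [];
  for (const [index, values] of Object.entries(placeholders)) {
    for (const value of values) {
      lines.push(`#. ph: {${index}} = ${value}`);
    }
  }
  return lines;
}

const comments = formatPlaceholderComments({
  "0": ["name || surname", "userName"],
  "1": ["user.login", "user"],
});
console.log(comments.join("\n"));
```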

codecov bot commented Jun 24, 2024

Codecov Report

Attention: Patch coverage is 90.69767% with 4 lines in your changes missing coverage. Please review.

Project coverage is 77.26%. Comparing base (dd43fb0) to head (b462451).
Report is 29 commits behind head on next.

Current head b462451 differs from pull request most recent head 435bcc3

Please upload reports for the commit 435bcc3 to get more accurate results.

Files                                                  Patch %   Lines
...ackages/babel-plugin-extract-messages/src/index.ts  88.46%    1 Missing and 2 partials ⚠️
packages/cli/src/api/catalog/extractFromFiles.ts       85.71%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             next    #1965      +/-   ##
==========================================
+ Coverage   76.66%   77.26%   +0.59%     
==========================================
  Files          81       77       -4     
  Lines        2083     2208     +125     
  Branches      532      579      +47     
==========================================
+ Hits         1597     1706     +109     
- Misses        375      384       +9     
- Partials      111      118       +7     

@andrii-bodnar
Contributor

I agree, the | character could be confusing. The last example with each entry on a new line probably looks best.

@timofei-iatsenko
Collaborator Author

I'm also thinking that 2-3 placeholder values are enough to give overall context, so we can limit the output to avoid bloating the file. On the other hand, the full list could be a tool to quickly spot an issue with a message:

t`from ${myTrip.date}`
t`from ${myTrip.startCity}`

#. placeholder {0}:  myTrip.date
#. placeholder {0}:  myTrip.startCity <-- probably it should be different message created by different context
msgid "from {0}"
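
Capping the number of values per placeholder could be sketched as below. The names `PlaceholderMap` and `limitPlaceholderValues`, the default of 3, and the extra `admin.name` sample value are assumptions made for illustration only.

```typescript
// Hypothetical shape: placeholder index -> distinct expressions recorded for it.
type PlaceholderMap = Record<string, string[]>;

// Keep at most `maxValues` expressions per placeholder so the PO file
// does not grow with every additional usage of the same message.
function limitPlaceholderValues(
  placeholders: PlaceholderMap,
  maxValues: number = 3
): PlaceholderMap {
  const limited: PlaceholderMap = {};
  for (const [index, values] of Object.entries(placeholders)) {
    limited[index] = values.slice(0, maxValues);
  }
  return limited;
}

const collected: PlaceholderMap = {
  "0": ["user.name", "author.name", "moderator.name", "admin.name"],
};
console.log(JSON.stringify(limitPlaceholderValues(collected)));
// prints {"0":["user.name","author.name","moderator.name"]}
```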

@andrii-bodnar
Contributor

Yes, I think limiting the output would also be ok.

As for spotting messages that should probably be split into different contexts, that would also be a nice feature. This is a common i18n problem. I think it could be presented in the CLI extract output, or it could be a completely separate CLI command, e.g. lingui lint. But this is definitely out of the current scope.
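
Purely as a sketch of what such a check might do, the snippet below flags messages whose placeholder was fed by more than one distinct expression. No `lingui lint` command exists in this PR; `ExtractedMessage` and `findAmbiguousMessages` are hypothetical names, and the heuristic would certainly produce false positives (for example `user.name` vs `author.name`).

```typescript
// Hypothetical record for one extracted message and its placeholder values.
interface ExtractedMessage {
  msgid: string;
  placeholders: Record<string, string[]>;
}

// Return the msgids where any placeholder has more than one distinct value,
// which may hint that the message is reused in different contexts.
function findAmbiguousMessages(messages: ExtractedMessage[]): string[] {
  return messages
    .filter((message) =>
      Object.values(message.placeholders).some(
        (values) => new Set(values).size > 1
      )
    )
    .map((message) => message.msgid);
}

const suspicious = findAmbiguousMessages([
  { msgid: "from {0}", placeholders: { "0": ["myTrip.date", "myTrip.startCity"] } },
  { msgid: "Hello {0}", placeholders: { "0": ["user.name"] } },
]);
console.log(suspicious);
```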

@timofei-iatsenko
Collaborator Author

@andrii-bodnar this one is ready

Contributor

@andrii-bodnar andrii-bodnar left a comment

Thank you!

@andrii-bodnar andrii-bodnar merged commit 6230100 into lingui:next Jun 28, 2024
14 checks passed
@timofei-iatsenko timofei-iatsenko deleted the feature/print-placeholders-po branch June 28, 2024 09:53