Glossary: prohibited initial character in a CSV element #12791

nijel opened this issue Oct 16, 2024 · 0 comments

nijel commented Oct 16, 2024

Describe the issue

Machine translation using Amazon Translate crashes:

machinery[Amazon Translate]: Could not fetch translations: InvalidParameterValueException: An error occurred (InvalidParameterValueException) when calling the ImportTerminology operation: Text at line: 54 contains a prohibited initial character in a CSV element, which can lead to CSV Injection. Any cell in a CSV file may not start with the characters [=+@-], due to the potential risk for CSV injection. Please remove these characters : [=+@-] from beginning of CSV cell text.

I already tried

  • I've read and searched the documentation.
  • I've searched for similar filed issues in this repository.

Steps to reproduce the behavior

  1. Add a term starting with = to the glossary.
  2. Configure Amazon Translate.
  3. Try using automatic suggestions.

Expected behavior

Suggestions should not fail.

Screenshots

No response

Exception traceback

No response

How do you run Weblate?

weblate.org service

Weblate versions

No response

Weblate deploy checks

No response

Additional context

There are two possible solutions to this:

  • Exclude such terms from CSV export.
  • Strip problematic characters.
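
The two options could look roughly like this. This is a minimal sketch, assuming glossary terms are available as (source, target) pairs before the CSV is built; the names `is_unsafe`, `exclude_unsafe_terms`, and `strip_unsafe_prefix` are hypothetical and not part of Weblate. The prohibited character set is taken from the Amazon error message above.

```python
# Characters Amazon Translate rejects at the start of a CSV cell,
# as listed in the error message quoted in this issue.
PROHIBITED_INITIAL = ("=", "+", "@", "-")


def is_unsafe(cell: str) -> bool:
    """Return True if the cell starts with a prohibited character."""
    return cell.startswith(PROHIBITED_INITIAL)


def exclude_unsafe_terms(terms):
    """Option 1: drop any (source, target) pair containing an unsafe cell."""
    return [
        (source, target)
        for source, target in terms
        if not is_unsafe(source) and not is_unsafe(target)
    ]


def strip_unsafe_prefix(cell: str) -> str:
    """Option 2: remove every leading prohibited character from a cell."""
    return cell.lstrip("".join(PROHIBITED_INITIAL))
```

Excluding loses the term entirely, while stripping changes it, so a stripped source term may no longer match the text being translated and may even end up empty (for example `==`). Which trade-off is acceptable is left open here.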

There is an existing string filter for CSV implemented in exporters which could be reused:

def string_filter(self, text):
    """
    Avoid Excel interpreting text as formula.
    This is really bad idea, implemented in Excel, as this change leads to
    displaying additional ' in all other tools, but this seems to be what most
    people have gotten used to. Hopefully these characters are not widely used at
    first position of translatable strings, so that harm is reduced.
    Reverse for this is in weblate.formats.ttkit.CSVUnit.unescape_csv
    """
    if text and text[0] in {"=", "+", "-", "@", "|", "%"}:
        return "'{}'".format(text.replace("|", "\\|"))
    return text
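
To see what reusing this filter would produce for glossary terms, here is a standalone copy of the same logic. Dropping `self` so it runs outside the exporter class is an adaptation for illustration only, and the sample terms are made up.

```python
def string_filter(text):
    # Same logic as the exporter method quoted above, minus `self`.
    if text and text[0] in {"=", "+", "-", "@", "|", "%"}:
        return "'{}'".format(text.replace("|", "\\|"))
    return text


# Hypothetical glossary terms, chosen to exercise each branch.
for term in ["=SUM(A1)", "-dash", "plain", "|pipe"]:
    print(repr(term), "->", repr(string_filter(term)))
```

The filtered value is wrapped in single quotes, so it no longer starts with any of the characters named in the Amazon error. The term sent to Amazon Translate would then include those quotes, though, and might not match the source text anymore; whether that is acceptable for terminology import would need checking.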

@nijel nijel added this to the 5.8.2 milestone Oct 16, 2024
@nijel nijel added the bug Something is broken. label Oct 16, 2024