Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List index out of range error in PoFileParser._add_message #1134

Open
gabe-sherman opened this issue Sep 24, 2024 · 8 comments
Open

List index out of range error in PoFileParser._add_message #1134

gabe-sherman opened this issue Sep 24, 2024 · 8 comments

Comments

@gabe-sherman
Copy link
Contributor

The below code triggers a list index out of range error when provided a malformed input. This occurs in _add_message at line 235 in babel/messages/pofile.py.

import sys
import io
import babel.messages.pofile
import babel

def main():
    input = io.StringIO(open(sys.argv[1], "r").read())
    babel.messages.pofile.read_po(input)
    

if __name__ == "__main__":
    main()

Version

babel version 2.16.0

POC File

https:/FuturesLab/POC/blob/main/babel/poc-01

How to trigger:

python filename.py poc-01

Trace report

Traceback (most recent call last):
  File "rep.py", line 11, in <module>
    main()
  File "rep.py", line 7, in main
    babel.messages.pofile.read_po(input)
  File "/home/gabe/extras/atheris_venv/lib/python3.8/site-packages/babel/messages/pofile.py", line 434, in read_po
    parser.parse(fileobj)
  File "/home/gabe/extras/atheris_venv/lib/python3.8/site-packages/babel/messages/pofile.py", line 361, in parse
    self._finish_current_message()
  File "/home/gabe/extras/atheris_venv/lib/python3.8/site-packages/babel/messages/pofile.py", line 250, in _finish_current_message
    self._add_message()
  File "/home/gabe/extras/atheris_venv/lib/python3.8/site-packages/babel/messages/pofile.py", line 235, in _add_message
    string = self.translations[0][1].denormalize()
IndexError: list index out of range
@tomasr8
Copy link
Member

tomasr8 commented Sep 24, 2024

Here's a smaller reproducer:

import io
from babel.messages.pofile import read_po

po = io.StringIO('msgid ""')
read_po(po)

Neither is a valid PO file so we should ideally raise a PoFileError in this case. @gabe-sherman would you like to open a PR?

@gabe-sherman
Copy link
Contributor Author

Hey Tomas! Thanks for the response. I don't have a deep understanding of the way the PO files should be processed so a full PR may be challenging, but if I'm able to get some time I'll certainly check it out :)

@tomasr8
Copy link
Member

tomasr8 commented Sep 24, 2024

I think checking if self.translations is empty inside _finish_current_message() and raising a PoFileError if that's the case might work. In any case, feel free to ask if you need help :)

@gabe-sherman
Copy link
Contributor Author

Sounds good, thanks! Should the process exit when this is detected or should we only raise the warning and leave the exception raising up to the value of self.abort_invalid.

@tomasr8
Copy link
Member

tomasr8 commented Sep 25, 2024

Right, there's this abort_invalid flag. I guess if it's set to True, we should just raise, and if not maybe we could insert a dummy translation, something like (0, '') (and emit a warning)?

@gabe-sherman
Copy link
Contributor Author

Is the invariant here that if the messages list has an element in it, the translations list should also be populated? Adding in this check at the beginning of _finish_current_message triggers this new warning on valid files in cases where both self.translations and self.messages are empty. However, when this check to the size of self.translations is moved inside the if self.messages: block it behaves as expected, although I haven't done robust testing yet. I just want to make sure we're not triggering this exception in cases where it's okay that the translations list is empty.

@tomasr8
Copy link
Member

tomasr8 commented Sep 25, 2024

To be honest, I'm not that familiar with the parser code either, but I think you are right. We should only raise/warn when we call _finish_current_message while self.messages is not empty and self.translations is, i.e. something like this:

def _finish_current_message(self) -> None:
    if self.messages:
        if not self.translations:
            # Handle error here
        self._add_message()

@gabe-sherman
Copy link
Contributor Author

Yea that's what I was thinking as well :). I'll do some testing on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants