Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Terminals cannot be transformed in tree-less LALR? #818

Closed
qwertystop opened this issue Jan 25, 2021 · 6 comments
Closed

Terminals cannot be transformed in tree-less LALR? #818

qwertystop opened this issue Jan 25, 2021 · 6 comments
Labels

Comments

@qwertystop
Copy link

Describe the bug

I've written a transformer with the line DECIMAL = float (from %import common.DECIMAL in the grammar) and it seemed to work fine post-transforming an Earley-parsed tree – that is, the relevant locations in the transformed tree would be 48.0 rather than Token('DECIMAL', '48.0') and similar.

I've written another transformer, this one for a grammar that I was able to make run in LALR. Therefore, I tried a tree-less parse. Now I get ValueError: Callbacks must return a token (returned 48.0). To make the grammar work for a tree-less parse, I must wrap all my terminals in rules just to be able to transform them.

This seems to me unnecessarily tedious, and expands the size of the grammar unhelpfully. Also, transformer callbacks for rules are more long-winded to write – you can't do decimal = float, at minimum you have to do decimal = lambda self, n: return float(n[0])

To Reproduce

from lark import Lark, Transformer
grammar = """
    start: NUM
    NUM: /\d+/
"""

class T(Transformer):
    NUM = int

T().transform(Lark(grammar, parser="lalr").parse("32"))
# returns Tree('start', [32])
Lark(grammar, parser="lalr", transformer=T()).parse("32")
# raises ValueError("Callbacks must return a token (returned 32)")
@erezsh erezsh added the bug label Jan 25, 2021
@MegaIng
Copy link
Member

MegaIng commented Jan 25, 2021

@erezsh The problem is that the TERMINAL callbacks get added to lexer_callbacks, which of course happens before the parser see them. This also means the Transformer can actually change how to parser deals with the terminals, since it can change their type. This also means the this will be a backwards incompatible fix. I am not even sure if there is a clean way of fixing this.

@erezsh
Copy link
Member

erezsh commented Jan 26, 2021

I don't think there's a way to fix this that makes sense.

We can process the terminals after the parser, but the user can do that manually as well, and it's better to be transparent about this.

@erezsh
Copy link
Member

erezsh commented Jan 26, 2021

I suppose another approach would be to replace the return values with a proxy object, that also maintains the token type. But that sounds a little too "magical".

@MegaIng
Copy link
Member

MegaIng commented Jan 26, 2021

After looking at the code, I think it might make sense to handle the Terminal Transformer like we do the rules transformer: Here similar to what happens a few line later. We would have to extend the callback generation to also add the Terminals, but that shouldn't be to hard.

@erezsh
Copy link
Member

erezsh commented Jan 26, 2021

@MegaIng You could be right, that might work.

MegaIng added a commit to MegaIng/lark that referenced this issue Feb 6, 2021
@erezsh
Copy link
Member

erezsh commented Jul 1, 2024

Forgot to close this. This issue is already fixed.

@erezsh erezsh closed this as completed Jul 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants