Forgiving syntax #1425

Stehlampe2020 · 2024-06-18T05:12:42Z

Suggestion
Provide a %catchall token_name directive declaring that any input that cannot be parsed as a defined token or terminal should be returned as the token token_name. This would allow forgiving syntax definitions whose catchall doesn't literally catch all characters, only whatever would otherwise cause a syntax error.
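To make the requested behaviour concrete, here is a minimal sketch in plain Python using only the `re` module. It is not Lark code and not a proposed implementation: the token names, the `tokenize` helper, and the `UNKNOWN` token type are all hypothetical, and it only covers the lexer side of the idea (characters that no defined token matches), not input that lexes fine but fails in the parser.

```python
import re

# Hypothetical token definitions for a tiny markup language (not from Lark).
TOKEN_SPEC = [
    ("BOLD", r"\*\*"),
    ("WORD", r"[A-Za-z]+"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text, catchall="UNKNOWN"):
    """Yield (type, value) pairs; characters no token matches become `catchall`."""
    pos = 0
    pending = []  # unmatched characters collected since the last defined token
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match:
            if pending:
                # Flush the run of unmatched characters as one catchall token.
                yield (catchall, "".join(pending))
                pending = []
            yield (match.lastgroup, match.group())
            pos = match.end()
        else:
            # No defined token starts here: absorb a single character and retry.
            pending.append(text[pos])
            pos += 1
    if pending:
        yield (catchall, "".join(pending))


tokens = list(tokenize("hi **there** <3"))
```

Defined tokens always win here, and the catchall only absorbs the characters that would otherwise raise an error, which is the distinction the suggestion draws.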

Describe alternatives you've considered
I've tried catchall definitions such as /.+/s at a lower priority than all other tokens, but that just returned the whole source code I gave it, character by character, as the catchall token, fully ignoring all valid defined tokens. Even input that wouldn't have caused a syntax error without the catchall was split into single characters.

Additional context
This could be useful for markup parsing, for example. I'm creating my own markup language right now, and a forgiving syntax that cannot error out, and instead returns only-half-usable results, would help a lot.

MegaIng · 2024-06-18T06:42:20Z

If the catchall is /.+/, this would still happen, just starting at the first error. Using /./ with a low priority should work for the Earley parser.
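The difference between the two patterns can be seen with Python's `re` module alone. This is only an illustration of the regex behaviour, not of how Lark's lexers pick terminals; the sample input and variable names are made up, and `re.DOTALL` stands in for the `s` flag used in the original attempt.

```python
import re

source = "ok ok ?!# ok"
error_pos = source.index("?")  # first character no defined token would match

# Greedy catchall: once it starts matching, it swallows the rest of the input.
greedy = re.compile(r".+", re.DOTALL).match(source, error_pos)
# Single-character catchall: consumes only the offending character.
single = re.compile(r".", re.DOTALL).match(source, error_pos)

greedy_text = greedy.group()
single_text = single.group()
```

With the single-character form, the defined tokens get another chance at the very next position, so the valid `ok` after the garbage can still be recognised.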

I don't think there is a viable way to implement a catchall for LALR. If you have a concrete idea, feel free to suggest it. Note that backtracking is not an acceptable solution.

You are probably better off using the scan function I coded up in other issues, most recently in #1424, to find the parsable subsets. That said, something like HTML is a terrible fit for Lark anyway, since it's context-sensitive and can't really be parsed correctly.

erezsh · 2024-06-18T12:09:02Z

I like the idea. I think it's potentially possible to do, using the scan function @MegaIng wrote.

I hope one day we'll integrate a stable scan function into Lark, and then maybe it's worth revisiting this issue.

MegaIng · 2024-06-18T12:11:07Z

> I hope one day we'll integrate a stable scan function into Lark, and then maybe it's worth revisiting this issue.

Currently working on it (after finishing a review of #1388), will open a PR today.

Stehlampe2020 added the enhancement label Jun 18, 2024
