
Accessing capture groups of regular expressions? #905

Answered by erezsh
cubinator asked this question in Q&A

Lark already uses named groups to retrieve the token type. Unfortunately, the re module doesn't have support for nested groups.

My suggestion is to run the regex again on the token after lexing (or after parsing) to extract the named groups. That might add around 10% to the parse time, which isn't that bad.
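As a rough sketch of that suggestion, the snippet below re-runs a pattern on the text of an already-matched token to recover its named groups. The DATE pattern, the group names, and the extract_groups helper are made up for illustration and are not part of Lark. A plain string stands in for the token, which should be fine since a Lark Token is a str subclass.

```python
import re

# Hypothetical pattern, kept in Python next to the grammar. It is assumed to
# describe the same text as the terminal that produced the token.
DATE_RE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")


def extract_groups(token_text, pattern=DATE_RE):
    """Re-match an already-lexed token and return its named groups as a dict."""
    match = pattern.fullmatch(token_text)
    if match is None:
        # The token text does not fit the pattern we assumed for it.
        raise ValueError(f"token {token_text!r} does not match {pattern.pattern!r}")
    return match.groupdict()


# A plain str stands in for the token value here.
groups = extract_groups("2021-05-17")
print(groups)
```

The same helper could be called from a transformer or a lexer callback, depending on whether the groups are needed after parsing or right after lexing.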

You could also write your own lexer if you need to, and provide it to Lark with the lexer argument (https://lark-parser.readthedocs.io/en/latest/classes.html#lark.Lark).
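To illustrate why a hand-written lexer makes the groups available, here is a minimal standalone tokenizer that tries each terminal's regex separately, so every match keeps its own groups. It is only a sketch: the terminal names, the patterns, and the tokenize function are hypothetical, and it does not implement Lark's lexer interface, so it would still need to be adapted before being passed through the lexer argument.

```python
import re

# Hypothetical terminals; names and patterns are illustrative only.
# Order matters: earlier entries win when several patterns could match.
TERMINALS = [
    ("DATE", re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")),
    ("NUMBER", re.compile(r"\d+")),
    ("WS", re.compile(r"\s+")),
]


def tokenize(text):
    """Yield (type, value, groups) tuples, matching each terminal on its own."""
    pos = 0
    while pos < len(text):
        for name, pattern in TERMINALS:
            match = pattern.match(text, pos)
            if match:
                if name != "WS":  # whitespace is consumed but not emitted
                    yield name, match.group(0), match.groupdict()
                pos = match.end()
                break
        else:
            # No terminal matched at this position.
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")


tokens = list(tokenize("2021-05-17 42"))
for token in tokens:
    print(token)
```

Trying the terminals one by one is slower than a single combined regex, which is the trade-off this sketch accepts in exchange for direct access to each match object.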

@MegaIng

We could, in theory, add this as a feature to Lark. It will allow you to specify in the grammar which regexes should be considered with their groups, and we can re-evaluate them whenever they get matched. But I don't k…

Answer selected by cubinator