LUCENE-9445 Add support for case insensitive regex searches in QueryParser #1708
base: master
+1 to add the support.
I wonder if the parsing should be stricter and only match if there is a separator, or the input ends, after the i? That would be "less" breaking, even though I think we should consider this change as breaking anyway. Maybe we can introduce it in 8.x under a flag? Is that what you had in mind?
Yep - end-of-string or space should be required after the end of the regex. Maybe brackets too for Boolean logic? I hadn't considered adding a flag for 8.x. If we do I'd prefer to see it support the new behaviour by default - the rationale being that it is better to give an escape hatch to the few admins concerned about BWC for an edge case than continue the legacy of silent failures for potentially many regex searchers assuming |
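To make the boundary rule discussed here concrete, this is a minimal sketch in plain Java of a check that accepts a /.../ token, with an optional trailing i, only when it is followed by end-of-string, whitespace, or a closing bracket. The class name RegexTokenSketch and the method match are hypothetical and exist only for illustration; this is not the QueryParser grammar, just the rule expressed with java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of Lucene. */
final class RegexTokenSketch {
    // Opening '/', a body made of escaped characters or non-slash characters,
    // closing '/', an optional 'i', then a required boundary:
    // end of input, whitespace, or ')'.
    private static final Pattern STRICT =
        Pattern.compile("^/((?:\\\\.|[^/\\\\])*)/(i?)(?=$|\\s|\\))");

    /**
     * Returns {body, flags} when the input starts with a well-delimited
     * regex token, or null when the boundary rule is not satisfied.
     */
    static String[] match(String input) {
        Matcher m = STRICT.matcher(input);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }
}
```

Under this sketch, /foo.*/i and /foo/i) are accepted with the i flag, /foo/ bar is accepted without a flag, and /foo/index is rejected because the i is not followed by a boundary.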
Progress update - I'm struggling a bit with how to make the parser stricter i.e. ensuring there's a space between |
@jimczi What is the behaviour for a non-match?
I'd prefer that we throw an error if any of the characters attached to a regexp is not recognized.
OK. @romseygeek suggested the BWC flag is called "allow_modifiers" and, if false, legacy behaviour is used, i.e. there would be no errors for characters after trailing |
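As a rough illustration of the two behaviours described in this exchange, the sketch below throws on any unrecognized character attached to the regexp when modifiers are allowed, and falls back to never interpreting trailing characters when they are not. The class RegexModifierSketch, the method caseInsensitive, and the boolean allowModifiers parameter are hypothetical names loosely modelled on the "allow_modifiers" suggestion; none of this is taken from the actual patch.

```java
/** Hypothetical helper for illustration only; not part of Lucene. */
final class RegexModifierSketch {
    /**
     * @param trailing       characters found directly after the closing '/'
     * @param allowModifiers assumed BWC switch; false keeps the legacy behaviour
     * @return true if the regex should be matched case insensitively
     */
    static boolean caseInsensitive(String trailing, boolean allowModifiers) {
        if (!allowModifiers) {
            // Legacy behaviour: trailing characters are never read as modifiers
            // and never cause an error.
            return false;
        }
        boolean insensitive = false;
        for (char c : trailing.toCharArray()) {
            if (c == 'i') {
                insensitive = true;
            } else {
                // Strict behaviour: reject anything that is not a known modifier.
                throw new IllegalArgumentException(
                    "Unrecognized regex modifier: '" + c + "'");
            }
        }
        return insensitive;
    }
}
```

With allowModifiers set to true, a trailing i turns on case insensitivity and a trailing x is rejected; with it set to false, both are ignored by this method.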
@jimczi The TL;DR is I think it's going to be too hard to implement the stricter parsing logic. I spoke with @romseygeek and we couldn't see a neat way that the string after the closing Eager option - pass everything immediately after closing
|
…arser using the standard /.../i regex syntax
This PR uses the standard /.../i regex syntax to denote case insensitive queries, exposing the underlying case insensitive regex support added in LUCENE-9386
This could be considered a breaking change if users had a regex immediately followed by the letter i, but I imagine a case insensitive search would have been the intention of the searcher all along.
Jira issue: https://issues.apache.org/jira/browse/LUCENE-9445