adding Oring regexes #37

Merged
merged 11 commits into from
Apr 2, 2024

@arblade arblade commented Mar 23, 2024

PR Introduction

Currently, when an OR operator joins conditions that use regexes, the backend raises an exception saying that ORing regexes is not supported.

Issue description

Until now, this has been an issue in the Splunk backend because, in Splunk, regexes have to be handled with a command preceded by a pipe, like | regex fieldX=value. Such a pipe applies an implicit AND to the whole preceding query and "ends" it. Continuing the query would require a | search, which cannot be combined with parentheses that cross or enclose pipes. When regexes have only AND operators among their parents, this is not an issue: they can be appended at the end of the query with their implicit AND. But when a regex sits under an OR operator, the implicit AND on the whole preceding query, which also "ends" it, leaves no way to handle the rule with this method. So something like (fieldX=test OR (index=* | regex fieldY=test)) returns an error (parentheses enclosing a pipe), as does something like index=* | regex fieldX=test OR fieldY=test (a query following the | regex).

Solution presentation

This PR proposes to fix this limitation by evaluating all regexes at the beginning of the query, and then appending a search that uses value-comparison conditions instead of regex conditions.
For example, this PR handles a rule like this:

sel1:
    fieldA|re:  foo.*bar
sel2:
    fieldB|re: foo.*bar
condition: sel1 or sel2

with the following query:

| rex field=fieldA "(?<fieldAMatch>foo.*bar)"
| eval fieldACondition=if(isnotnull(fieldAMatch), "true", "false") 
| rex field=fieldB "(?<fieldBMatch>foo.*bar)" 
| eval fieldBCondition=if(isnotnull(fieldBMatch), "true", "false") 
| search fieldACondition="true" OR fieldBCondition="true"
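
As a rough illustration of the shape of this query, the following Python sketch builds the same chain of commands from a list of (field, regex) pairs. The helper name build_or_regex_query is hypothetical and is not part of the backend. It assumes that each field appears only once and that the regexes contain no double quotes.

```python
def build_or_regex_query(regexes):
    """Build an SPL fragment that ORs several regex conditions.

    `regexes` is a list of (field, regex) pairs. Hypothetical helper for
    illustration only: it assumes unique field names and regexes without
    double quotes, and does no escaping.
    """
    commands = []
    conditions = []
    for field, regex in regexes:
        match_field = f"{field}Match"
        condition_field = f"{field}Condition"
        # Extract the match into a named capture group.
        commands.append(f'| rex field={field} "(?<{match_field}>{regex})"')
        # Turn "did it match?" into a plain string value.
        commands.append(
            f'| eval {condition_field}=if(isnotnull({match_field}), "true", "false")'
        )
        conditions.append(f'{condition_field}="true"')
    # The OR is now an ordinary value comparison inside a single search.
    commands.append("| search " + " OR ".join(conditions))
    return "\n".join(commands)


print(build_or_regex_query([("fieldA", "foo.*bar"), ("fieldB", "foo.*bar")]))
```

The point of the design is that the OR ends up between plain field comparisons in one search command, so no parenthesis has to enclose a pipe.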

Implementation

The implementation relies on a new deferred class, SplunkDeferredORRegularExpression, and on an override of finalize_query that simply checks whether there is an ORed-regex case and, if so, calls super().finalize_query with the query preceded by the whole chain of | rex ... | eval ... commands.
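
The override pattern can be sketched as follows. This is not the code of the PR: BaseBackend stands in for the real parent class, the finalize_query signature is simplified to a single argument, and the attribute or_regex_prefix as well as the "| search" joiner are assumptions made for the example.

```python
class BaseBackend:
    """Hypothetical stand-in for the parent backend class."""

    def finalize_query(self, query):
        return query.strip()


class OrRegexBackend(BaseBackend):
    """Hypothetical subclass illustrating the finalize_query override."""

    def __init__(self):
        # rex/eval commands collected while converting ORed regex
        # conditions (assumed attribute, empty when there is no such case).
        self.or_regex_prefix = []

    def finalize_query(self, query):
        if self.or_regex_prefix:
            # ORed-regex case: put the whole rex/eval chain in front of
            # the query, which then only compares plain field values.
            query = "\n".join(self.or_regex_prefix) + "\n| search " + query
        # In every case, delegate to the parent implementation.
        return super().finalize_query(query)
```

When no ORed regex was collected, the query goes to the parent method unchanged, so rules without this case behave as before.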

Cases of multiple regexes on the same field

When multiple regexes apply to the same field with an OR operator between them, I implemented the ability to add a number at the end of fieldXMatch and fieldXCondition to differentiate between them.

sel1:
    fieldA|re:  foo.*bar
sel2:
    fieldA|re: foo.*foo
condition: sel1 or sel2

is handled with:

| rex field=fieldA "(?<fieldAMatch>foo.*bar)"
| eval fieldACondition=if(isnotnull(fieldAMatch), "true", "false") 
| rex field=fieldA "(?<fieldAMatch2>foo.*foo)" 
| eval fieldACondition2=if(isnotnull(fieldAMatch2), "true", "false") 
| search fieldACondition="true" OR fieldACondition2="true"
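
A minimal sketch of this numbering, assuming the scheme visible in the example above (no suffix for the first regex on a field, then 2, 3, and so on). The function name name_regex_fields is hypothetical, and the actual backend may implement the counter differently.

```python
from collections import defaultdict


def name_regex_fields(regexes):
    """Assign unique match/condition field names to (field, regex) pairs.

    Hypothetical helper: the first regex on a field gets no suffix,
    later regexes on the same field get 2, 3, ...
    """
    seen = defaultdict(int)  # how many regexes were seen per field
    named = []
    for field, regex in regexes:
        seen[field] += 1
        suffix = "" if seen[field] == 1 else str(seen[field])
        named.append(
            (field, regex, f"{field}Match{suffix}", f"{field}Condition{suffix}")
        )
    return named


for row in name_regex_fields([("fieldA", "foo.*bar"), ("fieldA", "foo.*foo")]):
    print(row)
```

Note that both entries keep fieldA as the source field: only the names of the extracted match and condition fields change.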

Notes on performance

This implementation handles rules that use regexes under OR operators, but it has a major performance drawback: every regex is evaluated on all logs before the query logic is applied, which can take some time.

@thomaspatzke thomaspatzke merged commit fc6791f into SigmaHQ:main Apr 2, 2024
3 checks passed