-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: Reduced memory usage by removing unnecessary regex compilation #1047
Conversation
EvidenceEnvironment
Test prerequisiteUse the converted rules in the Because this PR does not improve the speed of Test1(hayabusa-sample-evtx)main
This PR
Test2(all-evtx.tgz(6.1GB))main
This PR
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pull request.
I confirmed code.
LGTM.
@fukusuket Thanks so much! 2.5.0 and current rules: (Sorry I only testing memory usage at first) This branch and current rules: For some reason, the memory usage varies wildly with the current rules. 2.5.0 and new rules: this branch and new rules: This is also strange. 2.5.0 is stable and always returns the same memory usage. |
@hitenkoku @YamatoSecurity Thank you so much for review and benchmarking 🙇 This branch implementation simply removed the unnecessary regex compilation process, so it's strange that there is a pattern of memory usage going up(I think that the effect will not change depending on the CPU architecture...🤔) I'll check on another Windows machine as well💪 |
@YamatoSecurity Benchmark1Environment
DataWindows10
macOS Ventura
Sorry for the lack of explanation🙇 This PR can reduce memory usage, but I don't think it will improve speed.(Even if the speed improves, I think it will be a few seconds...) The speed improvement is mainly due to new rules which was converted by |
I also got the benchmark below. Attached is the benchmark result. :) Benchmark2DataWindows10
macOS Ventura
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, LGTM!
Let's merge it. I updated the changelogs.
Thanks so much!
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #1047 +/- ##
=======================================
Coverage 73.81% 73.81%
=======================================
Files 24 24
Lines 18005 18006 +1
=======================================
+ Hits 13291 13292 +1
Misses 4714 4714
☔ View full report in Codecov by Sentry. |
What Changed
Reduced memory usage by removing unnecessary regex compilations.
Fixed not to compile regex for string matching of the following
value-modifiers
as they do not currently require regex.|contains
|contains|all
Additional informarion
Current hayabusa_rule/converter.py, the
value-modifier
below is internally converted to awildcard pattern
as follows.key|contains: hoge
->key: *hoge*
In case above conversion pattern, Hayabusa will not compile the regex due to #894(So it uses less memory)
However, in the logsource_mapping.py currently under development,
value-modifier
is not converted and is output as is.Therefore it uses more memory because Hayabusa compile regex.
This PR removes unnecessary regex compilation and reduces memory usage when rules are converted by a new conversion script (
logsource_mapping.py
).I would appreciate it if you could review when you have time🙏