
perf: Reduced memory usage by removing unnecessary regex compilation #1047

Merged: 2 commits merged into main on May 14, 2023


@fukusuket (Collaborator) commented on May 13, 2023

What Changed

Reduced memory usage by removing unnecessary regex compilations.
Regex is no longer compiled for string matching with the following value modifiers, because they do not currently require regex:

  • |contains
  • |contains|all
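The following minimal Rust sketch shows why these two modifiers can be evaluated without any regex. It assumes case-insensitive matching, which is the Sigma default for string values. The function names are hypothetical and are not Hayabusa's code in `src/detections/rule/matchers.rs`:

```rust
/// Hypothetical sketch: `|contains` as a plain, case-insensitive substring test.
/// Assumes case-insensitive matching (the Sigma default for string values).
fn matches_contains(value: &str, needle: &str) -> bool {
    // A real matcher would lowercase the needle once, when the rule is loaded.
    value.to_lowercase().contains(&needle.to_lowercase())
}

/// Hypothetical sketch: `|contains|all` requires every needle to occur in the value.
fn matches_contains_all(value: &str, needles: &[&str]) -> bool {
    let haystack = value.to_lowercase();
    needles
        .iter()
        .all(|needle| haystack.contains(&needle.to_lowercase()))
}
```

Both checks are ordinary substring searches over the field value. Nothing is compiled, and no automaton is kept in memory per rule.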

Additional information

In the current hayabusa_rule/converter.py, the value modifier below is internally converted to a wildcard pattern as follows:

  • key|contains: hoge -> key: *hoge*

With this conversion, Hayabusa does not compile a regex thanks to #894, so it uses less memory.
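The idea can be illustrated with a hypothetical check. A pattern of the form `*literal*` with no other wildcard characters is equivalent to a substring search. This is an assumption about the general approach, not the actual #894 implementation. It treats `*`, `?` and `\` as special, as in Sigma wildcard syntax:

```rust
/// Hypothetical check (not the actual #894 code): returns the literal inside a
/// `*literal*` pattern when it can be matched by substring search alone.
fn as_plain_substring(pattern: &str) -> Option<&str> {
    let inner = pattern.strip_prefix('*')?.strip_suffix('*')?;
    // Any remaining Sigma wildcard (`*`, `?`) or escape (`\`) needs full pattern handling.
    let has_special = inner.chars().any(|c| matches!(c, '*' | '?' | '\\'));
    if inner.is_empty() || has_special {
        None
    } else {
        Some(inner)
    }
}
```

For `key: *hoge*`, this returns `Some("hoge")`, so a matcher could skip regex compilation. Patterns such as `hoge*` or `*ho?ge*` fall through to the general path.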

However, logsource_mapping.py (currently under development) does not convert the value modifier and outputs it as is.
As a result, Hayabusa compiles a regex for these rules and uses more memory.

This PR removes that unnecessary regex compilation and reduces memory usage for rules converted by the new conversion script (logsource_mapping.py).
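When logsource_mapping.py keeps the modifier in the key (for example `CommandLine|contains`), a matcher can choose a strategy from the modifier chain. It then builds a regex only when one is actually needed.

The sketch below is hypothetical. `MatchPlan`, `plan_for` and `needs_regex` are illustrative names, and only the modifiers discussed in this PR plus `|re` are modeled:

```rust
/// Illustrative strategies; not Hayabusa's real types.
#[derive(Debug, PartialEq)]
enum MatchPlan {
    Substring,     // `|contains`: plain substring search
    AllSubstrings, // `|contains|all`: every substring must be present
    Regex,         // `|re`: the pattern must be compiled as a regex
    Other,         // anything else is outside this sketch
}

/// Picks a plan from a Sigma key such as `CommandLine|contains|all`.
fn plan_for(key: &str) -> MatchPlan {
    // Everything after the first `|` is the modifier chain (empty if none).
    let modifiers = key.split_once('|').map_or("", |(_, m)| m);
    match modifiers {
        "contains" => MatchPlan::Substring,
        "contains|all" => MatchPlan::AllSubstrings,
        "re" => MatchPlan::Regex,
        _ => MatchPlan::Other,
    }
}

/// Only the regex plan should pay for regex compilation.
fn needs_regex(plan: &MatchPlan) -> bool {
    matches!(plan, MatchPlan::Regex)
}
```

Deciding the plan up front means the memory cost of a compiled regex is paid only by rules that use `|re`, not by every `|contains` rule.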

I would appreciate it if you could review when you have time🙏

@fukusuket (Collaborator, Author) commented on May 13, 2023

Evidence

Environment

  • OS: macOS Ventura 13.1
  • Hardware: MacBook Air (M1, 2020), 8 GB memory, 8 cores

Test prerequisite

Use the rules converted by logsource_mapping.py on the branch below, because this PR does not improve the speed of wildcard patterns that have no value modifiers.

Test1 (hayabusa-sample-evtx)

main

fukusuke@fukusukenoMacBook-Air hayabusa-2.5.0-all-platforms % ./hayabusa-2.5.0-mac-arm csv-timeline -d ../hayabusa-sample-evtx --debug -D -u -n -o 3.csv -r ./converted_rules
...
Excluded rules: 30

Deprecated rules: 162 (4.46%)
Experimental rules: 1861 (51.22%)
Stable rules: 103 (2.84%)
Test rules: 1464 (40.30%)
Unsupported rules: 43 (1.18%)

Sigma rules: 3633
Total enabled detection rules: 3633
...
Events with hits / Total events: 5,714 / 47,465 (Data reduction: 41,751 events (87.96%))

Total | Unique detections: 8,054 | 564
Total | Unique critical detections: 53 (0.66%) | 21 (3.72%)
Total | Unique high detections: 5,472 (67.94%) | 280 (49.65%)
Total | Unique medium detections: 1,262 (15.67%) | 183 (32.45%)
Total | Unique low detections: 1,237 (15.36%) | 75 (13.30%)
Total | Unique informational detections: 30 (0.37%) | 5 (0.89%)
...
Elapsed time: 00:00:08.336
Rule Parse Processing Time: 00:00:01.429
Analysis Processing Time: 00:00:06.774
Output Processing Time: 00:00:00.131

Memory usage stats:
heap stats:     peak       total       freed     current        unit       count
  reserved:     2.0 GiB     2.0 GiB     0           2.0 GiB
 committed:     1.9 GiB     2.0 GiB     5.8 GiB    -3.8 GiB                          ok

This PR

fukusuke@fukusukenoMacBook-Air hayabusa-2.5.0-all-platforms % ./hayabusa csv-timeline -d ../hayabusa-sample-evtx --debug -D -u -n -o 5.csv -r ./converted_rules
...
Excluded rules: 30

Deprecated rules: 162 (4.46%)
Experimental rules: 1861 (51.22%)
Stable rules: 103 (2.84%)
Test rules: 1464 (40.30%)
Unsupported rules: 43 (1.18%)

Sigma rules: 3633
Total enabled detection rules: 3633
...
Events with hits / Total events: 5,714 / 47,465 (Data reduction: 41,751 events (87.96%))

Total | Unique detections: 8,054 | 564
Total | Unique critical detections: 53 (0.66%) | 21 (3.72%)
Total | Unique high detections: 5,472 (67.94%) | 280 (49.65%)
Total | Unique medium detections: 1,262 (15.67%) | 183 (32.45%)
Total | Unique low detections: 1,237 (15.36%) | 75 (13.30%)
Total | Unique informational detections: 30 (0.37%) | 5 (0.89%)
...
Elapsed time: 00:00:07.700
Rule Parse Processing Time: 00:00:00.942
Analysis Processing Time: 00:00:06.658
Output Processing Time: 00:00:00.099

Memory usage stats:
heap stats:     peak       total       freed     current        unit       count
  reserved:     1.0 GiB     1.0 GiB     0           1.0 GiB
 committed:     1.0 GiB     1.0 GiB     5.2 GiB    -4.2 GiB                          ok

Test2 (all-evtx.tgz, 6.1 GB)

main

fukusuke@fukusukenoAir hayabusa-2.5.0-all-platforms % ./hayabusa-2.5.0-mac-arm csv-timeline -d ../all-evtx --debug -D -u -n -o 1.csv -r ./converted_rules
...
Excluded rules: 30

Deprecated rules: 162 (4.46%)
Experimental rules: 1861 (51.22%)
Stable rules: 103 (2.84%)
Test rules: 1464 (40.30%)
Unsupported rules: 43 (1.18%)

Sigma rules: 3633
Total enabled detection rules: 3633
...
Events with hits / Total events: 21,229 / 4,817,181 (Data reduction: 4,795,952 events (99.56%))

Total | Unique detections: 22,810 | 98
Total | Unique critical detections: 0 (0.00%) | 0 (0.00%)
Total | Unique high detections: 11,665 (51.14%) | 15 (15.31%)
Total | Unique medium detections: 4,641 (20.35%) | 33 (33.67%)
Total | Unique low detections: 5,253 (23.03%) | 41 (41.84%)
Total | Unique informational detections: 1,251 (5.48%) | 9 (9.18%)
...
Elapsed time: 00:06:23.844
Rule Parse Processing Time: 00:00:01.438
Analysis Processing Time: 00:06:22.069
Output Processing Time: 00:00:00.336

Memory usage stats:
heap stats:     peak       total       freed     current        unit       count
  reserved:     2.0 GiB     2.0 GiB     0           2.0 GiB
 committed:     1.9 GiB     2.0 GiB   207.7 GiB  -205.7 GiB                          ok

This PR

fukusuke@fukusukenoMacBook-Air hayabusa-2.5.0-all-platforms % ./hayabusa csv-timeline -d ../all-evtx --debug -D -u -n -o new.csv -r ./converted_rules
...
Excluded rules: 30

Deprecated rules: 162 (4.46%)
Experimental rules: 1861 (51.22%)
Stable rules: 103 (2.84%)
Test rules: 1464 (40.30%)
Unsupported rules: 43 (1.18%)

Sigma rules: 3633
Total enabled detection rules: 3633
...
Events with hits / Total events: 21,229 / 4,817,181 (Data reduction: 4,795,952 events (99.56%))

Total | Unique detections: 22,810 | 98
Total | Unique critical detections: 0 (0.00%) | 0 (0.00%)
Total | Unique high detections: 11,665 (51.14%) | 15 (15.31%)
Total | Unique medium detections: 4,641 (20.35%) | 33 (33.67%)
Total | Unique low detections: 5,253 (23.03%) | 41 (41.84%)
Total | Unique informational detections: 1,251 (5.48%) | 9 (9.18%)
...
Elapsed time: 00:06:17.504
Rule Parse Processing Time: 00:00:01.089
Analysis Processing Time: 00:06:16.212
Output Processing Time: 00:00:00.202

Memory usage stats:
heap stats:     peak       total       freed     current        unit       count
  reserved:     1.0 GiB     1.0 GiB     0           1.0 GiB
 committed:     1.0 GiB     1.0 GiB   210.2 GiB  -209.2 GiB                          ok
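The two tests can also be compared in relative terms. A small hypothetical Rust helper (not part of Hayabusa) converts the `Elapsed time` strings to seconds and computes percentage reductions.

From the logs above, peak committed memory falls from 1.9 GiB to 1.0 GiB in both tests, which is about 47%. Elapsed time falls by about 7.6% in Test1 (8.336 s to 7.700 s) and about 1.7% in Test2 (383.844 s to 377.504 s). These are single runs, so run-to-run noise is not accounted for.

```rust
/// Parses an elapsed-time string such as "00:06:23.844" (HH:MM:SS.mmm) into seconds.
/// Assumes exactly the three-field format shown in the logs; anything else yields None.
fn parse_elapsed(s: &str) -> Option<f64> {
    let mut parts = s.split(':');
    let h: f64 = parts.next()?.parse().ok()?;
    let m: f64 = parts.next()?.parse().ok()?;
    let sec: f64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // more than three fields
    }
    Some(h * 3600.0 + m * 60.0 + sec)
}

/// Relative reduction in percent: (before - after) / before * 100.
fn reduction_pct(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}
```

For example, `reduction_pct(1.9, 1.0)` gives the committed-peak reduction, and `reduction_pct(parse_elapsed("00:00:08.336").unwrap(), parse_elapsed("00:00:07.700").unwrap())` gives the Test1 time reduction.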

@fukusuket self-assigned this on May 13, 2023
@fukusuket added the enhancement (New feature or request) label on May 13, 2023
@hitenkoku (Collaborator) left a comment

Thanks for the pull request.
I have reviewed the code.
LGTM.

@YamatoSecurity (Collaborator)

@fukusuket Thanks so much!
I think this should be good to merge, but let me test on another machine just in case. I don't think the machine I am using now is good for benchmarks.

2.5.0 and current rules:
./hayabusa-2.5.0-mac-intel csv-timeline -d ../hayabusa-sample-evtx -n -D -u --debug -o xx.csv
1.9
2.9
4.9
2.9
1.9

(Sorry, at first I was only testing memory usage.)

This branch and current rules:
./target/release/hayabusa csv-timeline -d ../hayabusa-sample-evtx -n -D -u --debug -o xx.csv
5.9
1.9
3.9
1.9
1.9

For some reason, the memory usage varies wildly with the current rules.

2.5.0 and new rules:
./hayabusa-2.5.0-mac-intel csv-timeline -d ../hayabusa-sample-evtx -n -D -u --debug -r ../hayabusa-rules/tools/sigmac/converted_rules -o xx.csv
1.9GB 15.8s
1.9GB 15.4s
1.9GB 15.3s
1.9GB 15.2s
1.9GB 15.3s

This branch and new rules:
./target/release/hayabusa csv-timeline -d ../hayabusa-sample-evtx -n -D -u --debug -r ../hayabusa-rules/tools/sigmac/converted_rules -o xx.csv
3.9GB 13.5s
4.9GB 13.6s
3.9GB 13.5s
4.9GB 13.7s
1.9GB 13.5s

This is also strange: 2.5.0 is stable and always shows the same memory usage.
This branch is faster but seems to use more memory on average on my Intel Mac. (?.?)
I will do more testing on an Intel Windows machine later.
This is also not a great benchmark, as I'm only testing against the sample evtx files.
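The mean and range of the five new-rules runs put a number on this observation. The values below come from a hypothetical Rust helper.

- 2.5.0 averages 1.9 GB with a range of 0, and 15.4 s.
- This branch averages 3.9 GB with a range of 3.0 GB, and 13.56 s.

Five runs on the sample evtx files are too few for firm conclusions.

```rust
/// Arithmetic mean of a slice of measurements (NaN for an empty slice).
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Range (max - min), a crude measure of run-to-run spread.
fn spread(xs: &[f64]) -> f64 {
    let max = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let min = xs.iter().cloned().fold(f64::INFINITY, f64::min);
    max - min
}
```

Usage: `mean(&[3.9, 4.9, 3.9, 4.9, 1.9])` for this branch's memory figures, and `spread(&[1.9; 5])` for 2.5.0's.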

@fukusuket (Collaborator, Author) commented on May 13, 2023

@hitenkoku @YamatoSecurity Thank you so much for the review and benchmarking 🙇

This branch simply removes the unnecessary regex compilation, so it's strange that memory usage sometimes goes up. (I don't think the effect should depend on the CPU architecture...🤔) I'll check on another Windows machine as well💪

@fukusuket (Collaborator, Author) commented on May 14, 2023

@YamatoSecurity
I ran the benchmark in the following environments :)
The detailed benchmark results are attached: benchmark-result.zip

Benchmark1

Environment

OS             Memory  CPU cores  CPU arch              Hardware
Windows 10     16 GB   8          Intel(R) Core(TM) i7  mouse LAPTOP
macOS Ventura  8 GB    8          Apple M1              MacBook Air

Data

Windows 10

Ver      Rule  Elapsed time  Memory (peak/reserved)  Total / Unique detections  Result file size
2.5.0    old   00:11:21.300  1.0 GiB                 22,810 / 98                10643851
This PR  old   00:11:23.649  1.0 GiB                 22,810 / 98                10643851
2.5.0    new   00:10:47.420  2.0 GiB                 22,810 / 98                10643851
This PR  new   00:10:48.114  1.0 GiB                 22,810 / 98                10643851

macOS Ventura

Ver      Rule  Elapsed time  Memory (peak/reserved)  Total / Unique detections  Result file size
2.5.0    old   00:06:50.852  1.0 GiB                 22,810 / 98                10643851
This PR  old   00:06:52.355  1.0 GiB                 22,810 / 98                10643851
2.5.0    new   00:06:15.362  2.0 GiB                 22,810 / 98                10643851
This PR  new   00:06:25.420  1.0 GiB                 22,810 / 98                10643851

Sorry for the lack of explanation🙇 This PR can reduce memory usage, but I don't think it improves speed. (Even if it does, I think the gain is only a few seconds...)

The speed improvement is mainly due to the new rules, which were converted by logsource_mapping.py.
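The Benchmark1 elapsed times give a rough check of this. The figures below are old rules minus new rules, in seconds. They come from single runs, so run-to-run noise is not accounted for. The switch to the new rules saves a similar amount of time for both builds:

$$
\begin{aligned}
\text{Windows 10, 2.5.0:}   &\quad 681.300 - 647.420 = 33.880\ \text{s} \quad (33.880 / 681.300 \approx 4.97\%) \\
\text{Windows 10, this PR:} &\quad 683.649 - 648.114 = 35.535\ \text{s} \quad (\approx 5.20\%) \\
\text{macOS, 2.5.0:}        &\quad 410.852 - 375.362 = 35.490\ \text{s} \quad (\approx 8.64\%) \\
\text{macOS, this PR:}      &\quad 412.355 - 385.420 = 26.935\ \text{s} \quad (\approx 6.53\%)
\end{aligned}
$$

By contrast, on the same rules this PR and 2.5.0 differ by between 0.7 s and 10.1 s, with this PR slightly slower in every row.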

@fukusuket (Collaborator, Author) commented on May 14, 2023

I also ran the benchmark below. The results are attached :)
benchmark-result2.zip

Benchmark2

Data

Windows 10

Ver      Rule  Elapsed time  Memory (peak/reserved)  Total / Unique detections  Result file size
2.5.0    old   00:00:13.155  1.0 GiB                 8,035 / 559                4378169
This PR  old   00:00:12.887  1.0 GiB                 8,035 / 559                4378177
2.5.0    new   00:00:15.686  2.0 GiB                 8,054 / 564                4388796
This PR  new   00:00:14.399  1.0 GiB                 8,054 / 564                4388804

macOS Ventura

Ver      Rule  Elapsed time  Memory (peak/reserved)  Total / Unique detections  Result file size
2.5.0    old   00:00:07.518  1.0 GiB                 8,035 / 559                4378169
This PR  old   00:00:07.445  1.0 GiB                 8,035 / 559                4378177
2.5.0    new   00:00:08.542  2.0 GiB                 8,054 / 564                4388796
This PR  new   00:00:07.894  1.0 GiB                 8,054 / 564                4388804

@YamatoSecurity (Collaborator) left a comment

Ok, LGTM!
Let's merge it. I updated the changelogs.
Thanks so much!

codecov bot commented on May 14, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (2cc1208) 73.81% compared to head (5bdcdcd) 73.81%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1047   +/-   ##
=======================================
  Coverage   73.81%   73.81%           
=======================================
  Files          24       24           
  Lines       18005    18006    +1     
=======================================
+ Hits        13291    13292    +1     
  Misses       4714     4714           
Impacted Files Coverage Δ
src/detections/rule/matchers.rs 96.71% <100.00%> (+<0.01%) ⬆️


@hitenkoku merged commit 826d6a1 into main on May 14, 2023
@fukusuket deleted the improve-speed-by-reducing-regex-compile branch on May 14, 2023 at 10:23
Labels
enhancement New feature or request

3 participants