-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fixed unescaped control character c causing jq parsing errors #1082
fixed unescaped control character c causing jq parsing errors #1082
Conversation
Evidence
|
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #1082 +/- ##
==========================================
+ Coverage 82.20% 82.21% +0.01%
==========================================
Files 24 24
Lines 19900 19967 +67
==========================================
+ Hits 16358 16416 +58
- Misses 3542 3551 +9
☔ View full report in Codecov by Sentry. |
@hitenkoku Thanks so much! It will take a while to take benchmarks. I'll let you know when its done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@YamatoSecurity @hitenkoku
I have a question about the fix!
After this fix, the control characters will be removed(not replace) . Is it OK that this is the expected behavior?
If the above behavior is OK with the specification, it is LGTM!🚀
@fukusuket I think it is better to leave them in and preserve the data. I thought |
@hitenkoku I figured it out. |
Here is the full list
|
@YamatoSecurity I apologize for taking so long. Would you please confirm that the correction has been completed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks so much!
@fukusuket Could you check this as well?
@hitenkoku @YamatoSecurity But the output speed seems to be slowing down as follows 😢 (I checked all-evtx.tgz(6.1GB)) main
This PR
I'm sorry, no good ideas right now, but is there a way to replace the control characters without slowing it down too much? 🤔 |
@fukusuket Thanks for taking the benchmarks. (I forgot to do that.. m(__)m ) |
Since it is currently iterating with I hope GPT-4 will give me the answer...w |
Here are my benchmarks. Not a huge difference but indeed a little slower. (12%?) Saved file: main.json (498.7 MB)
PR:
|
I'm really sorry.. 🙇 I was comparing to the #1090 branch, not the main branch. main
This PR
|
After #1089 is merged into main and this branch, I'll take benchmarks and compare this PR vs main. |
@YamatoSecurity @fukusuket |
main: this PR: Elapsed time is about 3% slower and memory usage is the same so I think it is good to merge. |
I took benchmarks comparing to 2.5.1 as well. Thanks to the speed improvements in the last PR, it is almost the same speed as 2.5.1 even with the control character replacement and checks to make sure fields gets parsed correctly. This PR
2.5.1
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was good that the performance was about the same! LGTM!!🚀
Thanks for your review. I merge it. |
What Changed
I would appreciate it if you could review when you have time.