Unescaped control character ^C causing jq parsing errors #1068

Closed
YamatoSecurity opened this issue Jun 1, 2023 · 1 comment · Fixed by #1082

Comments

@YamatoSecurity
Collaborator

When the log data contains a control character such as ^C (ETX, U+0003), it ends up unescaped in the JSON output and jq fails with the error: parse error: Invalid string: control characters from U+0000 through U+001F must be escaped at line 10898, column 24.
We may need to escape other control characters besides ^C according to this chart: https://www.techonthenet.com/unicode/chart.php
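
The jq error reflects a rule of the JSON format itself (RFC 8259 requires U+0000 through U+001F to be escaped inside strings), so other strict parsers should reject the same data. A minimal sketch using Python's standard `json` module; the field name `Details` and the sample value are made up for illustration and are not taken from the actual .evtx file:

```python
import json


def try_parse(text):
    """Return the parsed value, or the parser's error message on failure."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        return f"parse error: {err.msg}"


# Hypothetical record: a string value with a raw ETX (U+0003, shown as ^C) in it.
raw = '{"Details": "abc\x03def"}'

# The same record with the control character written as a \u escape.
escaped = '{"Details": "abc\\u0003def"}'

result_raw = try_parse(raw)          # rejected by the parser
result_escaped = try_parse(escaped)  # accepted; the value contains U+0003 again

print(result_raw)
print(result_escaped)
```

The escaped form parses back to the original character, so no information is lost by escaping.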

@hitenkoku Can you look at this? I will DM you the .evtx file that is causing this problem.
To reproduce:

./target/release/hayabusa json-timeline -f event.evtx -o badchar.json
cat badchar.json | jq
@YamatoSecurity YamatoSecurity added this to the v2.6.0 milestone Jun 1, 2023
@hitenkoku hitenkoku self-assigned this Jun 1, 2023
@hitenkoku hitenkoku added the invalid This doesn't seem right label Jun 1, 2023
hitenkoku added a commit that referenced this issue Jun 4, 2023
@YamatoSecurity
Collaborator Author

Here is the list of control characters that need to be converted to their escaped form. The conversion is needed only when json-timeline is used, because the raw characters cause JSON parsing errors.

^@ (NUL) -> \u0000
^A (SOH) -> \u0001
^B (STX) -> \u0002
^C (ETX) -> \u0003
^D (EOT) -> \u0004
^E (ENQ) -> \u0005
^F (ACK) -> \u0006
^G (BEL) -> \u0007
^H (BS) -> \u0008
^I (TAB) -> \u0009
^J (LF) -> \u000A
^K (VT) -> \u000B
^L (FF) -> \u000C
^M (CR) -> \u000D
^N (SO) -> \u000E
^O (SI) -> \u000F
^P (DLE) -> \u0010
^Q (DC1) -> \u0011
^R (DC2) -> \u0012
^S (DC3) -> \u0013
^T (DC4) -> \u0014
^U (NAK) -> \u0015
^V (SYN) -> \u0016
^W (ETB) -> \u0017
^X (CAN) -> \u0018
^Y (EM) -> \u0019
^Z (SUB) -> \u001A
^[ (ESC) -> \u001B
^\ (FS) -> \u001C
^] (GS) -> \u001D
^^ (RS) -> \u001E
^_ (US) -> \u001F

I am not sure whether this will cause other problems when importing the JSON into a SIEM, etc., so we will have to see.
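
The conversion in the list above can be sketched as a single pass over each string value. This is only an illustration in Python, not the change made in hayabusa; the function name `escape_control_chars` and the field name `Details` are hypothetical. It assumes the function is applied to string values before they are written, not to the whole output file, since tabs and newlines between JSON tokens are legal whitespace.

```python
import json


def escape_control_chars(text: str) -> str:
    """Replace each raw U+0000..U+001F character with its \\uXXXX escape."""
    return "".join(
        f"\\u{ord(ch):04X}" if ord(ch) <= 0x1F else ch
        for ch in text
    )


# Hypothetical field value containing ^C (ETX) and a tab.
value = "abc\x03def\tghi"
escaped = escape_control_chars(value)

# Build a JSON document by hand to mimic a writer that does no escaping itself.
# Quotes and backslashes are not handled here; this sketch covers control
# characters only.
document = '{"Details": "' + escaped + '"}'

# A conforming parser turns the escapes back into the original characters.
parsed = json.loads(document)
print(escaped)
print(parsed["Details"] == value)
```

Because a conforming JSON parser decodes `\u0003` back to the original character, a consumer such as a SIEM should see the same field value as before. Whether a particular SIEM handles control characters in field values well is a separate question that this sketch does not answer.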
