forked from intel/hyperscan
Unexpected behavior in aarch64 #100
Labels: bug
Comments
After #93 I can't reproduce the problem on aarch64
Closed with #102
@markos, could I please kindly ask,
@vmurashev I'm working on fixing #95 as well for 5.4.7, if that does not happen soon, expect 5.4.7 on Monday. :)
Hello,
First, thank you for taking the time to make arm support possible :)
Second, I have found a case where vectorscan reports a false positive match on ARM aarch64. The same input does not produce a false positive in the original hyperscan on x64.
I have isolated a very small reproducible example with 2 input regexes and a couple of bytes of corpus text. The text that is scanned is:
xxxxxxxxxx?y\nTEXT12345xxxxxxxxxxxx
whereas the two regexes are:
The single match is reported as follows:
| 1 | 23 | y\\z*TEXT12345 | xxxxxxxxxx?y\nTEXT12345 |
As far as I know, this should not match.
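As a rough cross-check of that expectation, the reported pattern can be run against the reported text with an independent regex engine. The sketch below uses Python's `re` module, not Hyperscan or vectorscan, so it only shows what a reference matcher returns for this simple pattern, and it covers only the one regex listed in the match above. Whether `\n` in the corpus is a literal backslash followed by `n` or a real newline is an assumption either way, so both readings are tried. The helper names are made up for illustration.

```python
import re

# Pattern and corpus copied from the report. Raw strings keep every
# backslash exactly as it is shown there.
PATTERN = r"y\\z*TEXT12345"

# Assumption A: the corpus contains a literal backslash followed by 'n'.
TEXT_LITERAL = r"xxxxxxxxxx?y\nTEXT12345xxxxxxxxxxxx"
# Assumption B: the corpus contains a real newline character.
TEXT_NEWLINE = "xxxxxxxxxx?y\nTEXT12345xxxxxxxxxxxx"

# Rough counterparts of HS_FLAG_DOTALL | HS_FLAG_MULTILINE in Python's re.
FLAGS = re.DOTALL | re.MULTILINE


def reference_match(text):
    """Return the re match object for PATTERN in text, or None."""
    return re.search(PATTERN, text, FLAGS)


def end_of_marker(text, marker="TEXT12345"):
    """Return the offset just past the marker, i.e. where a match would end."""
    return text.index(marker) + len(marker)


if __name__ == "__main__":
    for label, text in (("literal backslash-n", TEXT_LITERAL),
                        ("real newline", TEXT_NEWLINE)):
        print(label,
              "| reference match:", reference_match(text),
              "| end of TEXT12345:", end_of_marker(text))
```

Neither reading yields a match in the reference engine, and the end offset printed for `TEXT12345` under each reading can be compared with the reported end position of 23.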
What I think could help is that the two regexes only produce a match if compiled without the flag `HS_FLAG_SOM_LEFTMOST` (this is why I only report the ending position of the match). For example, in my tests I was using the flags `HS_FLAG_DOTALL | HS_FLAG_MULTILINE`, but the moment you include `HS_FLAG_SOM_LEFTMOST`, the match is no longer falsely reported.

Furthermore, if I remove e.g. one or more `x` chars from the end of the input string (even though these are not matched), then the match is no longer reported. The same happens with the `x` chars at the beginning. I know this is a strange example, but it comes from a much larger dataset of inputs and this is the smallest I could pinpoint. Also note that when the regexes are compiled individually, neither of them produces a match.

The self-contained code of the example (notice the multiple backslashes for the escaping character `\\\\`):

I compiled with `g++-10 (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0` on x64 and `gcc10-g++ (GCC) 10.3.1 20210422 (Red Hat 10.3.1-1)` on aarch64. The Ragel version is `Ragel State Machine Compiler version 6.10 March 2017` for both.

I noticed there was also a recent post with a similar problem here and that maybe this PR fixes the problem. I can try rerunning the test when the PR is merged.
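On the escaping mentioned above: in a C++ (or Python) string literal each backslash has to be doubled, and the regex syntax doubles it again, so four backslashes in source code end up matching one literal backslash in the scanned text. A minimal Python sketch of the two layers, using `re` as a stand-in for the regex engine (the variable and function names are illustrative only):

```python
import re

# Layer 1: the source-code literal. Four backslashes in source become
# two backslash characters in the in-memory string.
SOURCE_LITERAL = "y\\\\z*TEXT12345"

# The same string written as a raw literal, i.e. the regex as the engine sees it.
REGEX_TEXT = r"y\\z*TEXT12345"


def backslash_count(s):
    """Count the backslash characters actually stored in the string."""
    return s.count("\\")


def matches_fully(text):
    """Layer 2: the regex engine reads the two backslashes as one literal backslash."""
    return re.fullmatch(SOURCE_LITERAL, text) is not None


if __name__ == "__main__":
    print("same regex text:", SOURCE_LITERAL == REGEX_TEXT)
    print("backslashes in regex text:", backslash_count(SOURCE_LITERAL))
    # "y\\TEXT12345" below is y, ONE backslash, TEXT12345.
    print("matches y + backslash + TEXT12345:", matches_fully("y\\TEXT12345"))
    print("matches y + backslash + n + TEXT12345:", matches_fully("y\\nTEXT12345"))
```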
Let me know if there is anything else I can provide. Thank you for your time.