Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re2j runs 5x slower than java.util.regex #162

Open
YuyuZha0 opened this issue Nov 17, 2022 · 1 comment
Open

Re2j runs 5x slower than java.util.regex #162

YuyuZha0 opened this issue Nov 17, 2022 · 1 comment

Comments

@YuyuZha0
Copy link

Here is my benchmark:

import org.apache.commons.lang3.RandomStringUtils;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;

import java.util.Iterator;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

@Warmup(iterations = 2, time = 3)
@Measurement(iterations = 2, time = 3)
@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Threads(3)
@Fork(1)
/*
 * <pre>
 *     Benchmark                Mode  Cnt     Score   Error  Units
 * Re2jBenchmark.javaMatch  avgt    2   175.926          ns/op
 * Re2jBenchmark.re2jMatch  avgt    2  1067.591          ns/op
 *     <pre/>
 */
public class Re2jBenchmark {

  private static final String PATTERN = "[0-5][0-6]{1,3}[1-7]{1,2}[2580]{1,3}[127]{0,8}";
  private static final Pattern p1 = Pattern.compile(PATTERN);
  private static final com.google.re2j.Pattern p2 = com.google.re2j.Pattern.compile(PATTERN);

  private List<String> inputs;

  private Iterator<String> iterator;

  @Setup
  public void setup() {
    inputs =
        IntStream.rangeClosed(0, 999)
            .mapToObj(n -> RandomStringUtils.randomNumeric(32))
            .collect(Collectors.toList());
    iterator = inputs.listIterator();
  }

  private String nextInput() {
    if (!iterator.hasNext()) {
      iterator = inputs.listIterator();
    }
    return iterator.next();
  }

  @Benchmark
  public boolean javaMatch() {
    return p1.matcher(nextInput()).find();
  }

  @Benchmark
  public boolean re2jMatch() {
    return p2.matcher(nextInput()).find();
  }
}

here is the result:

Benchmark                Mode  Cnt     Score   Error  Units
Re2jBenchmark.javaMatch  avgt    2   175.926          ns/op
Re2jBenchmark.re2jMatch  avgt    2  1067.591          ns/op
@RG9
Copy link

RG9 commented Aug 10, 2023

I get similar results (RE2 is 3x slower):

  Benchmark                                 Mode  Cnt       Score      Error  Units
  Re2Benchmark.wholeWordsEndingWithNn_jdk  thrpt    5       4.496 ±    0.292  ops/s
  Re2Benchmark.wholeWordsEndingWithNn_re2  thrpt    5       1.582 ±    0.124  ops/s

Note: I tested on sample (16MB book) from https:/rust-leipzig/regex-performance and pattern \b\w+nn\b for which RE2 excels in their benchmark.

On the other hand, RE2 is tremendously faster for regex with 100 alternations (first names) on the same sample, which is consistent with https:/google/re2j#why-should-i-switch:

  Benchmark                                 Mode  Cnt       Score      Error  Units
  Re2Benchmark.manyAlternation_jdk         thrpt    5       0.278 ±    0.002  ops/s
  Re2Benchmark.manyAlternations_re2        thrpt    5     111.601 ±    0.324  ops/s

Note: length of input text also matters

Code: RG9/rg-regex-demo@209d9b4#diff-fe662066dca41f899d163250973f0db8e950953eb9b03a7d08c71f5fdb6a0d70

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants