Find without html tree for the remaining combinators #545

Merged

@ypconstante (Contributor) commented Feb 22, 2024

This PR adds support for the remaining combinator match types in `traverse_html_tuples`.

To keep the child and adjacent sibling cases from becoming much more complex, the optimized traversal is used only when one of these combinators is the last combinator in the selector. Without this limitation, we'd need to track a lot of extra state to avoid returning duplicate nodes.

This also adds brief documentation to `traverse_html_tuples` explaining how we keep track of the different combinator types.
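
As a rough illustration of why the "last combinator" case is the easy one, here is a minimal sketch of a single-pass search for `parent > child` over Floki-style `{tag, attrs, children}` tuples. It is not the code from this PR: `ChildLastSketch` and `find_children/3` are hypothetical names, and it only handles plain tag names. Every element has exactly one parent, so each match is emitted at most once and no extra bookkeeping is needed to avoid duplicates.

```elixir
defmodule ChildLastSketch do
  # Hypothetical helper, not Floki's implementation: returns every
  # `child_tag` element whose direct parent is a `parent_tag` element.
  def find_children(nodes, parent_tag, child_tag) when is_list(nodes) do
    Enum.flat_map(nodes, &walk(&1, parent_tag, child_tag))
  end

  # Element node: check each direct child against the selector, then
  # descend into that child so nested parents are found too.
  defp walk({tag, _attrs, children}, parent_tag, child_tag) when is_binary(tag) do
    Enum.flat_map(children, fn child ->
      matched =
        if tag == parent_tag and match?({^child_tag, _, _}, child) do
          [child]
        else
          []
        end

      matched ++ walk(child, parent_tag, child_tag)
    end)
  end

  # Text, comments, doctypes and other non-element nodes are skipped.
  defp walk(_other, _parent_tag, _child_tag), do: []
end

doc = [
  {"div", [],
   [
     {"p", [], ["a"]},
     {"div", [], [{"p", [], ["b"]}]},
     {"p", [], ["c"]}
   ]}
]

# Each <p> is visited once, as a child of exactly one <div>.
IO.inspect(ChildLastSketch.find_children(doc, "div", "p"))
```

When more combinators follow the child step (for example `div > span p`), the same node can be reached through several matching ancestors, which is where the duplicate tracking mentioned above would come in.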


##### With input big #####
Name                                         ips        average  deviation         median         99th %
tag name (type) (main)                    545.35        1.83 ms    ±21.21%        1.73 ms        3.37 ms
tag name (type) (pr)                      531.10        1.88 ms    ±13.70%        1.82 ms        3.19 ms
descendant combinator (main)              509.60        1.96 ms     ±9.57%        1.92 ms        2.62 ms
descendant combinator (pr)                500.25        2.00 ms    ±11.14%        1.95 ms        2.87 ms
child combinator (pr)                     495.81        2.02 ms    ±11.28%        1.96 ms        2.91 ms
adjacent_sibling combinator (pr)          472.45        2.12 ms    ±10.54%        2.06 ms        3.04 ms
general_sibling combinator (pr)           460.69        2.17 ms    ±11.86%        2.11 ms        2.92 ms
general_sibling combinator (main)          58.75       17.02 ms    ±20.34%       16.19 ms       27.38 ms
child combinator (main)                    58.70       17.04 ms    ±22.14%       16.06 ms       27.61 ms
adjacent_sibling combinator (main)         48.13       20.78 ms    ±23.40%       19.62 ms       36.36 ms

Comparison: 
tag name (type) (main)                    545.35
tag name (type) (pr)                      531.10 - 1.03x slower +0.0492 ms
descendant combinator (main)              509.60 - 1.07x slower +0.129 ms
descendant combinator (pr)                500.25 - 1.09x slower +0.165 ms
child combinator (pr)                     495.81 - 1.10x slower +0.183 ms
adjacent_sibling combinator (pr)          472.45 - 1.15x slower +0.28 ms
general_sibling combinator (pr)           460.69 - 1.18x slower +0.34 ms
general_sibling combinator (main)          58.75 - 9.28x slower +15.19 ms
child combinator (main)                    58.70 - 9.29x slower +15.20 ms
adjacent_sibling combinator (main)         48.13 - 11.33x slower +18.94 ms

Memory usage statistics:

Name                                  Memory usage
tag name (type) (main)                     1.99 MB
tag name (type) (pr)                       1.99 MB - 1.00x memory usage +0 MB
descendant combinator (main)               2.19 MB - 1.10x memory usage +0.21 MB
descendant combinator (pr)                 2.19 MB - 1.10x memory usage +0.21 MB
child combinator (pr)                      2.00 MB - 1.01x memory usage +0.0120 MB
adjacent_sibling combinator (pr)           1.99 MB - 1.00x memory usage +0.00234 MB
general_sibling combinator (pr)            2.38 MB - 1.20x memory usage +0.39 MB
general_sibling combinator (main)          9.60 MB - 4.83x memory usage +7.61 MB
child combinator (main)                    9.56 MB - 4.80x memory usage +7.57 MB
adjacent_sibling combinator (main)         9.59 MB - 4.82x memory usage +7.60 MB

**All measurements for memory usage were the same**

##### With input medium #####
Name                                         ips        average  deviation         median         99th %
tag name (type) (main)                    1.69 K      592.54 μs    ±10.25%      568.36 μs      839.59 μs
tag name (type) (pr)                      1.64 K      611.53 μs    ±13.60%      577.67 μs      939.22 μs
descendant combinator (main)              1.59 K      627.45 μs    ±10.33%      600.12 μs      887.73 μs
descendant combinator (pr)                1.52 K      656.29 μs    ±16.62%      629.52 μs     1085.47 μs
general_sibling combinator (pr)           1.48 K      676.78 μs    ±10.76%      654.32 μs      961.10 μs
child combinator (pr)                     1.44 K      693.15 μs    ±11.60%      671.66 μs     1005.27 μs
adjacent_sibling combinator (pr)          1.42 K      704.37 μs    ±11.16%      686.11 μs     1033.76 μs
child combinator (main)                   0.24 K     4149.36 μs    ±21.10%     4011.47 μs     6494.70 μs
adjacent_sibling combinator (main)        0.22 K     4507.47 μs    ±20.24%     4488.29 μs     7412.67 μs
general_sibling combinator (main)        0.200 K     5001.64 μs    ±16.40%     4979.82 μs     7465.99 μs

Comparison: 
tag name (type) (main)                    1.69 K
tag name (type) (pr)                      1.64 K - 1.03x slower +18.99 μs
descendant combinator (main)              1.59 K - 1.06x slower +34.90 μs
descendant combinator (pr)                1.52 K - 1.11x slower +63.75 μs
general_sibling combinator (pr)           1.48 K - 1.14x slower +84.24 μs
child combinator (pr)                     1.44 K - 1.17x slower +100.61 μs
adjacent_sibling combinator (pr)          1.42 K - 1.19x slower +111.83 μs
child combinator (main)                   0.24 K - 7.00x slower +3556.82 μs
adjacent_sibling combinator (main)        0.22 K - 7.61x slower +3914.93 μs
general_sibling combinator (main)        0.200 K - 8.44x slower +4409.10 μs

Memory usage statistics:

Name                                  Memory usage
tag name (type) (main)                   687.48 KB
tag name (type) (pr)                     687.48 KB - 1.00x memory usage +0 KB
descendant combinator (main)             772.73 KB - 1.12x memory usage +85.25 KB
descendant combinator (pr)               772.73 KB - 1.12x memory usage +85.25 KB
general_sibling combinator (pr)          828.85 KB - 1.21x memory usage +141.37 KB
child combinator (pr)                    717.84 KB - 1.04x memory usage +30.36 KB
adjacent_sibling combinator (pr)         691.81 KB - 1.01x memory usage +4.33 KB
child combinator (main)                 2998.63 KB - 4.36x memory usage +2311.15 KB
adjacent_sibling combinator (main)      3091.72 KB - 4.50x memory usage +2404.23 KB
general_sibling combinator (main)       3330.14 KB - 4.84x memory usage +2642.66 KB

**All measurements for memory usage were the same**

##### With input small #####
Name                                         ips        average  deviation         median         99th %
tag name (type) (main)                    8.12 K      123.11 μs    ±17.48%      118.14 μs      208.02 μs
descendant combinator (main)              7.68 K      130.27 μs    ±15.65%      122.44 μs      212.88 μs
tag name (type) (pr)                      7.56 K      132.24 μs    ±19.67%      122.23 μs      249.39 μs
descendant combinator (pr)                7.47 K      133.93 μs    ±16.61%      125.46 μs      226.97 μs
general_sibling combinator (pr)           7.04 K      141.98 μs    ±16.32%      132.97 μs      238.54 μs
adjacent_sibling combinator (pr)          6.57 K      152.29 μs    ±18.28%      142.12 μs      266.85 μs
child combinator (pr)                     6.36 K      157.22 μs    ±15.38%      148.09 μs      263.97 μs
child combinator (main)                   1.69 K      591.36 μs    ±10.33%      564.78 μs      839.66 μs
adjacent_sibling combinator (main)        1.53 K      655.21 μs    ±22.65%      623.29 μs     1330.19 μs
general_sibling combinator (main)         1.45 K      688.48 μs     ±9.23%      663.36 μs      972.10 μs

Comparison: 
tag name (type) (main)                    8.12 K
descendant combinator (main)              7.68 K - 1.06x slower +7.17 μs
tag name (type) (pr)                      7.56 K - 1.07x slower +9.13 μs
descendant combinator (pr)                7.47 K - 1.09x slower +10.83 μs
general_sibling combinator (pr)           7.04 K - 1.15x slower +18.87 μs
adjacent_sibling combinator (pr)          6.57 K - 1.24x slower +29.18 μs
child combinator (pr)                     6.36 K - 1.28x slower +34.12 μs
child combinator (main)                   1.69 K - 4.80x slower +468.25 μs
adjacent_sibling combinator (main)        1.53 K - 5.32x slower +532.10 μs
general_sibling combinator (main)         1.45 K - 5.59x slower +565.38 μs

Benchmark script:

read_file = fn name ->
  __ENV__.file
  |> Path.dirname()
  |> Path.join(name)
  |> File.read!()
  |> Floki.parse_document!()
end

inputs = %{
  "big" => read_file.("big.html"),
  "medium" => read_file.("medium.html"),
  "small" => read_file.("small.html")
}

# Assumption: `tag` is not defined in the snippet as posted. It labels the run
# being saved ("main" or "pr" in the results above) and is read here from the
# first command-line argument, falling back to "pr".
tag = List.first(System.argv()) || "pr"

Benchee.run(
  %{
    "tag name (type)" => fn doc -> Floki.find(doc, "div") end,
    "descendant combinator" => fn doc -> Floki.find(doc, "div p") end,
    "child combinator" => fn doc -> Floki.find(doc, "div > p") end,
    "adjacent_sibling combinator" => fn doc -> Floki.find(doc, "div + a") end,
    "general_sibling combinator" => fn doc -> Floki.find(doc, "div ~ a") end
  },
  time: 10,
  inputs: inputs,
  save: [path: "benchs/results/finder-#{tag}.benchee", tag: tag],
  memory_time: 2
)

@philss (Owner) left a comment

I think I got the idea. Really great, @ypconstante !! 🚀

@philss philss merged commit c479644 into philss:main Mar 1, 2024
9 checks passed
@ypconstante ypconstante deleted the find--without-html-tree-remaining-combinators branch July 24, 2024 02:03