Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fill parent ID column and nested set columns #2487

Merged
merged 23 commits into from
Jun 1, 2023

Conversation

stoewer
Copy link
Contributor

@stoewer stoewer commented May 18, 2023

What this PR does:

Fill the new columns that were introduced with vParquet2 with the correct values.

Which issue(s) this PR fixes:
Fixes grafana/tempo-squad#207

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@stoewer
Copy link
Contributor Author

stoewer commented May 18, 2023

This adds some overhead to the traceToParquet() function, this is expected.
Here are results from the ProtoToParquet benchmark without and with the calculation for the nested set model:

goos: linux
goarch: amd64
pkg: github.com/grafana/tempo/tempodb/encoding/vparquet2
cpu: AMD Ryzen 7 PRO 6850U with Radeon Graphics     
                                  │ bench-nested-set-01.txt │       bench-nested-set-02.txt        │
                                  │         sec/op          │    sec/op      vs base               │
ProtoToParquet/SpanCount:10_k-16                8.746m ± 4%   10.763m ± 12%  +23.06% (p=0.002 n=6)
ProtoToParquet/SpanCount:100_k-16               58.16m ± 3%    69.99m ±  1%  +20.33% (p=0.002 n=6)
ProtoToParquet/SpanCount:1_M-16                 607.7m ± 2%    726.3m ±  2%  +19.52% (p=0.002 n=6)
geomean                                         67.61m         81.79m        +20.96%

                                  │ bench-nested-set-01.txt │      bench-nested-set-02.txt       │
                                  │          B/op           │     B/op      vs base              │
ProtoToParquet/SpanCount:10_k-16               8.338Mi ± 1%   8.641Mi ± 1%  +3.64% (p=0.002 n=6)
ProtoToParquet/SpanCount:100_k-16              82.80Mi ± 0%   86.69Mi ± 0%  +4.70% (p=0.002 n=6)
ProtoToParquet/SpanCount:1_M-16                827.4Mi ± 0%   867.1Mi ± 0%  +4.80% (p=0.002 n=6)
geomean                                        82.97Mi        86.60Mi       +4.38%

                                  │ bench-nested-set-01.txt │      bench-nested-set-02.txt      │
                                  │        allocs/op        │  allocs/op   vs base              │
ProtoToParquet/SpanCount:10_k-16                25.71k ± 1%   25.65k ± 1%       ~ (p=0.974 n=6)
ProtoToParquet/SpanCount:100_k-16               253.6k ± 0%   253.8k ± 0%       ~ (p=0.937 n=6)
ProtoToParquet/SpanCount:1_M-16                 2.535M ± 0%   2.533M ± 0%  -0.09% (p=0.041 n=6)
geomean                                         254.7k        254.5k       -0.08%

After optimizations:

goos: linux
goarch: amd64
pkg: github.com/grafana/tempo/tempodb/encoding/vparquet2
cpu: AMD Ryzen 7 PRO 6850U with Radeon Graphics     
                                  │ bench-nested-set-01b.txt │      bench-nested-set-02c.txt       │
                                  │          sec/op          │    sec/op     vs base               │
ProtoToParquet/SpanCount:10_k-16                9.586m ± 12%   11.024m ± 3%  +15.00% (p=0.002 n=6)
ProtoToParquet/SpanCount:100_k-16               57.88m ±  2%    68.60m ± 1%  +18.52% (p=0.002 n=6)
ProtoToParquet/SpanCount:1_M-16                 588.5m ±  3%    704.5m ± 4%  +19.72% (p=0.002 n=6)
geomean                                         68.86m          81.07m       +17.73%

                                  │ bench-nested-set-01b.txt │      bench-nested-set-02c.txt      │
                                  │           B/op           │     B/op      vs base              │
ProtoToParquet/SpanCount:10_k-16                8.307Mi ± 1%   8.651Mi ± 0%  +4.13% (p=0.002 n=6)
ProtoToParquet/SpanCount:100_k-16               82.78Mi ± 0%   86.66Mi ± 0%  +4.69% (p=0.002 n=6)
ProtoToParquet/SpanCount:1_M-16                 827.1Mi ± 0%   867.0Mi ± 0%  +4.82% (p=0.002 n=6)
geomean                                         82.85Mi        86.62Mi       +4.55%

                                  │ bench-nested-set-01b.txt │     bench-nested-set-02c.txt      │
                                  │        allocs/op         │  allocs/op   vs base              │
ProtoToParquet/SpanCount:10_k-16                 25.65k ± 2%   25.70k ± 2%       ~ (p=0.485 n=6)
ProtoToParquet/SpanCount:100_k-16                253.6k ± 0%   253.7k ± 0%       ~ (p=0.699 n=6)
ProtoToParquet/SpanCount:1_M-16                  2.534M ± 0%   2.534M ± 0%       ~ (p=0.818 n=6)
geomean                                          254.5k        254.7k       +0.06%

@stoewer stoewer added the enhancement New feature or request label May 18, 2023
@stoewer stoewer marked this pull request as ready for review May 18, 2023 09:42
Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is some excellent work. after we merge this i will need to restrain myself from immediately writing {} >> {} :)

i have a fair number of questions and i'd like to give @mdisibio a chance to review (if he wants).

overall this looks great. i don't see any major issues.

tempodb/encoding/vparquet2/combiner.go Show resolved Hide resolved
tempodb/encoding/vparquet2/nested_set_model.go Outdated Show resolved Hide resolved
tempodb/encoding/vparquet2/nested_set_model.go Outdated Show resolved Hide resolved
tempodb/encoding/vparquet2/nested_set_model.go Outdated Show resolved Hide resolved
tempodb/encoding/vparquet2/nested_set_model.go Outdated Show resolved Hide resolved
tempodb/encoding/vparquet2/nested_set_model.go Outdated Show resolved Hide resolved
tempodb/encoding/vparquet2/nested_set_model.go Outdated Show resolved Hide resolved
@stoewer stoewer force-pushed the fill-nested-set-model-columns branch from 72b8ef5 to e81b1e9 Compare May 25, 2023 06:54
@stoewer stoewer force-pushed the fill-nested-set-model-columns branch from e81b1e9 to 62023e2 Compare May 25, 2023 07:09
Copy link
Contributor

@mdisibio mdisibio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The only concern was recalculating after combination which was just addressed. It achieves the goals I was hoping for: linear O(n) performance and memory, no recursion in the DFS by using the stack. At first glance it seems there are small tweaks that could simplify or optimize, but after further consideration they aren't so clearcut (like the child slice pre-allocation requires many more map lookups). Excellent job @stoewer !

@stoewer stoewer force-pushed the fill-nested-set-model-columns branch 3 times, most recently from 03caf64 to 62023e2 Compare May 26, 2023 06:15
Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two questions, but this is looking good. maybe a quick sync discussion about the changes will be helpful to get this merged.

i am sad about the additional complexity to handle zipkin and (if the work weren't already complete) would be tempted to just say that hierarchical operators are undefined for zipkin traces.

great work. in particular i'm impressed with the clever optimizations in (span id, kind) -> hash

tempodb/encoding/vparquet2/nested_set_model.go Outdated Show resolved Hide resolved
@stoewer stoewer merged commit 890838b into grafana:main Jun 1, 2023
@stoewer stoewer deleted the fill-nested-set-model-columns branch June 1, 2023 00:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants