Fill parent ID column and nested set columns #2487

stoewer · 2023-05-18T00:31:47Z

What this PR does:

Fill the new columns that were introduced with vParquet2 with the correct values.

Which issue(s) this PR fixes:
Fixes grafana/tempo-squad#207

Checklist

Tests updated
~~Documentation added~~
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

stoewer · 2023-05-18T01:54:14Z

This adds some overhead to the traceToParquet() function, this is expected.
Here are results from the ProtoToParquet benchmark without and with the calculation for the nested set model:

goos: linux
goarch: amd64
pkg: github.com/grafana/tempo/tempodb/encoding/vparquet2
cpu: AMD Ryzen 7 PRO 6850U with Radeon Graphics     
                                  │ bench-nested-set-01.txt │       bench-nested-set-02.txt        │
                                  │         sec/op          │    sec/op      vs base               │
ProtoToParquet/SpanCount:10_k-16                8.746m ± 4%   10.763m ± 12%  +23.06% (p=0.002 n=6)
ProtoToParquet/SpanCount:100_k-16               58.16m ± 3%    69.99m ±  1%  +20.33% (p=0.002 n=6)
ProtoToParquet/SpanCount:1_M-16                 607.7m ± 2%    726.3m ±  2%  +19.52% (p=0.002 n=6)
geomean                                         67.61m         81.79m        +20.96%

                                  │ bench-nested-set-01.txt │      bench-nested-set-02.txt       │
                                  │          B/op           │     B/op      vs base              │
ProtoToParquet/SpanCount:10_k-16               8.338Mi ± 1%   8.641Mi ± 1%  +3.64% (p=0.002 n=6)
ProtoToParquet/SpanCount:100_k-16              82.80Mi ± 0%   86.69Mi ± 0%  +4.70% (p=0.002 n=6)
ProtoToParquet/SpanCount:1_M-16                827.4Mi ± 0%   867.1Mi ± 0%  +4.80% (p=0.002 n=6)
geomean                                        82.97Mi        86.60Mi       +4.38%

                                  │ bench-nested-set-01.txt │      bench-nested-set-02.txt      │
                                  │        allocs/op        │  allocs/op   vs base              │
ProtoToParquet/SpanCount:10_k-16                25.71k ± 1%   25.65k ± 1%       ~ (p=0.974 n=6)
ProtoToParquet/SpanCount:100_k-16               253.6k ± 0%   253.8k ± 0%       ~ (p=0.937 n=6)
ProtoToParquet/SpanCount:1_M-16                 2.535M ± 0%   2.533M ± 0%  -0.09% (p=0.041 n=6)
geomean                                         254.7k        254.5k       -0.08%

After optimizations:

goos: linux
goarch: amd64
pkg: github.com/grafana/tempo/tempodb/encoding/vparquet2
cpu: AMD Ryzen 7 PRO 6850U with Radeon Graphics     
                                  │ bench-nested-set-01b.txt │      bench-nested-set-02c.txt       │
                                  │          sec/op          │    sec/op     vs base               │
ProtoToParquet/SpanCount:10_k-16                9.586m ± 12%   11.024m ± 3%  +15.00% (p=0.002 n=6)
ProtoToParquet/SpanCount:100_k-16               57.88m ±  2%    68.60m ± 1%  +18.52% (p=0.002 n=6)
ProtoToParquet/SpanCount:1_M-16                 588.5m ±  3%    704.5m ± 4%  +19.72% (p=0.002 n=6)
geomean                                         68.86m          81.07m       +17.73%

                                  │ bench-nested-set-01b.txt │      bench-nested-set-02c.txt      │
                                  │           B/op           │     B/op      vs base              │
ProtoToParquet/SpanCount:10_k-16                8.307Mi ± 1%   8.651Mi ± 0%  +4.13% (p=0.002 n=6)
ProtoToParquet/SpanCount:100_k-16               82.78Mi ± 0%   86.66Mi ± 0%  +4.69% (p=0.002 n=6)
ProtoToParquet/SpanCount:1_M-16                 827.1Mi ± 0%   867.0Mi ± 0%  +4.82% (p=0.002 n=6)
geomean                                         82.85Mi        86.62Mi       +4.55%

                                  │ bench-nested-set-01b.txt │     bench-nested-set-02c.txt      │
                                  │        allocs/op         │  allocs/op   vs base              │
ProtoToParquet/SpanCount:10_k-16                 25.65k ± 2%   25.70k ± 2%       ~ (p=0.485 n=6)
ProtoToParquet/SpanCount:100_k-16                253.6k ± 0%   253.7k ± 0%       ~ (p=0.699 n=6)
ProtoToParquet/SpanCount:1_M-16                  2.534M ± 0%   2.534M ± 0%       ~ (p=0.818 n=6)
geomean                                          254.5k        254.7k       +0.06%

joe-elliott

this is some excellent work. after we merge this i will need to restrain myself from immediately writing {} >> {} :)

i have a fair number of questions and i'd like to give @mdisibio a chance to review (if he wants).

overall this looks great. i don't see any major issues.

tempodb/encoding/vparquet2/combiner.go

tempodb/encoding/vparquet2/nested_set_model.go

mdisibio

LGTM. The only concern was recalculating after combination which was just addressed. It achieves the goals I was hoping for: linear O(n) performance and memory, no recursion in the DFS by using the stack. At first glance it seems there are small tweaks that could simplify or optimize, but after further consideration they aren't so clearcut (like the child slice pre-allocation requires many more map lookups). Excellent job @stoewer !

…iner

joe-elliott

two questions, but this is looking good. maybe a quick sync discussion about the changes will be helpful to get this merged.

i am sad about the additional complexity to handle zipkin and (if the work weren't already complete) would be tempted to just say that hierarchical operators are undefined for zipkin traces.

great work. in particular i'm impressed with the clever optimizations in (span id, kind) -> hash

tempodb/encoding/vparquet2/nested_set_model.go

stoewer added 6 commits May 18, 2023 10:25

Add helper function to convert span IDs into uint64

9a0a803

Add function to assign nested set model bounds to traces

eab5c68

Apply nested set model bounds to created and combined traces

0da8134

Unit test for traceToParquet function

e42060c

Add PR to CHANGELOG

964424f

Fix liter issues

35183e2

stoewer added the enhancement New feature or request label May 18, 2023

stoewer added 2 commits May 18, 2023 16:28

Small optimizations

54c2796

Use [8]byte array instead of uint64 for IDs

df43352

stoewer marked this pull request as ready for review May 18, 2023 09:42

stoewer requested review from joe-elliott, annanay25, mdisibio, mapno, kvrhdn, zalegrala, electron0zero and ie-pham as code owners May 18, 2023 09:42

Merge branch 'main' into fill-nested-set-model-columns

4c9aece

joe-elliott reviewed May 24, 2023

View reviewed changes

stoewer added 4 commits May 25, 2023 13:53

Force nested set bounds reassignment for combined traces

4252752

Merge branch 'main' into fill-nested-set-model-columns

c6e3197

Safe method to check whether a span is a root span

2807b39

Add comments to explain algorithm

af7e6b6

stoewer force-pushed the fill-nested-set-model-columns branch from 72b8ef5 to e81b1e9 Compare May 25, 2023 06:54

Make stack implementation more generic and move it to pkg/util

62023e2

stoewer force-pushed the fill-nested-set-model-columns branch from e81b1e9 to 62023e2 Compare May 25, 2023 07:09

mdisibio approved these changes May 25, 2023

View reviewed changes

joe-elliott approved these changes May 25, 2023

View reviewed changes

stoewer force-pushed the fill-nested-set-model-columns branch 3 times, most recently from 03caf64 to 62023e2 Compare May 26, 2023 06:15

stoewer added 8 commits May 29, 2023 19:07

Change the assignment algorithm to account for zipkin spans

b063eb7

Remove unuse stack implementation

1350a0a

Use the same span ID to token function for nested set values and comb…

10f68ac

…iner

Improve unit tests

2891818

Improve comments

6e0ba8f

Reformat code

2d971fd

Remove unused parameter from unit test

ac38e8f

Merge branch 'main' into fill-nested-set-model-columns

3a6834d

joe-elliott reviewed May 31, 2023

View reviewed changes

tempodb/encoding/vparquet2/nested_set_model.go Show resolved Hide resolved

tempodb/encoding/vparquet2/nested_set_model.go Outdated Show resolved Hide resolved

Remove check to skip the assignmnt (and the force assignment parameter)

5e5ae4e

stoewer merged commit 890838b into grafana:main Jun 1, 2023

stoewer deleted the fill-nested-set-model-columns branch June 1, 2023 00:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fill parent ID column and nested set columns #2487

Fill parent ID column and nested set columns #2487

stoewer commented May 18, 2023 •

edited

Loading

stoewer commented May 18, 2023 •

edited

Loading

joe-elliott left a comment

mdisibio left a comment

joe-elliott left a comment

Fill parent ID column and nested set columns #2487

Fill parent ID column and nested set columns #2487

Conversation

stoewer commented May 18, 2023 • edited Loading

stoewer commented May 18, 2023 • edited Loading

joe-elliott left a comment

Choose a reason for hiding this comment

mdisibio left a comment

Choose a reason for hiding this comment

joe-elliott left a comment

Choose a reason for hiding this comment

stoewer commented May 18, 2023 •

edited

Loading

stoewer commented May 18, 2023 •

edited

Loading