Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(config): make arrangement backfill default #14846

Merged
merged 19 commits into from
Mar 27, 2024

Conversation

kwannoel
Copy link
Contributor

@kwannoel kwannoel commented Jan 29, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR makes arrangement backfill the default backfill by changing the streaming_use_arrangement_backfill to true by default.

We also fix some ci issues which block this from being the default.

Changes

Large number of changes are due to ./risedev dapt changing the planner tests, since now ArrangementBackfill will be the scan type instead of Backfill.

In this PR we also hide some compactor table_ids, since these will take up a lot of logging space and are not really useful in debugging CI.

Here are the regression in PR runtimes:

  • end-to-end test 19min -> 21min
  • end-to-end test parallel 10min -> 12min
  • end-to-end test deterministic sim 21min -> 25min (Could be just regression from other pipelines..)

It's fine IMO since it's just debug builds. More importantly main-cron does not show regression for e2e test release or backfill tests.

Here are the changes to main-cron runtimes (see #14846 (comment) for details):

  1. Backfill test seem to improve.
  2. e2e release test seem to improve.
  3. Recovery test runtimes seem to slightly increase. Perhaps due to the following overheads:
    1. Instead of just using storage table, we now use ReplicatedStateTable. Metadata is larger since we need the full TableCatalog, instead of just TableDesc.
    2. On recovery, we need to read backfill state and recover the vnode level state.

Performance comparisons between this and no shuffle backfill:

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

Arrangement Backfill decouples upstream stream jobs with their downstream counter parts.

Consider the following:

create materialized view m1 as select * from t;
create materialized view m2 as select * from m1;

Before this PR, we used NoShuffleBackfill to merge historical data and the update stream being scanned from m1 into m2. The implementation of NoShuffleBackfill is such that the parallelism of m1 and m2 are coupled together.
Such that if m2 scales, m1 also needs to be scaled.

With ArrangementBackfill, we decouple this behaviour, and so m2 can scale independently of m1.

Note that Arrangement Backfill is enabled by default. You can disable it with:

set streaming_use_arrangement_backfill to false;

@kwannoel
Copy link
Contributor Author

kwannoel commented Jan 29, 2024

Need to figure out:

  1. Why e2e test will TLE. (Perhaps it's the get_row each time we persist state in debug mode).
  2. Why deterministic test test_high_barrier_latency_cancel takes a long time.
  3. Why recovery test fails.

@kwannoel kwannoel force-pushed the kwannoel/arrangement-backfill-default branch from e156e36 to 18d0f05 Compare January 30, 2024 08:30
@kwannoel
Copy link
Contributor Author

kwannoel commented Jan 30, 2024

E2e test Runtime comparison (debug mode)

key: e2e_test/streaming/aggregate/hdr_approx_percentile.slt, diff: +1409, arrangement runtime: 10965, no shuffle runtime: 9556

key: e2e_test/streaming/./nexmark/create_views.slt.part, diff: +4248, arrangement runtime: 19953, no shuffle runtime: 15705

key: e2e_test/streaming/nexmark_snapshot.slt, diff: +5114, arrangement runtime: 22997, no shuffle runtime: 17883

key: e2e_test/streaming/nexmark_upstream.slt, diff: +5226, arrangement runtime: 23449, no shuffle runtime: 18223

key: e2e_test/streaming/over_window/main.slt, diff: +2053, arrangement runtime: 49867, no shuffle runtime: 47814

key: e2e_test/streaming/./tpch/./views/q5.slt.part, diff: +1287, arrangement runtime: 2045, no shuffle runtime: 758

key: e2e_test/streaming/./tpch/./views/q7.slt.part, diff: +1782, arrangement runtime: 2563, no shuffle runtime: 781

key: e2e_test/streaming/./tpch/./views/q8.slt.part, diff: +2189, arrangement runtime: 3556, no shuffle runtime: 1367

key: e2e_test/streaming/./tpch/./views/q9.slt.part, diff: +1991, arrangement runtime: 3312, no shuffle runtime: 1321

key: e2e_test/streaming/./tpch/./views/q10.slt.part, diff: +1858, arrangement runtime: 2875, no shuffle runtime: 1017

key: e2e_test/streaming/./tpch/./views/q11.slt.part, diff: +1844, arrangement runtime: 2828, no shuffle runtime: 984

key: e2e_test/streaming/./tpch/./views/q12.slt.part, diff: +1585, arrangement runtime: 2285, no shuffle runtime: 700

key: e2e_test/streaming/./tpch/./views/q13.slt.part, diff: +1560, arrangement runtime: 2252, no shuffle runtime: 692

key: e2e_test/streaming/./tpch/./views/q14.slt.part, diff: +1561, arrangement runtime: 2263, no shuffle runtime: 702

key: e2e_test/streaming/./tpch/./views/q15.slt.part, diff: +1864, arrangement runtime: 2614, no shuffle runtime: 750

key: e2e_test/streaming/./tpch/./views/q16.slt.part, diff: +2220, arrangement runtime: 3210, no shuffle runtime: 990

key: e2e_test/streaming/./tpch/./views/q17.slt.part, diff: +2165, arrangement runtime: 3464, no shuffle runtime: 1299

key: e2e_test/streaming/./tpch/./views/q18.slt.part, diff: +2399, arrangement runtime: 3713, no shuffle runtime: 1314

key: e2e_test/streaming/./tpch/./views/q19.slt.part, diff: +2096, arrangement runtime: 3086, no shuffle runtime: 990

key: e2e_test/streaming/./tpch/./views/q20.slt.part, diff: +3577, arrangement runtime: 5405, no shuffle runtime: 1828

key: e2e_test/streaming/./tpch/./views/q21.slt.part, diff: +3933, arrangement runtime: 5887, no shuffle runtime: 1954

key: e2e_test/streaming/./tpch/./views/q22.slt.part, diff: +2721, arrangement runtime: 4033, no shuffle runtime: 1312

key: e2e_test/streaming/./tpch/create_views.slt.part, diff: +38380, arrangement runtime: 60467, no shuffle runtime: 22087

key: e2e_test/streaming/./tpch/drop_views.slt.part, diff: +4169, arrangement runtime: 8156, no shuffle runtime: 3987

key: e2e_test/streaming/tpch_snapshot.slt, diff: +45753, arrangement runtime: 77609, no shuffle runtime: 31856

key: e2e_test/streaming/tpch_upstream.slt, diff: +44402, arrangement runtime: 77006, no shuffle runtime: 32604

@kwannoel kwannoel force-pushed the kwannoel/arrangement-backfill-default branch from 22f3147 to 18d0f05 Compare January 30, 2024 09:42
@kwannoel
Copy link
Contributor Author

kwannoel commented Jan 30, 2024

Screenshot 2024-01-30 at 5 42 44 PM

Runtime still too long without the debug. 8 minutes vs 6 minutes for a normal PR https://buildkite.com/risingwavelabs/pull-request/builds/40961#018d590f-88cb-4868-9ad0-d57ad894bd8d

@kwannoel kwannoel force-pushed the kwannoel/arrangement-backfill-default branch from 18d0f05 to f7cf811 Compare January 30, 2024 09:46
@kwannoel
Copy link
Contributor Author

kwannoel commented Jan 30, 2024

At least main-cron is not taking a long time for e2e test streaming. Seems to be an issue specific to debug mode.
This PR 4m 34s:
Screenshot 2024-01-30 at 6 07 51 PM

Main cron without arrangement backfill default 5m:
Screenshot 2024-01-30 at 6 17 25 PM

@kwannoel kwannoel force-pushed the kwannoel/arrangement-backfill-default branch 2 times, most recently from a2038c2 to 654a721 Compare February 5, 2024 09:43
@kwannoel kwannoel force-pushed the kwannoel/arrangement-backfill-default branch from 654a721 to eea0bdb Compare February 7, 2024 05:41
@kwannoel kwannoel force-pushed the kwannoel/arrangement-backfill-default branch 2 times, most recently from 293bef2 to 7692c4d Compare February 27, 2024 04:21
@kwannoel kwannoel force-pushed the kwannoel/arrangement-backfill-default branch from 7692c4d to 1b0a4c1 Compare February 29, 2024 04:11
@kwannoel kwannoel force-pushed the kwannoel/arrangement-backfill-default branch 2 times, most recently from 3f624f0 to fc566e1 Compare March 13, 2024 06:50
@kwannoel
Copy link
Contributor Author

kwannoel commented Mar 14, 2024

@kwannoel
Copy link
Contributor Author

kwannoel commented Mar 18, 2024

arrangement no shuffle arrangement w tomb no shuffle w tomb
create_watermark_mv_latency(ms) 124186.916 235615.608 124532.255 213548.740
create_mv_latency(ms) 114223.373 241254.552 110062.274 230168.287
batch_query_latency(ms) 26236.617 24255.724 28296.241 15896.016
mv_query_latency(ms) 21391.008 22818.898 22080.314 16404.857

* tomb refers to tombstone, generated when there's deleted values. An old issue #12680 shows backfill had issues when there's a large number of tombstones.

@kwannoel kwannoel force-pushed the kwannoel/arrangement-backfill-default branch from fc566e1 to 1be3993 Compare March 19, 2024 06:04
@kwannoel kwannoel changed the title chore(config): make arrangement backfill default feat(config): make arrangement backfill default Mar 19, 2024
@kwannoel kwannoel force-pushed the kwannoel/arrangement-backfill-default branch from e309f80 to f504f88 Compare March 19, 2024 09:22
@kwannoel
Copy link
Contributor Author

Fix parallel in memory tests here. #15930

@kwannoel kwannoel enabled auto-merge March 26, 2024 18:44
@kwannoel kwannoel added this pull request to the merge queue Mar 27, 2024
Merged via the queue into main with commit b4e6ee3 Mar 27, 2024
32 of 34 checks passed
@kwannoel kwannoel deleted the kwannoel/arrangement-backfill-default branch March 27, 2024 09:11
@kwannoel kwannoel mentioned this pull request Mar 27, 2024
9 tasks
@kwannoel kwannoel added the user-facing-changes Contains changes that are visible to users label May 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants