
feat: also gracefully shutdown on SIGTERM #17802

Merged · 2 commits merged into main from bz/shutdown-sigterm · Jul 30, 2024
Conversation

BugenZhao (Member) commented Jul 24, 2024:

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

...because Kubernetes will send SIGTERM when killing a pod.

As described in #17662, this enables more seamless scale-in in Kubernetes deployments (correct me if I'm wrong).

After this PR, I suppose we can get rid of the extra manual step of risingwave ctl meta unregister-worker when scaling in, as described in the doc.
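
For illustration, a minimal sketch of the signal-handling pattern this describes, assuming a tokio runtime on Unix. The names run_until_signal and graceful_shutdown, and the exact force-shutdown wiring, are hypothetical, not the PR's actual code:

    use tokio::signal::unix::{signal, SignalKind};

    async fn run_until_signal() {
        // Unix-only API; acceptable since RisingWave does not support Windows.
        let mut sigint = signal(SignalKind::interrupt()).unwrap();
        let mut sigterm = signal(SignalKind::terminate()).unwrap();

        tokio::select! {
            // SIGINT: typically the user pressing ctrl-c.
            _ = sigint.recv() => {
                tracing::info!("received ctrl-c, shutting down... (press ctrl-c again to force shutdown)");
            }
            // SIGTERM: what Kubernetes sends when terminating a pod.
            _ = sigterm.recv() => {
                tracing::info!("received SIGTERM, shutting down...");
            }
        }

        tokio::select! {
            // Graceful path: e.g. unregister from the meta service, then exit.
            _ = graceful_shutdown() => {}
            // Force path: a second SIGINT aborts the graceful shutdown.
            _ = sigint.recv() => {
                tracing::warn!("forced shutdown");
            }
        }
    }

    // Hypothetical placeholder for the actual shutdown logic.
    async fn graceful_shutdown() {}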

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

Step 1 of the manual unregistration of compute nodes when scaling in (doc) is no longer necessary. Users can directly apply the new YAML file to trigger a graceful scale-in.

@@ -96,24 +97,26 @@ where
spawn_prof_thread(profile_path);
}

let mut sigint = tokio::signal::unix::signal(SignalKind::interrupt()).unwrap();
BugenZhao (Member, Author) commented:

We don't support Windows, so we directly use Unix-only features here.

// Watch SIGINT, typically originating from the user pressing ctrl-c.
// Attempt to shut down gracefully, and force shutdown on the next signal.
_ = sigint.recv() => {
tracing::info!("received ctrl-c, shutting down... (press ctrl-c again to force shutdown)");
A contributor replied: 👍
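
As an aside on the Windows remark above, a hedged sketch of the trade-off, assuming tokio: the portable tokio::signal::ctrl_c() also works on Windows but covers only ctrl-c, while watching SIGTERM requires the Unix-only tokio::signal::unix module:

    use tokio::signal::unix::{signal, SignalKind};

    // Portable: also compiles on Windows, but covers only ctrl-c (SIGINT).
    async fn wait_portable() {
        tokio::signal::ctrl_c().await.unwrap();
    }

    // Unix-only: required to watch SIGTERM, which is what Kubernetes
    // sends when it kills a pod.
    async fn wait_sigterm() {
        let mut sigterm = signal(SignalKind::terminate()).unwrap();
        sigterm.recv().await;
    }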

BugenZhao (Member, Author) commented:
Tested with risingwave-operator and it works like a charm. 🥰

compute:

2024-07-30T09:10:05.276715082Z  INFO risingwave_rt: received SIGTERM, shutting down...
2024-07-30T09:10:05.283413439Z  INFO risingwave_rpc_client::meta_client: successfully unregistered from meta service worker_id=2004
2024-07-30T09:10:05.283480937Z  WARN risingwave_stream::task::barrier_manager: shutdown with running actors, scaling or migration will be triggered
2024-07-30T09:10:05.283581726Z  INFO risingwave_stream::task::barrier_manager: waiting for meta service to close control stream...
2024-07-30T09:10:05.285297803Z  INFO risingwave_rpc_client::meta_client: Heartbeat loop is stopped
2024-07-30T09:10:05.285344052Z  INFO risingwave_common::telemetry::report: Telemetry exit

meta:

2024-07-30T09:10:05.283791678Z  WARN risingwave_meta::barrier::info: node with running actors is deleted node_id=2004 node=Some(WorkerNode { id: 2004, r#type: ComputeNode, host: Some(HostAddress { host: "risingwave-compute-2.risingwave-compute", port: 5688 }), state: Running, property: Some(Property { is_streaming: true, is_serving: true, is_unschedulable: false }), transactional_id: Some(4), resource: None, started_at: None, parallelism: 1 }) actors={2007, 2008}
2024-07-30T09:10:05.283967173Z  INFO failure_recovery{error=worker node 2004 is shutting down prev_epoch=6887532391759872}: risingwave_meta::barrier::recovery: recovery start!
2024-07-30T09:10:05.287488493Z  INFO risingwave_meta::hummock::manager::worker: Released hummock context 2004
2024-07-30T09:10:05.288955702Z  INFO failure_recovery{error=worker node 2004 is shutting down prev_epoch=6887532391759872}:recovery_attempt: risingwave_meta::manager::catalog::fragment: cleaning dirty downstream merge nodes for table sink
2024-07-30T09:10:05.293052214Z  INFO failure_recovery{error=worker node 2004 is shutting down prev_epoch=6887532391759872}:recovery_attempt: risingwave_meta::barrier::recovery: recovering mview progress
2024-07-30T09:10:05.293080297Z  INFO failure_recovery{error=worker node 2004 is shutting down prev_epoch=6887532391759872}:recovery_attempt: risingwave_meta::barrier::recovery: recovered mview progress
2024-07-30T09:10:05.304094784Z  INFO failure_recovery{error=worker node 2004 is shutting down prev_epoch=6887532391759872}:recovery_attempt: risingwave_meta::barrier::recovery: offline rescheduling for job 2001 in recovery is done

BugenZhao removed the request for review from a team · Jul 30, 2024
BugenZhao added this pull request to the merge queue · Jul 30, 2024
BugenZhao added the user-facing-changes label · Jul 30, 2024
Merged via the queue into main with commit b8e08c7 · Jul 30, 2024 (35 of 37 checks passed)
BugenZhao deleted the bz/shutdown-sigterm branch · Jul 30, 2024
fuyufjh (Member) commented Jul 31, 2024:

cc @arkbriar

Labels: type/feature · user-facing-changes (Contains changes that are visible to users)