Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(jdbc-sink): execute statements in batch and set isolation level to RC #12250

Merged
merged 10 commits into from
Sep 14, 2023

Conversation

StrikeW
Copy link
Contributor

@StrikeW StrikeW commented Sep 13, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Explicitly set transaction isolation level to READ COMMITTED (same as before).
Fix #12061, group SQL statements to a stagingStatements map, and execute them after processing a chunk. And we execute DELETE first to prevent accidentally deletions.

Performance numbers (sink a tpch orders table to postgres):
6k rows/s for each parallelism (4 parallelisms in total)

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Comment on lines 237 to 252
// execute staging statement in batch after all rows are prepared,
// and we need to delete first to prevent accidentally deletion
var deleteStmt = stagingStatements.remove(OpType.DELETE);
if (deleteStmt != null) {
executeStatement(deleteStmt);
}

var upsertStmt = stagingStatements.remove(OpType.UPSERT);
if (upsertStmt != null) {
executeStatement(upsertStmt);
}

var insertStmt = stagingStatements.remove(OpType.INSERT);
if (insertStmt != null) {
executeStatement(insertStmt);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC we have discussed executing deletes before inserts in sink executor before, but we should do a compaction in the upstream? cc @st1page

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not quite sure whether the execution order is right here, since our e2e test lacks of delete/update workload.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chenzl25 yes, The compaction logic has been implemented in #11070. And I will continue developing that in another PR later.

@codecov
Copy link

codecov bot commented Sep 13, 2023

Codecov Report

Merging #12250 (e647d5b) into main (ae4b1f8) will decrease coverage by 0.01%.
Report is 4 commits behind head on main.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main   #12250      +/-   ##
==========================================
- Coverage   69.96%   69.95%   -0.01%     
==========================================
  Files        1411     1411              
  Lines      235305   235305              
==========================================
- Hits       164620   164600      -20     
- Misses      70685    70705      +20     
Flag Coverage Δ
rust 69.95% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

see 8 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@StrikeW
Copy link
Contributor Author

StrikeW commented Sep 13, 2023

Add the performance of main branch. Test setup: c5.4xlarge, release build, 4 parallelisms, S3 cc @hzxa21
image

Copy link
Collaborator

@hzxa21 hzxa21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

Copy link
Contributor

@wenym1 wenym1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

@StrikeW StrikeW changed the title refactor(jdbc-sink): execute statements in batch and modify isolation level refactor(jdbc-sink): execute statements in batch and set isolation level to RC Sep 14, 2023
Copy link
Contributor

@chenzl25 chenzl25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. So it is about 2x throughput improvement achieved by batching, right?

@StrikeW
Copy link
Contributor Author

StrikeW commented Sep 14, 2023

LGTM. So it is about 2x throughput improvement achieved by batching, right?

The main branch is about 4k rows/s, improvement is about 1.5x.

@StrikeW StrikeW added this pull request to the merge queue Sep 14, 2023
Merged via the queue into main with commit 0e72056 Sep 14, 2023
28 of 29 checks passed
@StrikeW StrikeW deleted the siyuan/v1.3-refactor-jdbc-sink branch September 14, 2023 11:04
Li0k pushed a commit that referenced this pull request Sep 15, 2023
Little-Wallace added a commit that referenced this pull request Sep 18, 2023
commit c82fc9c
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Sep 18 08:37:33 2023 +0000

    chore(deps): Bump chrono from 0.4.30 to 0.4.31 (#12359)

    Signed-off-by: dependabot[bot] <[email protected]>
    Signed-off-by: Runji Wang <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    Co-authored-by: Runji Wang <[email protected]>
    Co-authored-by: TennyZhuang <[email protected]>

commit cbdc1ac
Author: Huangjw <[email protected]>
Date:   Mon Sep 18 16:22:35 2023 +0800

    chore(ci): move release jobs to main-cron pipeline (#12339)

commit b37a19c
Author: Yuhao Su <[email protected]>
Date:   Mon Sep 18 16:18:01 2023 +0800

    feat(dashboard): add memory profiling (#12052)

commit 71d8170
Author: TennyZhuang <[email protected]>
Date:   Mon Sep 18 15:58:26 2023 +0800

    refactor(expr): allow defining functions in frontend (#12287)

    Signed-off-by: TennyZhuang <[email protected]>
    Co-authored-by: zwang28 <[email protected]>
    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

commit cedaec9
Author: Dylan <[email protected]>
Date:   Mon Sep 18 15:54:10 2023 +0800

    feat(optimizer): support agg group by simplify rule (#12349)

commit 71d9b0b
Author: Noel Kwan <[email protected]>
Date:   Mon Sep 18 15:32:00 2023 +0800

    feat(meta): update StreamJob status on finish (#12342)

commit 784fe56
Author: zwang28 <[email protected]>
Date:   Mon Sep 18 14:47:49 2023 +0800

    fix(backup): ensure correct delta log order (#12371)

commit 711ecd5
Author: congyi wang <[email protected]>
Date:   Mon Sep 18 14:11:24 2023 +0800

    feat(state_table): add iterator sub range under a certain pk prefix (#12251)

commit 1877aed
Author: xiangjinwu <[email protected]>
Date:   Mon Sep 18 13:49:15 2023 +0800

    refactor(sink): impl SinkFormatter for AppendOnly and Upsert (#12321)

commit f304ed2
Author: xxchan <[email protected]>
Date:   Sun Sep 17 20:20:17 2023 +0800

    revert: Revert "chore: add platforms to hakari (#12333)" (#12363)

commit a975d93
Author: Bohan Zhang <[email protected]>
Date:   Sun Sep 17 19:04:24 2023 +0800

    fix: handle kafka sink message timeout error (#12350)

commit 8ef74ad
Author: Runji Wang <[email protected]>
Date:   Sat Sep 16 12:16:02 2023 +0800

    fix(udf): handle visibility of input chunks in UDTF (#12357)

    Signed-off-by: Runji Wang <[email protected]>

commit 31fdc26
Author: Xu <[email protected]>
Date:   Fri Sep 15 21:01:14 2023 -0400

    feat(expr): switch to `fancy-regex` crate & update the original version (#12329)

    Co-authored-by: xzhseh <[email protected]>

commit 0032145
Author: Runji Wang <[email protected]>
Date:   Fri Sep 15 16:57:25 2023 +0800

    refactor(expr): support variadic function in `#[function]` macro (#12178)

    Signed-off-by: Runji Wang <[email protected]>

commit 467ba4b
Author: stonepage <[email protected]>
Date:   Fri Sep 15 16:28:13 2023 +0800

    fix: stream backfill executor use correct schema (#12314)

    Co-authored-by: Noel Kwan <[email protected]>

commit c443197
Author: Dylan <[email protected]>
Date:   Fri Sep 15 16:22:13 2023 +0800

    feat(optimizer): support correlated column in order by (#12341)

commit 8a36ca3
Author: Noel Kwan <[email protected]>
Date:   Fri Sep 15 16:11:03 2023 +0800

    feat(meta): Add `creating_status` field for stream jobs (#12330)

commit bf5b14e
Author: zwang28 <[email protected]>
Date:   Fri Sep 15 16:06:17 2023 +0800

    chore: lift decoding message size limit for ddl client (#12340)

commit c0060b2
Author: zwang28 <[email protected]>
Date:   Fri Sep 15 15:32:14 2023 +0800

    feat(meta): add hummock config relevant tables to rw_catalog (#12337)

commit 59bb645
Author: xxchan <[email protected]>
Date:   Fri Sep 15 14:54:54 2023 +0800

    chore: add platforms to hakari (#12333)

    Signed-off-by: Runji Wang <[email protected]>
    Co-authored-by: Runji Wang <[email protected]>

commit 7baa27f
Author: Bugen Zhao <[email protected]>
Date:   Fri Sep 15 14:00:14 2023 +0800

    chore: split full debug info for release build (#12255)

    Signed-off-by: Bugen Zhao <[email protected]>

commit a99e6f3
Author: Richard Chien <[email protected]>
Date:   Fri Sep 15 13:58:19 2023 +0800

    fix(stream): fix pk indices of GroupTopN executors (#12304)

    Signed-off-by: Richard Chien <[email protected]>

commit 43c010e
Author: Croxx <[email protected]>
Date:   Fri Sep 15 11:59:41 2023 +0800

    chore: fix comment and metrics (#12331)

    Signed-off-by: MrCroxx <[email protected]>

commit 214118b
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Fri Sep 15 10:03:14 2023 +0800

    chore(deps): Bump serde_json from 1.0.106 to 1.0.107 (#12322)

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 41ebb2a
Author: Xu <[email protected]>
Date:   Thu Sep 14 22:02:08 2023 -0400

    fix(regexp): substraction overflow when incorrectly speicifying `start` (#12325)

commit a566cfe
Author: Xu <[email protected]>
Date:   Thu Sep 14 12:58:35 2023 -0400

    feat(expr): add `array_sum` (#12162)

    Signed-off-by: Runji Wang <[email protected]>
    Co-authored-by: Runji Wang <[email protected]>

commit 28bbf10
Author: Croxx <[email protected]>
Date:   Fri Sep 15 00:40:27 2023 +0800

    fix(ci): exclude tikv-jemalloc-sys in hakari check (#12320)

    Signed-off-by: MrCroxx <[email protected]>

commit 5aa5a47
Author: zwang28 <[email protected]>
Date:   Thu Sep 14 21:02:01 2023 +0800

    feat(meta): add hummock version relevant tables to rw_catalog (#12309)

commit a740364
Author: Huangjw <[email protected]>
Date:   Thu Sep 14 19:11:04 2023 +0800

    chore(ci): install locales in prebuilt image (#12311)

    Signed-off-by: Bugen Zhao <[email protected]>
    Co-authored-by: Bugen Zhao <[email protected]>

commit 0e72056
Author: StrikeW <[email protected]>
Date:   Thu Sep 14 18:42:34 2023 +0800

    refactor(jdbc-sink): execute statements in batch and set isolation level to RC (#12250)

commit 827ed5e
Author: Dylan <[email protected]>
Date:   Thu Sep 14 17:31:41 2023 +0800

    refactor(connector): migrate cdc source metric from connector to compute (#12283)

commit a934185
Author: Dylan <[email protected]>
Date:   Thu Sep 14 17:31:04 2023 +0800

    fix(optimizer): relax scan predicate pull up mapping inverse restriction (#12308)

commit db0c099
Author: Dylan <[email protected]>
Date:   Thu Sep 14 17:30:28 2023 +0800

    feat(stream): handling watermark in temporal join (#12302)

commit 1ecea63
Author: Bugen Zhao <[email protected]>
Date:   Thu Sep 14 16:43:14 2023 +0800

    refactor(risedev): split the steps for building and running playground (#12279)

    Signed-off-by: Bugen Zhao <[email protected]>
    Co-authored-by: xxchan <[email protected]>

commit ae4b1f8
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Thu Sep 14 08:41:29 2023 +0000

    chore(deps): Bump clap from 4.4.2 to 4.4.3 (#12245)

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    Co-authored-by: Bugen Zhao <[email protected]>

commit 7ca370a
Author: Croxx <[email protected]>
Date:   Thu Sep 14 16:24:19 2023 +0800

    feat(refill): fetch whole sst file when refilling (#12265)

    Signed-off-by: MrCroxx <[email protected]>

commit ec129b6
Author: Yuhao Su <[email protected]>
Date:   Thu Sep 14 16:04:37 2023 +0800

    chore: use cfg! to instead of #cfg[] for jemalloc control policy (#12307)

commit 9814af8
Author: Runji Wang <[email protected]>
Date:   Thu Sep 14 14:45:14 2023 +0800

    feat(expr): add `pg_sleep` function (#12294)

    Signed-off-by: Runji Wang <[email protected]>

commit 4525e67
Author: Noel Kwan <[email protected]>
Date:   Thu Sep 14 14:38:03 2023 +0800

    feat(stream): support source throttling (#12295)

commit 5ffd58d
Author: Dylan <[email protected]>
Date:   Thu Sep 14 14:35:03 2023 +0800

    refactor(connector): replace validate source rpc with jni (#12270)

commit 888f2dd
Author: Eric Fu <[email protected]>
Date:   Thu Sep 14 14:32:59 2023 +0800

    fix: panic when dumping memory profile (#12276)

Signed-off-by: Little-Wallace <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: JDBC sink should updates rows in a batch way
5 participants