Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(Sink): support redis sink #11999

Merged
merged 11 commits into from
Oct 23, 2023
Merged

feat(Sink): support redis sink #11999

merged 11 commits into from
Oct 23, 2023

Conversation

xxhZs
Copy link
Contributor

@xxhZs xxhZs commented Aug 31, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Support redis sink. Data in RW will be saved in Redis in the form of string key-value pairs. By default, our key is the primary key, and the value includes all other values. Users can also input their desired format in string format. Therefore, even in the case of 'append only', we still need to provide the primary key.

Please refer to the README for specific instructions.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@xxhZs xxhZs marked this pull request as ready for review September 4, 2023 03:56
@xxhZs xxhZs requested a review from a team as a code owner September 4, 2023 03:56
@codecov
Copy link

codecov bot commented Sep 4, 2023

Codecov Report

Merging #11999 (3fbcaa9) into main (156c52e) will increase coverage by 0.09%.
Report is 1 commits behind head on main.
The diff coverage is 56.42%.

@@            Coverage Diff             @@
##             main   #11999      +/-   ##
==========================================
+ Coverage   68.75%   68.84%   +0.09%     
==========================================
  Files        1495     1496       +1     
  Lines      250189   250508     +319     
==========================================
+ Hits       172015   172465     +450     
+ Misses      78174    78043     -131     
Flag Coverage Δ
rust 68.84% <56.42%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
src/connector/src/common.rs 1.89% <ø> (+0.16%) ⬆️
src/connector/src/sink/encoder/mod.rs 89.74% <ø> (+17.94%) ⬆️
src/connector/src/sink/encoder/template.rs 100.00% <100.00%> (ø)
src/connector/src/sink/formatter/mod.rs 23.30% <88.88%> (+23.30%) ⬆️
src/connector/src/sink/mod.rs 60.71% <0.00%> (-1.33%) ⬇️
src/connector/src/sink/doris.rs 0.00% <0.00%> (ø)
src/connector/src/sink/clickhouse.rs 0.00% <0.00%> (ø)
src/connector/src/sink/redis.rs 53.23% <54.24%> (+53.23%) ⬆️

... and 23 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@st1page
Copy link
Contributor

st1page commented Sep 4, 2023

related to #11995 ? Should it be FORMAT UPSERT ENCODE PLAIN?

@xiangjinwu
Copy link
Contributor

xiangjinwu commented Sep 4, 2023

related to #11995 ? Should it be FORMAT UPSERT ENCODE PLAIN?

It is indeed a new encoding but we may need a better name than plain:

  • It may be configured by a template string, as the integration test shows username:{username},event_timestamp{event_timestamp} (btw missing : for event_timestamp, or intentional to mean user can do anything incl typos?)
    • candidate name: text_tmpl
    • related: feat(expr): support Format expression #11370
    • problems: if the user forget to use a column in keyformat, different rw keys would map to same redis key; if the user does not use a column in valueformat, the column is better pruned during planning phase.
  • The default key template is like 213:alice:2023-01-12
    • candidate name: colon_seperated_value
  • The default value template is like [t,laptop,123.73]
    • candidate name: csv_inside_bracket

Is the default really useful? Maybe we shall require the user to provide templates when using this new encoding, or design the default carefully - for example, the value itself may contain :, , or ].

Talking about implementation, to_text is supposed to stay in sync with PostgreSQL's typoutput of each type, which is unfortunately impure and depends on a lot of implicit session configs (TimeZone, DateStyle, IntervalStyle, extra_float_digits, bytea_output, etc). Currently to_text does not take any extra argument because we do not support any of the configs above except TimeZone, which is currently handled by the caller before calling Timestamptz::to_text. So the question is: are the strings in sink expected to match the PostgreSQL typoutput?

As for the issue #11995, I am still investigating a good way to refactor them. Right now we can focus on certain use cases first, for example kafka/kinesis coupled to json and redis coupled to this new encoding.

src/connector/src/sink/redis.rs Outdated Show resolved Hide resolved
src/connector/src/sink/redis.rs Outdated Show resolved Hide resolved
@xiangjinwu
Copy link
Contributor

The backend part of #11995 has been merged. Check this link as an example of how it could be applied to redis and the new encoding here.

@xxhZs xxhZs requested a review from hzxa21 September 18, 2023 10:55
src/connector/src/sink/redis.rs Outdated Show resolved Hide resolved
Comment on lines +196 to +197
type K = String;
type V = Vec<u8>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, ToRedisArg accepts bytes for both key and value. any reason for limiting key to string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inserting k v into redis in string format is only supported here.

}

fn check_string_format(format: &str, set: &HashSet<String>) -> Result<()> {
let re = Regex::new(r"\{([^}]*)\}").unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will it handle format containing escaped string? such as '{\"a\": \{{col_a}\}}'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet, escaped characters are treated as strings, and {} can only have column names in it, not other characters.

src/connector/src/common.rs Outdated Show resolved Hide resolved
src/connector/src/sink/encoder/template.rs Outdated Show resolved Hide resolved
src/connector/src/sink/encoder/template.rs Show resolved Hide resolved
src/connector/src/sink/formatter/mod.rs Outdated Show resolved Hide resolved
src/connector/src/sink/redis.rs Show resolved Hide resolved
@@ -107,6 +109,70 @@ impl SinkFormatterImpl {
)))
}
}

pub fn new_with_redis(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The syntax for create redis sink with key value template format may follow the new create sink ... format XXX encode XXX syntax.

If this PR is urgent, we can move the template row encoder to a separate PR, and in this PR we only use the json encoder for both key and value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will create a new pull request to implement the new syntax. I think we can initially implement it in the current way, and then create a new pull request for modifications.

src/connector/src/sink/redis.rs Outdated Show resolved Hide resolved
@xxhZs xxhZs requested a review from wenym1 October 20, 2023 07:04
@xxhZs xxhZs enabled auto-merge October 23, 2023 03:45
@xxhZs xxhZs added this pull request to the merge queue Oct 23, 2023
Merged via the queue into main with commit 848bdda Oct 23, 2023
28 checks passed
@xxhZs xxhZs deleted the xxh/redis-sink branch October 23, 2023 04:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants