Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sink): support encode jsonb data as dynamic json type in sink #17693

Merged
merged 9 commits into from
Jul 17, 2024

Conversation

yuhao-su
Copy link
Contributor

@yuhao-su yuhao-su commented Jul 16, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

close #11699

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Add a option for ENCODE JSON in WITH clause .

jsonb.handling.mode can be set to either string or dynamic

  • string: encode the jsonb type as a string. example: {"k": 2} -> "{\"k\": 2}". Here a json object jsonb was encoded to a string.
  • dynamic: dynamically encode the jsonb type as json type. example: {"k": 2} -> {"k": 2}. Here a json object jsonb was encoded to a json object.

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@yuhao-su yuhao-su added the user-facing-changes Contains changes that are visible to users label Jul 16, 2024
Comment on lines 307 to 310
CustomJsonType::None => match config.encode_jsonb_mode {
EncodeJsonbMode::String => {
json!(jsonb_ref.to_string())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Core logic

Comment on lines 706 to 719
let encode_jsonb_obj_config = JsonEncoderConfig {
time_handling_mode: TimeHandlingMode::Milli,
date_handling_mode: DateHandlingMode::String,
timestamp_handling_mode: TimestampHandlingMode::String,
timestamptz_handling_mode: TimestamptzHandlingMode::UtcString,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this refactor to its own PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stacked on #17706

/// How the jsonb type is encoded.
///
/// - `string`: encode jsonb as string. `[1, true, "foo"] -> "[1, true, \"foo\"]"`
/// - `object`: encode jsonb as json object. `[1, true, "foo"] -> [1, true, "foo"]`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not accurate. [1, true, "foo"] is a json array rather than a json object. This is why the original issue calls it dynamic type: it may be one of json null / boolean / number / string / array / object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So rename it to dynamic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I use object bc the function name is datum_to_json_object()
`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the top level json returned by this function is always an object containing multiple fields, corresponding to a table row containing multiple columns.

@yuhao-su
Copy link
Contributor Author

We didn't have timestamptz.handling.mode or encode.jsonb.mode for NATS or snowflake. Should we add those? cc @xiangjinwu

@xiangjinwu
Copy link
Contributor

We didn't have timestamptz.handling.mode or encode.jsonb.mode for NATS or snowflake. Should we add those? cc @xiangjinwu

  • NATS: Yes. But it has not been migrated to the format ... encode ... (...) syntax yet.
  • Snowflake: I am not aware of this sink. What is the current data type mapping?

@yuhao-su yuhao-su changed the base branch from main to yuhao/json-encoder-config July 16, 2024 19:32
@yuhao-su yuhao-su requested a review from xiangjinwu July 16, 2024 19:33
@yuhao-su
Copy link
Contributor Author

yuhao-su commented Jul 16, 2024

Snowflake: I am not aware of this sink. What is the current data type mapping?

Currently it uses default behavior that is encode jsonb as string.

It's my understanding users usually create a table with 1 column to store raw data from json file. And create more tables from the source table. It makes more sense to encode RW as dynamic json type. But snowflake user can handle string by PARSE_JSON https://docs.snowflake.com/en/user-guide/tutorials/json-basics-tutorial#prerequisites
cc @chenzl25 @xzhseh

But it has not been migrated to the format ... encode ... (...) syntax yet.

We can do this later.

@yuhao-su yuhao-su changed the title feat(sink): support encode jsonb data as json object in sink feat(sink): support encode jsonb data as dynamic json type in sink Jul 16, 2024
@xzhseh
Copy link
Contributor

xzhseh commented Jul 17, 2024

Snowflake: I am not aware of this sink. What is the current data type mapping?

Currently it uses default behavior that is encode jsonb as string.

It's my understanding users usually create a table with 1 column to store raw data from json file. And create more tables from the source table. It makes more sense to encode RW as dynamic json type. But snowflake user can handle string by PARSE_JSON https://docs.snowflake.com/en/user-guide/tutorials/json-basics-tutorial#prerequisites cc @chenzl25 @xzhseh

But it has not been migrated to the format ... encode ... (...) syntax yet.

We can do this later.

I don't think there is any type mapping setup for Snowflake sink as for now..

every column is by default handled/interpreted at Snowflake side, and users need to make sure the data types/column names are matched between the two, though for some types there are no precise one-to-one mapping.

Copy link
Contributor

@xiangjinwu xiangjinwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM. Thank you!

src/connector/src/sink/encoder/json.rs Outdated Show resolved Hide resolved
src/connector/src/sink/encoder/mod.rs Outdated Show resolved Hide resolved
Base automatically changed from yuhao/json-encoder-config to main July 17, 2024 05:15
@yuhao-su yuhao-su enabled auto-merge July 17, 2024 06:23
@@ -628,11 +639,12 @@ mod tests {
assert_eq!(date_value, json!(719163));

let from_epoch_config = JsonEncoderConfig {
time_handling_mode: TimeHandlingMode::Milli,
time_handling_mode: TimeHandlingMode::String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Irrelevant change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah some merge issue. But this test is for date_handling_mode sotime_handling_mode won't matter.

@yuhao-su yuhao-su added this pull request to the merge queue Jul 17, 2024
Merged via the queue into main with commit 30fd4d8 Jul 17, 2024
31 of 32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/feature user-facing-changes Contains changes that are visible to users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

connector: support sinking jsonb as dynamic type
3 participants