feat(connector): introduce parquet file source #17201

Merged 40 commits on Jul 12, 2024


@wcy-fdu (Contributor) commented Jun 11, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR introduces a new encode, PARQUET, for the file source. Users can read .parquet files via:

CREATE TABLE x (  -- or CREATE SOURCE x
    -- column definitions
)
WITH (
    connector = 's3_v2',  -- or 'gcs'
    match_pattern = '*.parquet'
) FORMAT PLAIN ENCODE PARQUET;
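
For illustration, a fuller statement might look like the sketch below. It is not taken from this PR: the table name, column names and types, bucket, region, and credentials are all hypothetical placeholders. The `s3.*` options are assumed to be the same connection parameters the existing S3 file source takes, and the columns are assumed to need to correspond to fields in the Parquet files.

```sql
-- Hypothetical example: every name and value below is a placeholder.
CREATE TABLE parquet_events (
    id INT,            -- assumed to correspond to a field in the Parquet schema
    name VARCHAR,
    amount DOUBLE PRECISION
)
WITH (
    connector = 's3_v2',
    match_pattern = '*.parquet',
    -- Connection options assumed to match the existing S3 file source.
    s3.region_name = 'us-east-1',
    s3.bucket_name = 'example-bucket',
    s3.credentials.access = 'xxxxx',
    s3.credentials.secret = 'xxxxx'
) FORMAT PLAIN ENCODE PARQUET;
```

With `CREATE TABLE` the data is ingested and stored, whereas `CREATE SOURCE` only registers the connection; which one fits depends on whether the files should be materialized.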

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features; see Sqlsmith: Sql feature generation #7934.)
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

This PR introduces a Parquet file source: users can now read Parquet files through the file source. The syntax is:

CREATE TABLE x (
    -- column definitions
)
WITH (
    connector = 's3_v2',  -- or 'gcs'
    match_pattern = '*.parquet'
) FORMAT PLAIN ENCODE PARQUET;

@wcy-fdu wcy-fdu requested a review from a team as a code owner June 11, 2024 09:12
@wcy-fdu wcy-fdu marked this pull request as draft June 11, 2024 09:12
@wcy-fdu wcy-fdu marked this pull request as ready for review June 14, 2024 07:23
@graphite-app graphite-app bot requested a review from a team June 14, 2024 09:27
@wcy-fdu (Contributor, Author) commented Jun 17, 2024

The e2e test passed on main cron.

@graphite-app graphite-app bot requested a review from a team July 10, 2024 09:35
src/connector/Cargo.toml (outdated, resolved)
src/common/src/array/arrow/arrow_impl.rs (resolved)
src/connector/src/parser/parquet_parser.rs (outdated, resolved)
src/connector/src/error.rs (outdated, resolved)
@wcy-fdu wcy-fdu requested a review from hzxa21 July 11, 2024 10:51
@hzxa21 (Collaborator) left a comment

Rest LGTM

src/connector/src/parser/parquet_parser.rs (outdated, resolved)
@wcy-fdu wcy-fdu enabled auto-merge July 12, 2024 09:59
@wcy-fdu wcy-fdu added this pull request to the merge queue Jul 12, 2024
Merged via the queue into main with commit 102a60d Jul 12, 2024
31 of 32 checks passed
@wcy-fdu wcy-fdu deleted the wcy/parquet_source branch July 12, 2024 10:55
Labels: type/feature, user-facing-changes (Contains changes that are visible to users)
4 participants