[SUPPORT] MOR table behavior for Spark Bulk insert to COW #12133
Currently, for this case (lines 65 to 68 in 5ccb19b), the first insert uses SingleFileHandleCreateFactory, but the second insert uses AppendHandleFactory and creates a log file.
I don't understand how bulk insert into a COW table with a simple bucket index should work by design. When we insert data that should update previous data, should we create a new parquet file with the new data and trigger inline compaction (due to the COW table type), or merge and write the data into a new parquet file? In the latter case it is no longer a bulk insert.
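The handle-selection behavior described above can be sketched as a small model. This is a hypothetical illustration of the logic, not actual Hudi code; the factory names come from the snippet referenced earlier, and the `bulk_insert`/`choose_handle_factory` helpers are invented for clarity.

```python
# Hypothetical model of the behavior described in the referenced snippet:
# the first bulk insert into a bucket finds no base file and uses
# SingleFileHandleCreateFactory (writes a parquet base file); a second
# bulk insert finds the bucket occupied and falls back to
# AppendHandleFactory (writes a log file), i.e. MOR behavior even on a
# COW table. Not actual Hudi code.

def choose_handle_factory(bucket_has_base_file: bool) -> str:
    """Return the factory picked for a bulk insert into one bucket."""
    if not bucket_has_base_file:
        return "SingleFileHandleCreateFactory"
    return "AppendHandleFactory"

existing_buckets: set[int] = set()

def bulk_insert(bucket_id: int) -> str:
    """Simulate one bulk insert into a bucket; record the bucket as written."""
    factory = choose_handle_factory(bucket_id in existing_buckets)
    existing_buckets.add(bucket_id)
    return factory

print(bulk_insert(0))  # SingleFileHandleCreateFactory
print(bulk_insert(0))  # AppendHandleFactory
```

The model makes the reported symptom concrete: on a COW table one would expect every bulk insert to produce only parquet base files, but the second insert into an occupied bucket takes the log-file path instead.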
Bulk_insert should only be executed once IMO; for the second update, you should use the upsert operation instead.
I've already created an issue, HUDI-8394, but want to highlight and discuss this problem here.
I suppose this is a critical issue with the current master when:
hoodie.datasource.write.row.writer.enable = false
Describe the problem you faced
When I try to bulk insert into a COW table, I see parquet and log files in the file system, which is MOR table behavior.
I've checked that the table is of COW type, but the files are not what a COW table should contain.
To Reproduce
To reproduce, the existing test "Test Bulk Insert Into Bucket Index Table" could be modified and used.

Expected behavior
For COW table, only parquet files should be created.
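A minimal reproduction along the lines of the test mentioned above might look like the following Spark SQL sketch. This is an illustrative fragment, not the actual test: the table name and columns are made up, and the exact option names (bucket index settings, the bulk-insert switch, and `hoodie.datasource.write.row.writer.enable`) may differ between Hudi versions.

```sql
-- Hedged repro sketch; names and option keys are illustrative.
CREATE TABLE hudi_cow_bucket_tbl (
  id INT,
  name STRING,
  ts LONG
) USING hudi
TBLPROPERTIES (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'ts',
  'hoodie.index.type' = 'BUCKET',
  'hoodie.bucket.index.num.buckets' = '1',
  'hoodie.datasource.write.row.writer.enable' = 'false'
);

-- First bulk insert: a parquet base file is created, as expected for COW.
INSERT INTO hudi_cow_bucket_tbl VALUES (1, 'a1', 1000);

-- Second bulk insert into the same bucket: a log file appears,
-- i.e. MOR behavior on a COW table.
INSERT INTO hudi_cow_bucket_tbl VALUES (1, 'a2', 2000);
```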
Environment Description
Hudi version : current master
Spark version : 3.5