Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-8203] Make record merge mode the primary merging config #11943
base: master
Are you sure you want to change the base?
[HUDI-8203] Make record merge mode the primary merging config #11943
Changes from 91 commits
a010515
d3045da
371fd96
1a64f42
373f67f
8745ea0
f77cb87
60755d4
aac457a
63c9a00
c82c27f
5d466c3
416835f
9795fe8
ab6d567
ae29647
12e8de2
ce04bd5
01f2024
ffc94b8
c789678
88799a2
638d350
3e486d2
1db1fec
8d0d358
4d04980
3ee84d8
da849df
3480af2
1fce20c
2371e19
8af0e40
8dfedc0
5ace882
080f024
f10ab50
5c9783b
5f59bd2
f9aee6e
aeea89b
8a737c0
6e08ef5
6e43f93
2ab97ce
eaaf998
e6ee76a
100e686
3b89dcb
22ebef6
6c2abc9
f6a0da2
80a2e16
f61553c
b07d02b
597677f
7aac16b
080de60
6fc3da5
1751520
da99393
232d0d0
9630cbe
a345c4c
4af7de5
36bd377
0e0775c
6d581c5
7723029
90db422
3ff9e74
699d455
70852c5
c4dc95f
3726f54
8306c68
95f8969
6e96999
b1bd0ca
ab40874
ac45359
cef2687
9f3adb4
709a37a
143dfa7
bdab7de
b0506a2
8fab0f0
57ad75e
25897cb
9119372
addd352
4cdbd16
f619f0b
f55704f
ad0682a
4d11955
b851357
4303223
9a56f78
08b73c0
1bb38d2
413ebc4
75beb3d
3c8df12
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to keep only one config here instead of alias?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can make all record merge configs to start with
--record-merge-
,hoodie.write.record.merge.
, ``hoodie.table.record.merge., e.g.,
--record-merge-strategy`, `hoodie.write.record.merge.strategy`, to be consistent.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding this variable for backwards compatible read (https:/apache/hudi/pull/11943/files#r1805475755), could we hard code it to be
DefaultHoodieRecordPayload.class.getName()
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may still need the default value for backwards compatible write on 0.x tables cc @vinothchandar @bvaradar @balaji-varadarajan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we do. But we might need to add something so that it doesn't remove the payload class from hoodie.properties
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is that, for the backwards compatible writer writing table version 6 (0.14 release and above) in Hudi 1.0 release, it should not use the merge mode and only leverage payload class to create the table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point that payload class is no longer required in Hudi 1.0 tables so that no default should be set. For Hudi 1.0 release reading 0.x tables, should we still maintain a variable else where to track the default payload class to be used? The backwards compatibility read logic can be handled in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember that this can be switched back to
OVERWRITE_WITH_LATEST
. Is there still an issue? I'm OK withEVENT_TIME_ORDERING
; trying to see if it still strictly requires precombine/ordering field to be set.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can hardcode the legacy default value here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could we handle the handling of case sensitivity inside
HoodieFileFormat
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this API used by Avro merger or custom merge mode only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is used to get the avro merger as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you checked if all existing logic of getting the payload class go through this method? Since this logic now returns the default payload class from the method instead of the config default value, we need to make sure any logic using
getString(HoodiePayloadConfig.PAYLOAD_CLASS_NAME)
should migrate to this API too, to avoid possible null or empty payload class name.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was the payload class not set before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it because the payload type is set before (
HOODIE_METADATA
), which is removed now?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need to keep the logic of adding the required table configs in table version 8, correct? We should avoid the Hudi 1.0 writer to change table configs as much as possible, i.e., invariant table configs should either be created for a new table or changed during upgrade/downgrades, except for MDT related table configs, to avoid possible corruption.