-
Notifications
You must be signed in to change notification settings - Fork 582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CORE-6835 Fix first time compacted segment upload #22812
CORE-6835 Fix first time compacted segment upload #22812
Conversation
The test creates compacted topic without TS. It produces the data and waits until the data is compacted. After that it enables TS for the topic. The archiver has to be able to upload the data.
c8bb6ed
to
fe97fcf
Compare
if (it == set.end() || eligible_for_compacted_reupload(**it)) { | ||
// Skip forward if we hit a gap or compacted segment | ||
for (auto i = set.begin(); i != set.end(); i++) { | ||
const auto& sg = *i; | ||
if (start_offset < sg->offsets().get_base_offset()) { | ||
// Move last offset forward | ||
it = i; | ||
start_offset = sg->offsets().get_base_offset(); | ||
break; | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This clause used to take care of these conditions:
- if the
start_offset
lies in a gap, then makeit
point to the segment after the gap - if
it
is compacted, then point to the segment starting afterstart_offset
(which might also be compacted)
From the cover letter it is not possible for this code path to encounter a compacted segment because of the condition imposed on compaction (no compaction before upload). This path is also not used for compacted re-upload.
That still leaves the case where the start_offset
lies in a gap. Do we no longer expect this to happen? From what I can see now the progress will stop if the offset lies in a gap between two segments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the start_offset
lies in the gap the lower_bound
will find the next segment in the list (which will have base-offset >= start_offset).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looked to me (the code is a little convoluted in segment set) like it will return end
because from https:/redpanda-data/redpanda/blob/dev/src/v/storage/segment_set.cc#L94-L113 both needle_in_range
checks after the segment after the offset is selected, will fail because the offset is in neither the segment right after the gap nor in the segment before the gap.
But if it's known that the next segment will be returned and not end
then that sounds fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the alternative implementation for gap handling. But the code now doesn't care if the segment compacted or not.
fe97fcf
to
5b0ed9e
Compare
The archival_policy::find_segment method has a bug which causes it to skip compacted segments unconditionally. The code checks if the segment that it found is compacted and if this is the case it advances the iterator to the next segment. This operation causes ntp_archiver to skip one segment. The uploaded segment doesn't line up with the previous one so the validation error is triggered. The commit also implements handling of gaps in a safe way. We're not supposed to have gaps in the log (even in compacted logs) so the implementation has linear complexity. It's not supposed to be used and is good enough to hadnle unexpected unique glitch.
The test validates logic which was removed. Previousy, the uploader was skipping single compacted segment and now it will upload compacted segments. So the upload in the test should start from 0.
5b0ed9e
to
e2fd980
Compare
/ci-repeat 10 |
new failures in https://buildkite.com/redpanda/redpanda/builds/52758#01913cd4-cdef-4073-a78c-732caa5ca574:
|
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52758#01913cd4-cdec-4cb0-a1c1-22a7e3807c57 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52758#01913cd4-cdf4-4b2d-a4a1-a97a2009984f ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52758#01913cd4-e56b-4f72-b436-9ec7bfd65364 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52758#01913cd4-e56e-4dba-afe7-f6807b273c88 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52758#01913cd4-e565-46ee-9dea-2213b47aaaf1 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52758#01913cd4-e56f-4232-9a47-d3078713a7de ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52758#01913cd4-e56f-401e-968f-3daa62e8bbbf |
/backport v24.2.x |
/backport v24.1.x |
/backport v23.3.x |
Failed to create a backport PR to v23.3.x branch. I tried:
|
This PR fixes the issue which manifests if the TS is enabled for pre-existing compacted topic. Normally, the TS prevents compaction from compacting any offset range which is not uploaded to the cloud storage yet. Our upload path always deals with non-compacted data without gaps. The only exception is when the compaction is enabled for existing compacted topic. In this case the data is already compacted but has to be uploaded to the cloud storage anyway. And the upload is performed by the normal upload path (not compacted reupload).
The problem is that there is a bug in the
archival_policy::find_segment
which makes it skip compacted segment. If the segment that the policy have found is compacted it just increments the iterator once and creates a gap. The segment can't be added to the archival STM.New segment does not line up with previous segment
error is added to the log and the operation is retried later. The uploads for the topic can't progress.This PR fixes the bug and adds a ducktape test that reproduces the problem.
Backports Required
Release Notes
Bug Fixes
New segment does not line up with previous segment
error for compacted topics.