GH-32276: [C++][FlightRPC] Align RecordBatch buffers given to IPC #44279
base: main
Conversation
@pitrou do you think this fix is viable?
Thanks for picking this up. This seems reasonable to me at a quick glance.
cpp/src/arrow/array/data.h
Outdated
@@ -23,6 +23,7 @@
#include <memory>
#include <utility>
#include <vector>
#include <arrow/util/range.h>
nit: put this include with the rest of the Arrow includes (and use quotes to be consistent)
done
cpp/src/arrow/array/data.h
Outdated
}
}
// align children data recursively
for (unsigned int i=0; i<child_data.size(); i++) {
nit: you could iterate with for (auto& child : child_data) and avoid the explicit index?
much better!
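A minimal sketch of the range-based traversal suggested in this thread. `Node` and `AlignRecursively` are hypothetical stand-ins invented for illustration, not Arrow's `ArrayData` or any Arrow API; only the shape of the loop is the point.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an array-data node; only the child list matters here.
struct Node {
  bool aligned = false;
  std::vector<std::shared_ptr<Node>> child_data;
};

// Marks a node and all of its descendants, visiting children with a
// range-based for loop instead of an explicit index.
// Returns the number of nodes visited.
inline std::size_t AlignRecursively(Node& node) {
  node.aligned = true;  // placeholder for the real per-node alignment work
  std::size_t visited = 1;
  for (auto& child : node.child_data) {
    assert(child != nullptr);  // assumption: children are never null
    visited += AlignRecursively(*child);
  }
  return visited;
}
```

Compared with the indexed loop, this avoids the signed/unsigned comparison against `child_data.size()` and cannot index out of bounds.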
python/pyarrow/tests/test_ipc.py
Outdated
@@ -548,11 +548,16 @@ def test_read_options():
options = pa.ipc.IpcReadOptions()
assert options.use_threads is True
assert options.ensure_native_endian is True
assert options.ensure_memory_alignment is True
assert options.ens is True
Where did this come from?
Fixed.
Force-pushed from 77cc70a to a5d9e2d.
While attempting to write some unit tests, I found an existing method in arrow/cpp/src/arrow/util/align_util.cc (lines 169 to 205 in e62fbaa).
I will try to reuse that method rather than re-implementing it. There is also test infrastructure for misaligned array data.
Can you rebase? Tests appear to be failing
cpp/src/arrow/ipc/reader.cc
Outdated
auto batch = RecordBatch::Make(std::move(filtered_schema), metadata->length(),
                               std::move(filtered_columns));
if (context.options.ensure_memory_alignment) {
  return util::EnsureAlignment(batch, arrow::util::kValueAlignment, default_memory_pool());
Can we use the memory pool in context.options.memory_pool?
Sure! Ideally we should use the buffer's memory manager rather than the default CPU manager:
arrow/cpp/src/arrow/memory_pool.cc, lines 907 to 916 in 5ad0b3e:
static std::unique_ptr<PoolBuffer> MakeUnique(MemoryPool* pool, int64_t alignment) {
  std::shared_ptr<MemoryManager> mm;
  if (pool == nullptr) {
    pool = default_memory_pool();
    mm = default_cpu_memory_manager();
  } else {
    mm = CPUDevice::memory_manager(pool);
  }
  return std::make_unique<PoolBuffer>(std::move(mm), pool, alignment);
}
Force-pushed from 459ad15 to 960cb21.
Force-pushed from 960cb21 to 9909f13.
Test arrow/cpp/src/arrow/util/align_util.cc (lines 44 to 52 in bcb4653)
https://github.com/apache/arrow/actions/runs/11462607112/job/31894398411?pr=44279#step:13:1548
That test complains a lot about arrow/cpp/src/arrow/util/align_util.cc (lines 56 to 76 in bcb4653)
Looks like
Force-pushed from f2dae5b to d1219d2.
Rationale for this change
Data retrieved via IPC is expected to provide memory-aligned arrays, but data retrieved via the C++ Flight client is misaligned. DataFusion (Rust), which requires proper alignment, cannot handle such data: #43552.
What changes are included in this PR?
This aligns RecordBatch array buffers decoded by IPC when they are misaligned with respect to the byte width of their data type.
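The idea of checking a buffer against its type's byte width and copying it only when misaligned can be sketched as follows. This is an illustration under stated assumptions, not the PR's code or Arrow's implementation: `IsAligned` and `CopyAligned` are hypothetical helper names, and a plain `std::vector` stands in for an Arrow buffer allocated from a memory pool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper: true if `ptr` is a multiple of `alignment` bytes.
// `alignment` must be a non-zero power of two.
inline bool IsAligned(const void* ptr, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// Hypothetical helper: copies `count` values of type T from a possibly
// misaligned byte address into storage that is suitably aligned for T.
template <typename T>
std::vector<T> CopyAligned(const void* data, std::size_t count) {
  static_assert(std::is_trivially_copyable<T>::value,
                "only fixed-width, trivially copyable types are supported");
  std::vector<T> out(count);  // the vector's allocator aligns storage for T
  if (count > 0) {
    std::memcpy(out.data(), data, count * sizeof(T));
  }
  return out;
}
```

A caller would first test `IsAligned(ptr, sizeof(T))` and fall back to `CopyAligned<T>` only when the check fails, so data that is already aligned is not copied.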
The implementation mirrors that of align_buffers in arrow-rs (apache/arrow-rs#4681).
Are these changes tested?
The configuration flag is covered by a unit test.
A manual end-to-end test with the reproduction code from #43552 confirmed that memory alignment fixes the issue.
Are there any user-facing changes?
Memory alignment is checked and fixed by default. This is configurable via IpcReadOptions.ensure_memory_alignment.