Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ColIndexMapping unable to handle duplicate output column ids #15679

Closed
kwannoel opened this issue Mar 14, 2024 · 3 comments
Closed

ColIndexMapping unable to handle duplicate output column ids #15679

kwannoel opened this issue Mar 14, 2024 · 3 comments
Labels
type/bug Something isn't working type/feature
Milestone

Comments

@kwannoel
Copy link
Contributor

kwannoel commented Mar 14, 2024

Similar to #15075, we could have the same input column mapped to 2 different output indices, if optimizer can't remove it.

In that case, our i2o mapping unable to handle it, because it can only map to 1 or no corresponding element.


/// `ColIndexMapping` is a partial mapping from usize to usize.
///
/// It is used in optimizer for transformation of column index.
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct ColIndexMapping {
    /// The size of the target space, i.e. target index is in the range `(0..target_size)`.
    target_size: usize,
    /// Each subscript is mapped to the corresponding element.
    map: Vec<Option<usize>>,
}

The result is that this PR's e2e test fails: #14846

https://buildkite.com/risingwavelabs/pull-request/builds/44337

thread 'rw-streaming' panicked at src/common/src/util/column_index_mapping.rs:39:13:
assertion failed: target_max < target_size
stack backtrace:
   0: rust_begin_unwind
             at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/core/src/panicking.rs:72:14
   2: core::panicking::panic
             at /rustc/e4c626dd9a17a23270bf8e7158e59cf2b9c04840/library/core/src/panicking.rs:144:5
   3: risingwave_common::util::column_index_mapping::ColIndexMapping::new
             at ./src/common/src/util/column_index_mapping.rs:39:13
   4: {async_fn#0}<risingwave_storage::monitor::monitored_store::MonitoredStateStore<alloc::boxed::Box<dyn risingwave_storage::store_impl::boxed_state_store::DynamicDispatchedStateStore, alloc::alloc::Global>>, risingwave_common::util::value_encoding::column_aware_row_encoding::ColumnAwareSerde, true, risingwave_stream::common::table::watermark::WatermarkBufferByEpoch<300>, false>
             at ./src/stream/src/common/table/state_table.rs:417:27
   5: {async_fn#0}<risingwave_storage::monitor::monitored_store::MonitoredStateStore<alloc::boxed::Box<dyn risingwave_storage::store_impl::boxed_state_store::DynamicDispatchedStateStore, alloc::alloc::Global>>, risingwave_common::util::value_encoding::column_aware_row_encoding::ColumnAwareSerde, risingwave_stream::common::table::watermark::WatermarkBufferByEpoch<300>, false>
             at ./src/stream/src/common/table/state_table.rs:700:96
   6: {async_fn#0}<risingwave_storage::monitor::monitored_store::MonitoredStateStore<alloc::boxed::Box<dyn risingwave_storage::store_impl::boxed_state_store::DynamicDispatchedStateStore, alloc::alloc::Global>>>
             at ./src/stream/src/from_proto/stream_scan.rs:141:21
   7: {async_fn#0}<risingwave_storage::monitor::monitored_store::MonitoredStateStore<alloc::boxed::Box<dyn risingwave_storage::store_impl::boxed_state_store::DynamicDispatchedStateStore, alloc::alloc::Global>>>
             at ./src/stream/src/from_proto/mod.rs:133:5
   8: {async_block#0}<risingwave_storage::monitor::monitored_store::MonitoredStateStore<alloc::boxed::Box<dyn risingwave_storage::store_impl::boxed_state_store::DynamicDispatchedStateStore, alloc::alloc::Global>>>
@kwannoel
Copy link
Contributor Author

Bug found in #14846

@chenzl25
Copy link
Contributor

chenzl25 commented Mar 14, 2024

Check this method usage for more details.

/// Inverse the mapping. If a target corresponds to more than one source, return `None`.
#[must_use]
pub fn inverse(&self) -> Option<Self> {

@stdrc
Copy link
Contributor

stdrc commented Mar 14, 2024

In my understanding, such column index mapping should contains source indices (or None) from an output/target perspective, so that map[0] == 1 means the output/target column 0 is from the input column 1, map[7] == None means the out column 7 is somehow generated instead of directly duplicate some input column. However, the ColIndexMapping used in optimizer seems act reversely, which I think is not quite intuitive.

I'm not sure😇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug Something isn't working type/feature
Projects
None yet
Development

No branches or pull requests

3 participants