[C++] Unable to filter DECIMAL column from ORC file #1022

Open
karan-k-deepr opened this issue Jan 20, 2022 · 4 comments
Comments

@karan-k-deepr

karan-k-deepr commented Jan 20, 2022

This question is similar to this one I asked earlier on StackOverflow; after some more trials, that case now works.

Previously the problem was with the column ID. Now I am trying to filter a column of DECIMAL data type, but the result always contains all of the data instead of only the filtered rows.



Data the ORC file contains in the required columns:

[image not preserved]

And this is how I am trying to filter out the DECIMAL column using orc::SearchArgument:

orc::RowReaderOptions m_RowReaderOpts;
orc::ReaderOptions m_ReaderOpts;

std::unique_ptr<orc::Reader> m_Reader;
std::unique_ptr<orc::RowReader> m_RowReader;

auto builder = orc::SearchArgumentFactory::newBuilder();
const int snapshot_time_col_id = 22;

orc::Literal ss_begin_time{34080000000000, 14, 9};
orc::Literal ss_end_time{34380000000000, 14, 9};

// I also tried the following literals, but they didn't work either.
// orc::Literal ss_begin_time{34080, 5, 0};
// orc::Literal ss_end_time{34380, 5, 0};

builder->between(snapshot_time_col_id, orc::PredicateDataType::DECIMAL, ss_begin_time, ss_end_time);

m_RowReaderOpts.searchArgument(builder->build());
m_Reader = orc::createReader(orc::readFile(a_FilePath.c_str()), m_ReaderOpts);
m_RowReader = m_Reader->createRowReader(m_RowReaderOpts);

Could you suggest how to filter data of type DECIMAL?
@dongjoon-hyun
Member

cc @wgtmac and @stiga-huang

@karan-k-deepr
Author

Any update on this bug?

@stiga-huang
Contributor

Could you verify whether the whole batch returned by row_reader->next() violates the SearchArgument? If it does, that is a bug. Otherwise, it's by design.

orc::SearchArgument is used as an indicator for the reader to skip unrelated RowGroups, i.e. it's only evaluated on RowGroup level (not row-level). If the reader can't filter out a RowGroup based on the SearchArgument, it will return all rows of that RowGroup. The caller is expected to filter out rows by itself.
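Since the caller has to do the row-level filtering, here is a minimal sketch of that step for a decimal column. It is not ORC code: `selectRowsBetween` is a hypothetical helper, and it assumes the column's unscaled values are available as a plain `int64_t` array with an optional not-null mask, and that the bounds use the same scale as the column.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the ORC API.
// Returns the indices of rows whose unscaled decimal value lies in
// [lower, upper] (both inclusive). values, lower and upper must all be
// expressed at the same scale. notNull may be nullptr when the column
// has no nulls; otherwise a zero entry marks a null row, which is skipped.
std::vector<std::size_t> selectRowsBetween(const int64_t* values,
                                           const char* notNull,
                                           std::size_t numRows,
                                           int64_t lower,
                                           int64_t upper) {
  std::vector<std::size_t> selected;
  for (std::size_t i = 0; i < numRows; ++i) {
    if (notNull != nullptr && !notNull[i]) {
      continue;  // null rows never match a BETWEEN predicate
    }
    if (values[i] >= lower && values[i] <= upper) {
      selected.push_back(i);
    }
  }
  return selected;
}
```

With the ORC C++ reader, I'd expect a DECIMAL column of precision 18 or less to arrive as an orc::Decimal64VectorBatch, whose values, notNull, hasNulls and numElements members could be passed to a helper like this (worth checking against your ORC version). An empty result for a batch means every row in it falls outside the range, which is the case asked about above.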

@karan-k-deepr
Author

@stiga-huang I checked the min and max values of the batch returned by row_reader->next(). The batch it returns is not filtered at all for decimal values.

dongjoon-hyun changed the title from "Unable to filter DECIMAL column from ORC file in c++" to "[C++] Unable to filter DECIMAL column from ORC file" on Apr 25, 2022