Gandiva C++ Merge. #7

Closed
52 commits
bae1c18
Bootstrap evaluation using LLVM code generation (#1)
pravindra May 28, 2018
aab0d53
GDV-45: Make use of the modular features of cmake
pravindra May 31, 2018
b11938a
GDV-43: [C++] Introduce error codes as error handling strategy. (#8)
praveenbingo May 31, 2018
4c599eb
GDV-10 : Support functions of type NULL_INTERNAL
pravindra Jun 2, 2018
2b94249
GDV-59: [C++] Simplify the api to make function nodes
pravindra Jun 3, 2018
9dae5f8
GDV-23: [C++] Support if-else expression
pravindra Jun 4, 2018
a9221f3
GDV-60: [C++] expr decomposition moved to visitor
pravindra Jun 4, 2018
98b7cc6
GDV-24: [C++] Support literal expressions
pravindra Jun 5, 2018
24cd01f
GDV-61: [C++] Reduce bitmap updates for if-else
pravindra Jun 6, 2018
2b3a4c9
GDV-65: [C++] Add CMake support for proto files
pravindra Jun 6, 2018
68d5939
GDV-66: [C++] Add a zero-copy variant to Evaluate
pravindra Jun 8, 2018
094d238
GDV-49: [C++] switch to /// or // style comments
pravindra Jun 8, 2018
dac832a
GDV-46: [C++] Add unit tests for bitmap/time fns
pravindra Jun 11, 2018
19ec133
GDV-58: [CPP] Fix order of includes. (#25)
praveenbingo Jun 12, 2018
a29e60a
GDV-7: Add Java APIs (#23)
vvellanki Jun 12, 2018
3bdaa66
GDV-52: update benchmark results
pravindra Jun 14, 2018
f0ef530
GDV-55: [C++] Added validation to projector build. (#33)
praveenbingo Jun 16, 2018
6f093ad
GDV-26: [C++] Support boolean and/or
pravindra Jun 22, 2018
affeed7
Gdv 72: [C++] Support null literals
pravindra Jun 22, 2018
8c0da27
GDV-26: [Java] Support AND/OR control expressions
pravindra Jun 27, 2018
ddfd6f8
GDV-72: [Java] Support null literals
pravindra Jun 27, 2018
49a835b
GDV-21: Support date/time functions and datatypes (#45)
vvellanki Jun 29, 2018
b0c089b
GDV-68:[Java][C++]Dynamically load dependencies. (#49)
praveenbingo Jul 2, 2018
8ce06fc
GDV-20: [C++] Support variable len arrow vectors
pravindra Jul 3, 2018
d3cc004
GDV-71:[C++]Made Gandiva JNI a packagable library. (#42)
praveenbingo Jul 4, 2018
011cd10
GDV-41: [CPP] clang-format to validate/fix style (#52)
pravindra Jul 4, 2018
0ceefd4
GDV-20: [Java] support varlen types in gandiva (#61)
pravindra Jul 6, 2018
9ed36a6
GDV-25: Add cpp/Java microbenchmarks
vvellanki Jul 7, 2018
080a3cd
Gdv 21: Added support for time32 and timestampdiff functions
vvellanki Jul 8, 2018
4833544
GDV-87: Add support to print expressions (#63)
vvellanki Jul 11, 2018
cd84532
Gdv 88: Support more date/time functions (#67)
vvellanki Jul 14, 2018
cdc5c34
GDV-83: [CPP] link libstdc++ statically
pravindra Jul 16, 2018
0e02214
GDV-82:[Java][CPP]Export supported types from Gandiva. (#66)
praveenbingo Jul 17, 2018
fe6a5cf
Fix missing include directory of gtest in CMakeLists.txt (#68)
masayuki038 Jul 18, 2018
770d2bb
GDV-90:[C++]Fixed extract second from time. (#70)
praveenbingo Jul 19, 2018
01f46fe
GDV-28: [C++] Add hash functions on all data types (#69)
pravindra Jul 20, 2018
1e23542
GDV-94:[C++][Java]Fixed literals and nulls for time types. (#72)
praveenbingo Jul 23, 2018
9da5980
GDV-93:[Java][C++]Fixed reference initializations. (#73)
praveenbingo Jul 25, 2018
b209ab5
GDV-92: Add support for more date/time functions
vvellanki Jul 25, 2018
a4b2422
GDV-95:[C++]Match gandiva mod operator to dremio for mod zero. (#74)
praveenbingo Jul 30, 2018
00c1cba
GDV-13: [C++] Add support for filters (#75)
pravindra Jul 31, 2018
4840f58
GDV-13: [Java] Add java bindings for filter expr (#77)
pravindra Aug 3, 2018
578615b
GDV-13:[Java]Fixed filter bugs. (#78)
praveenbingo Aug 7, 2018
0c8fab4
GDV-13:[C++]Fixed selection vector array type (#79)
praveenbingo Aug 7, 2018
0d670f9
DX-12080:[Java][C++]Executing TPCH queries. (#81)
praveenbingo Aug 16, 2018
49d9a24
GDV-31:[C++][Travis]Perf Improvments (#82)
praveenbingo Aug 17, 2018
fd55d9f
GDV-31:[C++]Caching projectors and filters for re-use. (#83)
praveenbingo Aug 23, 2018
cb74b04
GDV-31:[Java][C++]Fixed concurrency issue in cache. (#84)
praveenbingo Aug 23, 2018
3c1156e
GDV-31:[C++]Fixed Literal ToString. (#85)
praveenbingo Aug 24, 2018
c675302
GDV-56: [C++] Add support for sql regex functions (#86)
pravindra Aug 29, 2018
2f6af26
GDV-56: [C++] Add a helper library containing cpp stubs (#88)
pravindra Sep 3, 2018
0a4e15e
Merge Gandiva CPP into Arrow CPP.
praveenbingo Sep 4, 2018
42 changes: 42 additions & 0 deletions cpp/src/gandiva/CMakeLists.txt
@@ -0,0 +1,42 @@
# Copyright (C) 2017-2018 Dremio Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

cmake_minimum_required(VERSION 3.10)

project(gandiva)

# LLVM/Clang is required by multiple subdirs.
find_package(LLVM)

# Set the path where the byte-code files will be installed.
set(GANDIVA_BC_INSTALL_DIR
  ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_INCLUDEDIR}/gandiva)

set(GANDIVA_BC_FILE_NAME irhelpers.bc)
set(GANDIVA_BC_INSTALL_PATH ${GANDIVA_BC_INSTALL_DIR}/${GANDIVA_BC_FILE_NAME})
set(GANDIVA_BC_OUTPUT_PATH ${CMAKE_BINARY_DIR}/${GANDIVA_BC_FILE_NAME})

# Set the path where the helper shared library will be installed.
if (APPLE)
  set(GANDIVA_HELPER_LIB_FILE_NAME libgandiva_helpers.dylib)
else()
  set(GANDIVA_HELPER_LIB_FILE_NAME libgandiva_helpers.so)
endif(APPLE)

set(GANDIVA_HELPER_LIB_INSTALL_PATH ${GANDIVA_BC_INSTALL_DIR}/${GANDIVA_HELPER_LIB_FILE_NAME})
set(GANDIVA_HELPER_LIB_OUTPUT_PATH ${CMAKE_BINARY_DIR}/src/codegen/${GANDIVA_HELPER_LIB_FILE_NAME})

add_subdirectory(codegen)
add_subdirectory(jni)
add_subdirectory(precompiled)
68 changes: 68 additions & 0 deletions cpp/src/gandiva/README.md
@@ -0,0 +1,68 @@
# Gandiva C++

## System setup

Gandiva uses CMake as a build configuration system. Currently, it supports
out-of-source builds only.

Building Gandiva requires:

* A C++11-enabled compiler. On Linux, gcc 4.8 and higher should be sufficient.
* CMake
* LLVM
* Arrow
* Boost
* Protobuf

On macOS, you can use [Homebrew][1]:

```shell
brew install cmake llvm boost protobuf
```

To install Arrow, follow the steps in the [Arrow README][2].

## Building Gandiva

Debug build:

```shell
git clone https://github.com/dremio/gandiva.git
cd gandiva/cpp
mkdir debug
cd debug
cmake ..
make
ctest
```

Release build:

```shell
git clone https://github.com/dremio/gandiva.git
cd gandiva/cpp
mkdir release
cd release
cmake .. -DCMAKE_BUILD_TYPE=Release
make
ctest
```

## Validating code style

We follow the [Google C++ Style Guide][3]. To validate compliance, run:

```shell
cd debug
make stylecheck
```

## Fixing code style

```shell
cd debug
make stylefix
```

[1]: https://brew.sh/
[2]: https://github.com/apache/arrow/tree/master/cpp
[3]: https://google.github.io/styleguide/cppguide.html
125 changes: 125 additions & 0 deletions cpp/src/gandiva/codegen/CMakeLists.txt
@@ -0,0 +1,125 @@
# Copyright (C) 2017-2018 Dremio Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

project(gandiva)

# Find arrow
find_package(ARROW)

find_package(Boost COMPONENTS system regex filesystem REQUIRED)

set(BC_FILE_PATH_CC "${CMAKE_CURRENT_BINARY_DIR}/bc_file_path.cc")
configure_file(bc_file_path.cc.in ${BC_FILE_PATH_CC})

# helper files that are shared between libgandiva and libgandiva_helpers
set(SHARED_HELPER_FILES
  like_holder.cc
  regex_util.cc)

set(SRC_FILES annotator.cc
  bitmap_accumulator.cc
  configuration.cc
  engine.cc
  expr_decomposer.cc
  expr_validator.cc
  expression.cc
  expression_registry.cc
  filter.cc
  function_registry.cc
  function_signature.cc
  llvm_generator.cc
  llvm_types.cc
  projector.cc
  selection_vector.cc
  tree_expr_builder.cc
  ${SHARED_HELPER_FILES}
  ${BC_FILE_PATH_CC})

add_library(gandiva_obj_lib OBJECT ${SRC_FILES})

# set PIC so that object library can be included in shared libs.
set_target_properties(gandiva_obj_lib PROPERTIES POSITION_INDEPENDENT_CODE 1)

# For users of gandiva library (including integ tests), include-dir is :
# /usr/**/include dir after install,
# cpp/include during build
# For building gandiva library itself, include-dir (in addition to above) is :
# cpp/src
target_include_directories(gandiva_obj_lib
  PUBLIC
    $<INSTALL_INTERFACE:include>
    $<BUILD_INTERFACE:${CMAKE_SOURCE_DIR}/include>
  PRIVATE
    ${CMAKE_SOURCE_DIR}/src
    $<TARGET_PROPERTY:LLVM::LLVM_INTERFACE,INTERFACE_INCLUDE_DIRECTORIES>
    $<TARGET_PROPERTY:ARROW::ARROW_SHARED,INTERFACE_INCLUDE_DIRECTORIES>
    $<TARGET_PROPERTY:Boost::boost,INTERFACE_INCLUDE_DIRECTORIES>
    $<TARGET_PROPERTY:gtest_main,INCLUDE_DIRECTORIES>
)

build_gandiva_lib("shared")

build_gandiva_lib("static")

# install for gandiva
include(GNUInstallDirs)

# install libgandiva
install(
  TARGETS gandiva_shared gandiva_static
  DESTINATION ${CMAKE_INSTALL_LIBDIR}
)

# install the header files.
install(
  DIRECTORY ${CMAKE_SOURCE_DIR}/include/gandiva
  DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
)

# Pre-compiled .so library for function helpers.
add_library(gandiva_helpers SHARED
  ${SHARED_HELPER_FILES}
  function_holder_stubs.cc)

target_compile_definitions(gandiva_helpers
  PRIVATE -DGDV_HELPERS
)

target_include_directories(gandiva_helpers
  PRIVATE
    ${CMAKE_SOURCE_DIR}/include
    ${CMAKE_SOURCE_DIR}/src
    $<TARGET_PROPERTY:ARROW::ARROW_SHARED,INTERFACE_INCLUDE_DIRECTORIES>
)

target_link_libraries(gandiva_helpers PRIVATE Boost::boost)
if (NOT APPLE)
  target_link_libraries(gandiva_helpers LINK_PRIVATE -static-libstdc++ -static-libgcc)
endif()

# args: test-file src-files
add_gandiva_unit_test(bitmap_accumulator_test.cc bitmap_accumulator.cc)
add_gandiva_unit_test(engine_llvm_test.cc engine.cc llvm_types.cc configuration.cc ${BC_FILE_PATH_CC})
add_gandiva_unit_test(function_signature_test.cc function_signature.cc)
add_gandiva_unit_test(function_registry_test.cc function_registry.cc function_signature.cc)
add_gandiva_unit_test(llvm_types_test.cc llvm_types.cc)
add_gandiva_unit_test(llvm_generator_test.cc llvm_generator.cc engine.cc llvm_types.cc expr_decomposer.cc function_registry.cc annotator.cc bitmap_accumulator.cc configuration.cc function_signature.cc like_holder.cc regex_util.cc ${BC_FILE_PATH_CC})
add_gandiva_unit_test(annotator_test.cc annotator.cc function_signature.cc)
add_gandiva_unit_test(tree_expr_test.cc tree_expr_builder.cc expr_decomposer.cc annotator.cc function_registry.cc function_signature.cc like_holder.cc regex_util.cc)
add_gandiva_unit_test(expr_decomposer_test.cc expr_decomposer.cc tree_expr_builder.cc annotator.cc function_registry.cc function_signature.cc like_holder.cc regex_util.cc)
add_gandiva_unit_test(status_test.cc)
add_gandiva_unit_test(expression_registry_test.cc llvm_types.cc expression_registry.cc function_signature.cc function_registry.cc)
add_gandiva_unit_test(selection_vector_test.cc selection_vector.cc)
add_gandiva_unit_test(lru_cache_test.cc)
add_gandiva_unit_test(like_holder_test.cc like_holder.cc regex_util.cc)
105 changes: 105 additions & 0 deletions cpp/src/gandiva/codegen/annotator.cc
@@ -0,0 +1,105 @@
// Copyright (C) 2017-2018 Dremio Corporation
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include "codegen/annotator.h"

#include <memory>
#include <string>

#include "codegen/field_descriptor.h"

namespace gandiva {

FieldDescriptorPtr Annotator::CheckAndAddInputFieldDescriptor(FieldPtr field) {
  // If the field is already in the map, return the entry.
  auto found = in_name_to_desc_.find(field->name());
  if (found != in_name_to_desc_.end()) {
    return found->second;
  }

  auto desc = MakeDesc(field);
  in_name_to_desc_[field->name()] = desc;
  return desc;
}

FieldDescriptorPtr Annotator::AddOutputFieldDescriptor(FieldPtr field) {
  auto desc = MakeDesc(field);
  out_descs_.push_back(desc);
  return desc;
}

FieldDescriptorPtr Annotator::MakeDesc(FieldPtr field) {
  // TODO:
  // - validity is optional
  int data_idx = buffer_count_++;
  int validity_idx = buffer_count_++;
  int offsets_idx = FieldDescriptor::kInvalidIdx;
  if (arrow::is_binary_like(field->type()->id())) {
    offsets_idx = buffer_count_++;
  }
  return std::make_shared<FieldDescriptor>(field, data_idx, validity_idx, offsets_idx);
}

void Annotator::PrepareBuffersForField(const FieldDescriptor &desc,
                                       const arrow::ArrayData &array_data,
                                       EvalBatch *eval_batch) {
  int buffer_idx = 0;

  // TODO:
  // - validity is optional

  uint8_t *validity_buf = const_cast<uint8_t *>(array_data.buffers[buffer_idx]->data());
  eval_batch->SetBuffer(desc.validity_idx(), validity_buf);
  ++buffer_idx;

  if (desc.HasOffsetsIdx()) {
    uint8_t *offsets_buf = const_cast<uint8_t *>(array_data.buffers[buffer_idx]->data());
    eval_batch->SetBuffer(desc.offsets_idx(), offsets_buf);
    ++buffer_idx;
  }

  uint8_t *data_buf = const_cast<uint8_t *>(array_data.buffers[buffer_idx]->data());
  eval_batch->SetBuffer(desc.data_idx(), data_buf);
  ++buffer_idx;
}

EvalBatchPtr Annotator::PrepareEvalBatch(const arrow::RecordBatch &record_batch,
                                         const ArrayDataVector &out_vector) {
  EvalBatchPtr eval_batch = std::make_shared<EvalBatch>(
      record_batch.num_rows(), buffer_count_, local_bitmap_count_);

  // Fill in the entries for the input fields.
  for (int i = 0; i < record_batch.num_columns(); ++i) {
    const std::string &name = record_batch.column_name(i);
    auto found = in_name_to_desc_.find(name);
    if (found == in_name_to_desc_.end()) {
      // skip columns not involved in the expression.
      continue;
    }

    PrepareBuffersForField(*(found->second), *(record_batch.column(i))->data(),
                           eval_batch.get());
  }

  // Fill in the entries for the output fields.
  int idx = 0;
  for (auto &arraydata : out_vector) {
    const FieldDescriptorPtr &desc = out_descs_.at(idx);
    PrepareBuffersForField(*desc, *arraydata, eval_batch.get());
    ++idx;
  }
  return eval_batch;
}

} // namespace gandiva
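
Review note on `annotator.cc`: the slot numbering done by `MakeDesc` and the name-based reuse in `CheckAndAddInputFieldDescriptor` can be tried out without Arrow or LLVM. The sketch below is a simplified stand-in, not the Gandiva implementation. `SlotDescriptor`, `SlotAnnotator`, `AddInput`, `AddOutput` and `CheckSlotLayout` are hypothetical names, the `is_varlen` flag replaces the `arrow::is_binary_like` check, and the value `-1` for `kInvalidIdx` is an assumption because `FieldDescriptor::kInvalidIdx` is not shown in this diff.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Assumed sentinel for "no offsets buffer"; stands in for
// FieldDescriptor::kInvalidIdx, whose real value is not shown in this diff.
constexpr int kInvalidIdx = -1;

// Hypothetical stand-in for FieldDescriptor: records which slots of the flat
// buffer array belong to one field.
struct SlotDescriptor {
  std::string name;
  int data_idx;
  int validity_idx;
  int offsets_idx;  // kInvalidIdx for fixed-width fields
};

using SlotDescriptorPtr = std::shared_ptr<SlotDescriptor>;

// Hypothetical stand-in for Annotator: hands out consecutive slot numbers.
class SlotAnnotator {
 public:
  // Input fields are keyed by name, so a repeated field reuses its slots.
  SlotDescriptorPtr AddInput(const std::string &name, bool is_varlen) {
    auto found = inputs_.find(name);
    if (found != inputs_.end()) {
      return found->second;
    }
    SlotDescriptorPtr desc = MakeDesc(name, is_varlen);
    inputs_[name] = desc;
    return desc;
  }

  // Output fields always receive fresh slots.
  SlotDescriptorPtr AddOutput(const std::string &name, bool is_varlen) {
    SlotDescriptorPtr desc = MakeDesc(name, is_varlen);
    outputs_.push_back(desc);
    return desc;
  }

  int buffer_count() const { return buffer_count_; }

 private:
  // Same allocation order as MakeDesc in the diff: data, validity, offsets.
  SlotDescriptorPtr MakeDesc(const std::string &name, bool is_varlen) {
    int data_idx = buffer_count_++;
    int validity_idx = buffer_count_++;
    int offsets_idx = is_varlen ? buffer_count_++ : kInvalidIdx;
    return std::make_shared<SlotDescriptor>(
        SlotDescriptor{name, data_idx, validity_idx, offsets_idx});
  }

  int buffer_count_ = 0;
  std::map<std::string, SlotDescriptorPtr> inputs_;
  std::vector<SlotDescriptorPtr> outputs_;
};

// Usage check: one fixed-width and one variable-length input field.
inline void CheckSlotLayout() {
  SlotAnnotator annotator;
  SlotDescriptorPtr a = annotator.AddInput("a", false);
  assert(a->data_idx == 0 && a->validity_idx == 1);
  assert(a->offsets_idx == kInvalidIdx);

  SlotDescriptorPtr s = annotator.AddInput("s", true);
  assert(s->data_idx == 2 && s->validity_idx == 3 && s->offsets_idx == 4);

  // Asking for "a" again returns the same descriptor and allocates nothing.
  assert(annotator.AddInput("a", false) == a);
  assert(annotator.buffer_count() == 5);
}
```

The slot numbers are independent of the order in which `PrepareBuffersForField` reads `array_data.buffers` (validity first, then offsets if present, then data), because each pointer is stored at the slot recorded in the descriptor.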