Dremio: Unified Analytics Platform #136

Open
amotl opened this issue Jul 17, 2024 · 10 comments

@amotl
Member

amotl commented Jul 17, 2024

About

OSS

Dremio - the missing link in modern data.
Dremio enables organizations to unlock the value of their data.

Commercial

The Unified Lakehouse Platform for Self-Service Analytics and AI.

Dremio provides the fastest SQL engine with the best price-performance for Apache Iceberg, an Apache Iceberg catalog and Lakehouse Management service for next-gen dataops, and hybrid cloud deployment flexibility.

References

/cc @hlcianfagna, @karynzv, @hammerhead

amotl changed the title from "Dremio: Unified Lakehouse Platform for Self-Service Analytics and AI" to "Dremio: A modern data platform" on Jul 17, 2024
amotl changed the title from "Dremio: A modern data platform" to "Dremio: Unified Analytics Platform" on Jul 17, 2024
@amotl
Member Author

amotl commented Jul 17, 2024

@amotl
Member Author

amotl commented Aug 1, 2024

Problem

When trying to build https://github.com/dremio/dremio-oss, this error is raised:

mvn clean install -DskipTests
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.12.1:compile (default-compile)
        on project errorprone-dremio: Compilation failure
[ERROR] [options] system modules path not set in conjunction with -source 11

Details

$ mvn --version
Apache Maven 3.8.6 (84538c9988a25aec085021c365c560670ad80f63)
Maven home: /usr/local/Cellar/maven/3.8.6/libexec
Java version: 19.0.1, vendor: Homebrew, runtime: /usr/local/Cellar/openjdk/19.0.1/libexec/openjdk.jdk/Contents/Home
Default locale: en_GB, platform encoding: UTF-8
OS name: "mac os x", version: "10.15.7", arch: "x86_64", family: "mac"

These versions are pretty old, but the error also happens with Maven 3.9.8, according to @karynzv.

Version Spec

I wondered whether my software versions might be too recent or too old, but that is not the case. They are within the ranges the project requires:

<requireMavenVersion>
  <version>[3.3.9,4)</version>
</requireMavenVersion>
<requireJavaVersion>
  <version>[11,)</version>
</requireJavaVersion>
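To double-check that reasoning, here is a minimal Python sketch of how Maven-style version ranges such as [3.3.9,4) and [11,) bound a version: "[" or "]" includes the bound, "(" or ")" excludes it, and an empty side is unbounded. The helper names `_key` and `in_range` are hypothetical, and the sketch is a simplification that only handles plain numeric dotted versions; it is not Maven's own version-comparison logic.

```python
def _key(version: str, width: int = 4) -> tuple:
    """Turn '3.8.6' into (3, 8, 6, 0) so that '4' and '4.0.0' compare equal."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def in_range(version: str, spec: str) -> bool:
    """Check a numeric version against a two-bound range like '[3.3.9,4)'.

    Simplification (assumption): no qualifiers such as -SNAPSHOT and no
    single-version form like '[1.0]'.
    """
    lo_inclusive = spec[0] == "["
    hi_inclusive = spec[-1] == "]"
    lo, hi = (s.strip() for s in spec[1:-1].split(","))
    v = _key(version)
    # Lower bound: reject if below it, or equal to it when the bound is exclusive.
    if lo and (v < _key(lo) or (not lo_inclusive and v == _key(lo))):
        return False
    # Upper bound: reject if above it, or equal to it when the bound is exclusive.
    if hi and (v > _key(hi) or (not hi_inclusive and v == _key(hi))):
        return False
    return True
```

With the versions reported above, `in_range("3.8.6", "[3.3.9,4)")` and `in_range("19.0.1", "[11,)")` both return True, so the toolchain satisfies both requirements.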

@amotl
Member Author

amotl commented Aug 1, 2024

Problem

[ERROR] [options] system modules path not set in conjunction with -source 11

Solution

Adding -Derrorprone.skip makes the build progress further. When it then stopped with another BUILD FAILURE, we also picked up two more build options, -Ddremio.oss-only=true and -Dlicense.skip=true:

mvn clean install -DskipTests -Derrorprone.skip -Ddremio.oss-only=true -Dlicense.skip=true -e


@amotl
Copy link
Member Author

amotl commented Aug 2, 2024

Problem

[ERROR] Failed to execute goal com.diffplug.spotless:spotless-maven-plugin:2.43.0:check (spotless-check) 
on project dremio-sabot-kernel: 
Unable to check file /Users/amo/dev/foss/dremio-oss/sabot/kernel/src/main/java/com/dremio/sabot/op/join/merge/MergeJoinComparatorTemplate.java: 
com.google.googlejavaformat.java.FormatterException: 215:10: error: invalid use of a restricted identifier 'yield'

Solution

Pending.

@amotl
Member Author

amotl commented Aug 2, 2024

Next

@karynzv started looking into skipping the Dremio build and using its OCI image instead, building the CrateDB connector separately and plugging it into the image. Thanks!

Problem

Building the connector also fails, probably because it uses an outdated Dremio API.

[ERROR] /path/to/cratedb-dremio-connector/src/test/java/com/dremio/BaseTestQuery2.java:[1073,96] cannot find symbol
  symbol:   method getNessieTreeApiBlockingStub()
  location: class com.dremio.exec.server.SabotContext
[ERROR] /path/to/cratedb-dremio-connector/src/test/java/com/dremio/BaseTestQuery2.java:[1073,42] cannot find symbol
  symbol:   method getNessieContentsApiBlockingStub()
  location: class com.dremio.exec.server.SabotContext
[ERROR] /path/to/cratedb-dremio-connector/src/test/java/com/dremio/BaseTestQuery2.java:[1079,55] incompatible types: org.apache.hadoop.conf.Configuration cannot be converted to com.dremio.exec.store.iceberg.SupportsIcebergMutablePlugin

@karynzv

karynzv commented Aug 9, 2024

Let me document what I've tested when trying to connect with Dremio:

  • Community connector (https://github.com/rongfengliang/cratedb-dremio-connector)
    1 - Run a Dremio Docker container as documented here (https://docs.dremio.com/current/get-started/docker-quickstart/)
    2 - Build the community connector after removing the problematic test files in src/test/java/com/dremio/ (I haven't tried to fix them myself)
    3 - Move the resulting .jar file to the jars/3rdparty folder of the Docker container and restart it, as described here
    4 - Add a new source, choosing the CRATEDB option, and configure it with your cluster's access details
    5 - In Dremio, query a view with SELECT * FROM VIEW_NAME, which gives an error
    6 - Check in CrateDB which queries Dremio ran, using SELECT * FROM sys.jobs_log WHERE username = <DREMIO_USER>

  • The default Postgres connector:
    1 - Run a Dremio Docker container as documented here (https://docs.dremio.com/current/get-started/docker-quickstart/)
    2 - Add a new source, choosing Postgres, and configure it accordingly.
    3 - Try querying the data through the connector's source reference; the queries will fail due to their use of COLLATE.
    4 - Instead, use the approach described here, and you should be able to query CrateDB directly.
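Step 6 of the community-connector walkthrough above can also be scripted. The following is a minimal Python sketch, assuming CrateDB's HTTP endpoint is reachable at http://localhost:4200/_sql without authentication (adjust the URL and add credentials for a real cluster). `jobs_log_request` and `fetch_jobs_log` are hypothetical helper names, and the selected columns are only a subset of sys.jobs_log.

```python
import json
import urllib.request

# Assumption: local CrateDB on the default HTTP port, no authentication.
CRATEDB_URL = "http://localhost:4200/_sql"


def jobs_log_request(dremio_user: str) -> dict:
    """Build a parameterized /_sql payload listing statements run by the Dremio user."""
    return {
        "stmt": "SELECT stmt, started, error FROM sys.jobs_log "
                "WHERE username = ? ORDER BY started DESC LIMIT 20",
        "args": [dremio_user],
    }


def fetch_jobs_log(dremio_user: str, url: str = CRATEDB_URL) -> list:
    """POST the payload to CrateDB and return the 'rows' field of the JSON response."""
    body = json.dumps(jobs_log_request(dremio_user)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["rows"]
```

Calling `fetch_jobs_log("<DREMIO_USER>")` with the user configured in the Dremio source lists the most recent statements Dremio sent; passing the user name as a `?` parameter avoids quoting it by hand.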

@karynzv

karynzv commented Aug 9, 2024

I did some further tests, and this seems to be the recommended approach for using Dremio with CrateDB.

Instead of using a specific connector, you can query CrateDB directly from Dremio, as documented here. Using the default Postgres connector as explained above, query CrateDB directly with the following syntax:

SELECT * FROM table(crate.external_query('SELECT o[''it''] FROM doc.test_view;'))

Further details on the syntax and usage are here.
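Because the inner CrateDB statement is passed to external_query as a SQL string literal, every single quote inside it has to be doubled, as in o[''it''] above. A small Python sketch of that escaping follows; `external_query` is a hypothetical helper, and `crate` stands for whatever name the Postgres source was given in Dremio. The helper does not validate or quote the source name, so only pass trusted identifiers.

```python
def external_query(source: str, inner_sql: str) -> str:
    """Wrap a CrateDB statement for Dremio's external_query table function.

    The inner statement becomes a SQL string literal, so each embedded
    single quote is doubled ('it' -> ''it'').
    """
    escaped = inner_sql.replace("'", "''")
    return f"SELECT * FROM table({source}.external_query('{escaped}'))"
```

`external_query("crate", "SELECT o['it'] FROM doc.test_view;")` produces exactly the statement shown above.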

@amotl
Member Author

amotl commented Aug 9, 2024

Hi. Thanks a stack for your reports, both on how to set up a development sandbox for the community connector and for showing us that external queries through the Postgres connector work well.

  1. Shall we document this fact by adding an item about Dremio to crate-clients-tools and cratedb-guide?

  2. What's next?

    These are called external queries because they are passed through and run outside of Dremio.

    Does it still make sense to continue fixing the native connector for CrateDB? Running queries inside Dremio's core engine, rather than bypassing it, is a major part of what Dremio is about, because that is how it combines multiple data sources through its federation layer.
