Add support for ODBC Server Side Parameters #201

jethron · 2021-08-05T05:31:24Z

Description

Allow passing through Cluster Configuration parameters via profiles.yml when using the odbc connection method.
This allows configuration of values that would need to be set via SET to be run in the Spark session and take effect on DBT models.

E.g.:

spark_dbt:
  target: default
  outputs:
    default:
      type: spark
      method: odbc
      driver: /opt/simba/spark/lib/64/libsparkodbc_sb64.so
      schema: default
      host: [hostid]
      token: [token]
      cluster: [clusterid]
      port: 443
      connect_retries: 3
      connect_timeout: 20
      server_side_parameters:
          "spark.databricks.delta.schema.autoMerge.enabled": True
          "spark.sql.jsonGenerator.ignoreNullFields": True

Checklist

I have signed the CLA
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

jtcohen6

Thanks for the PR @jethron! This seems like a clever way of managing a gnarly configuration process. Have you been able to test this locally, to confirm that the configuration supplied in the ODBC string takes effect?

jethron · 2021-08-10T01:42:26Z

Hey @jtcohen6, I have.

With this patch I've successfully used the spark.databricks.delta.schema.autoMerge.enabled setting above to use Delta Lake Automatic Schema Evolution to update a table schema with a MERGE statement from an incremental model:

Without this patch the MERGE fails because columns are added/missing between source/target table during the merge insert.
With the patch and above configuration, the MERGE succeeds, rows are inserted, and the schema evolves as expected.

I find specifying it here in profiles.yml much neater than the platform-specific instructions for adding it to your ODBC driver configuration, much easier to document for other users of the models.

jtcohen6 · 2021-08-10T01:47:22Z

That's great to hear, and you named exactly the setting for which I imagined this might prove especially useful.

I'm going to update the Changelog on the master branch now. After I do, could you pull master and add a changelog entry? Then we can get this in for v0.21!

…sp_support

jethron · 2021-08-11T00:37:13Z

Sorry for the delay, didn't see the edit. Done!

jtcohen6

Thanks for the contribution @jethron!

* Add support for ODBC Server Side Parameters * Update CHANGELOG

* Show more detailed feedback when pyodbc import fails (#192) * Use exception chaining to get more detailed feedback when pyodbc is not installed * Remove pyodbc referenced before assignment * Set back try except * Add flake ignore * Add error message to RunTimeException Error chaining does not show the message in `dbt debug`. Therefore we explicitly add the message to the dbt.exceptions.RunTimeException * Update change log Add to change log that we print the import error when pyodbc can not be imported * Fix parenthesis in change log * Update changelog [skip ci] * Update changelog [skip ci] * Update changelog [skip ci] * Add support for ODBC Server Side Parameters (#201) * Add support for ODBC Server Side Parameters * Update CHANGELOG * Feature/able to retry all connections (#194) * Code changes * README changes * Improve error message default * Changelog * Changelog corrections * Restore accidental deletion * Update dbt/adapters/spark/connections.py Co-authored-by: Jeremy Cohen <[email protected]> * Add myself to Contributors Co-authored-by: Jeremy Cohen <[email protected]> * fixed get_columns_in_relation for open source delta table (#207) * fixed get_columns_in_relation for open source delta table * fixed E501 linting error and added change log * fix issue parsing structs (#204) * fix issue parsing structs * include contributor in changelog * better error explanation Co-authored-by: Jeremy Cohen <[email protected]> * Add adapter unique_field (#211) * Add adapter unique_field * Fix flake8. Add changelog entry * [Snyk] Fix for 2 vulnerabilities (#214) * fix: requirements.txt to reduce vulnerabilities The following vulnerabilities are fixed by pinning transitive dependencies: - https://snyk.io/vuln/SNYK-PYTHON-SQLPARSE-1584201 - https://snyk.io/vuln/SNYK-PYTHON-THRIFT-474615 * Removing Thrift conflict with versions over 12 Co-authored-by: leahwicz <[email protected]> * 0.21.0b1 Release * Bumping version to 0.21.0b1 * Bumping to 0.21.0b2 * Bumping version to 0.21.0b2 Co-authored-by: Cor <[email protected]> Co-authored-by: Jeremy Cohen <[email protected]> Co-authored-by: Jethro Nederhof <[email protected]> Co-authored-by: gregingenii <[email protected]> Co-authored-by: Jeremy Cohen <[email protected]> Co-authored-by: Hariharan Banukumar <[email protected]> Co-authored-by: Sergio <[email protected]> Co-authored-by: Snyk bot <[email protected]> Co-authored-by: Ian Knox <[email protected]> Co-authored-by: Leah Antkiewicz <[email protected]>

Add support for ODBC Server Side Parameters

db0db96

cla-bot bot added the cla:yes label Aug 5, 2021

jtcohen6 reviewed Aug 10, 2021

View reviewed changes

jethron added 2 commits August 11, 2021 10:31

Merge branch 'master' of https:/dbt-labs/dbt-spark into s…

4eef2c0

…sp_support

Update CHANGELOG

f6f7721

jtcohen6 approved these changes Aug 11, 2021

View reviewed changes

jtcohen6 merged commit 3a66285 into dbt-labs:master Aug 11, 2021

jtcohen6 pushed a commit that referenced this pull request Aug 11, 2021

Add support for ODBC Server Side Parameters (#201)

d09074f

* Add support for ODBC Server Side Parameters * Update CHANGELOG

This was referenced Oct 14, 2021

Incremental models do not fail when schema changes #226

Closed

Schema evolution for databricks delta is not working using pre_hook #217

Closed

jtcohen6 mentioned this pull request Nov 16, 2021

Model pre_hook is not executed on ODBC, but it is for HTTP #256

Closed

jtcohen6 mentioned this pull request Apr 28, 2022

Update Apache Spark profile configs dbt-labs/docs.getdbt.com#1391

Merged

VShkaberda mentioned this pull request Jul 7, 2022

[CT-821] Allow passing through Cluster Configuration parameters via profiles.yml when using the thrift connection method #387

Closed

akashrn5 mentioned this pull request Nov 29, 2022

[CT-1561] [Feature] Support defining server_side_parameters inside of dbt_project.yml #529

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for ODBC Server Side Parameters #201

Add support for ODBC Server Side Parameters #201

jethron commented Aug 5, 2021

jtcohen6 left a comment

jethron commented Aug 10, 2021

jtcohen6 commented Aug 10, 2021 •

edited

Loading

jethron commented Aug 11, 2021

jtcohen6 left a comment

Add support for ODBC Server Side Parameters #201

Add support for ODBC Server Side Parameters #201

Conversation

jethron commented Aug 5, 2021

Description

Checklist

jtcohen6 left a comment

Choose a reason for hiding this comment

jethron commented Aug 10, 2021

jtcohen6 commented Aug 10, 2021 • edited Loading

jethron commented Aug 11, 2021

jtcohen6 left a comment

Choose a reason for hiding this comment

jtcohen6 commented Aug 10, 2021 •

edited

Loading