
connection retry refactor, add defaults with exponential backoff #207

Merged: 5 commits into main, Aug 30, 2022

Conversation

@nathaniel-may (Contributor) commented Jul 20, 2022

resolves #205

  • Refactors retry logic to use retry_connection from core.
  • Preserves existing behavior.
  • Adds defaults consistent with other adapters:
    • if connect_retries and connect_timeout are not specified: 1 retry after 1s
    • if connect_retries is specified but connect_timeout is not, it will use exponential backoff

This is a non-trivial change so I'll wait for sign off from @jtcohen6 before bothering to add the tests.

TODO

  • add tests similar to the postgres PR <- those test the library function in core, so we're covered here too.
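The defaulting rules in the summary above can be sketched roughly as follows. This is a hypothetical helper for illustration only; the real logic lives in dbt-core's `retry_connection` and may differ in names and shape:

```python
def pick_retry_strategy(connect_retries=None, connect_timeout=None):
    """Hypothetical sketch of the defaulting rules described in this PR.

    - neither option set: 1 retry after a fixed 1s wait
    - connect_retries set but connect_timeout not set: exponential backoff
    - connect_timeout set: fixed waits (backwards-compatible behavior)
    """
    if connect_retries is None and connect_timeout is None:
        return {"retries": 1, "backoff": "fixed", "wait": 1}
    if connect_timeout is None:
        return {"retries": connect_retries, "backoff": "exponential"}
    return {"retries": connect_retries or 1, "backoff": "fixed", "wait": connect_timeout}
```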

@cla-bot cla-bot bot added the cla:yes label Jul 20, 2022
@nathaniel-may nathaniel-may self-assigned this Jul 20, 2022
@nathaniel-may nathaniel-may marked this pull request as draft July 20, 2022 18:42
@McKnight-42 McKnight-42 self-requested a review July 20, 2022 21:37
@@ -59,8 +59,8 @@ class SnowflakeCredentials(Credentials):

```diff
 proxy_host: Optional[str] = None
 proxy_port: Optional[int] = None
 protocol: Optional[str] = None
-connect_retries: int = 0
-connect_timeout: int = 10
+connect_retries: int = 1
```
Contributor

👍

Also discussed: using connect_retries as the standard name everywhere (including Postgres/Redshift)

Review thread on dbt/adapters/snowflake/connections.py (outdated; resolved)
Comment on lines 293 to 294

```python
elif creds.retry_on_database_errors:
    retryable_exceptions = [snowflake.connector.errors.DatabaseError]
```
Contributor

IMO let's use retry_on_database_errors: True as our default. Users can optionally turn it off, or they can opt into all errors via retry_all.

(@nathaniel-may may also be inspired to dig further into Snowflake docs)

Contributor Author

I found more sensible defaults by looking at the connector source.

@nathaniel-may nathaniel-may marked this pull request as ready for review August 4, 2022 19:08
github-actions bot commented Aug 4, 2022

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the dbt-core contributing guide and the dbt-snowflake contributing guide.

@jeremyyeo (Contributor) commented Aug 5, 2022

Hey @nathaniel-may, whilst you folks are working on this one... does this sync up dbt-snowflake's connect_timeout config with the likes of the redshift/postgres adapters' connect_timeout?

We recently put in a default of connect_timeout: 30 in dbt Cloud's snowflake profile (https://dbtlabs.atlassian.net/browse/SHP-588) since this config is unavailable to end users. When looking at some of the logs once this was deployed, we realized that it doesn't behave the same way as the other adapters' config (and then I double-checked the docs¹, which do clearly state the behaviour here). As someone familiar with the other adapters' connect_timeout config, we automatically assumed it would behave exactly the same way.

This config for Snowflake's behaviour is:

If a connection terminates, wait connect_timeout seconds before retrying the connection.

As opposed to Redshift/PG's:

While connecting, wait connect_timeout seconds before dropping the connection.

Footnotes

  1. https://docs.getdbt.com/reference/warehouse-profiles/snowflake-profile#retry_on_database_errors
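The two semantics described above can be contrasted with a rough sketch. Both functions and the `connect` callables are hypothetical stand-ins for illustration, not the adapters' actual implementations:

```python
import time

def snowflake_style_connect(connect, retries, connect_timeout):
    # Pre-refactor Snowflake semantics: connect_timeout is the wait
    # BETWEEN attempts after a connection fails.
    for attempt in range(retries + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == retries:
                raise
            time.sleep(connect_timeout)

def postgres_style_connect(connect, connect_timeout):
    # Postgres/Redshift semantics: connect_timeout caps how long a
    # single connection attempt may take; no retry is implied.
    return connect(timeout=connect_timeout)
```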

@nathaniel-may (Contributor, Author) commented Aug 5, 2022

Hey @jeremyyeo! This is part of a larger effort to keep timeout config behavior to roughly the same standard:

The goal:

  • default to one retry after 1s for retryable errors
  • if the user configures a number of retries, use exponential backoff for retryable errors

Exceptions:

  • Snowflake already had additional retry logic, so this implementation is backwards compatible for users who have it configured. To be specific: if the user has connect_timeout configured, retries will use that instead of exponential backoff.
  • Postgres and Redshift will attempt to retry some non-retryable errors as well, due to a psycopg2 limitation.
  • Spark retries will remain unchanged and continue to retry ALL errors without exponential backoff. This is due to a Spark driver limitation that doesn't let us easily reuse the retry library function in core.
  • BQ will remain unconfigurable. It will use its own exponential backoff strategy for only retryable errors, as implemented in the GCP @Retry decorator. The behavior is roughly equivalent to configuring connect_retries = 6 on our other adapters, because the default for BQ is to stop retrying after a total of 120s, though it will actually retry up to 10 times.

@jtcohen6 (Contributor) left a comment:

LGTM! Excited to get this merged. Sorry for the delay with the final review.

Comment on lines +312 to +316

```python
# these two options are for backwards compatibility
if creds.retry_all:
    retryable_exceptions = [Error]
elif creds.retry_on_database_errors:
    retryable_exceptions.insert(0, DatabaseError)
```
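The snippet above can be read as building up the list of exceptions the retry loop treats as retryable. A minimal self-contained sketch, where `Error` and `DatabaseError` are stand-in classes for the `snowflake.connector.errors` types and `defaults` is a hypothetical starting list:

```python
class Error(Exception):
    """Stand-in for snowflake.connector.errors.Error (base class)."""

class DatabaseError(Error):
    """Stand-in for snowflake.connector.errors.DatabaseError."""

def build_retryable_exceptions(retry_all, retry_on_database_errors, defaults):
    # Mirrors the snippet above: retry_all widens retries to every
    # connector error; retry_on_database_errors prepends DatabaseError
    # to the default list so it is matched first.
    retryable_exceptions = list(defaults)
    if retry_all:
        retryable_exceptions = [Error]
    elif retry_on_database_errors:
        retryable_exceptions.insert(0, DatabaseError)
    return retryable_exceptions
```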
Contributor

👍

@leahwicz (Contributor) commented:

@nathaniel-may can we merge this in for the end of the sprint and call it done?

@nathaniel-may nathaniel-may merged commit fdb5c40 into main Aug 30, 2022
@nathaniel-may nathaniel-may deleted the nate/connection-retries branch August 30, 2022 11:11
Successfully merging this pull request may close these issues.

[CT-874] snowflake: standardize connection retries.
4 participants