Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread-local connections #1312

Closed
beckjake opened this issue Feb 20, 2019 · 2 comments
Closed

Thread-local connections #1312

beckjake opened this issue Feb 20, 2019 · 2 comments

Comments

@beckjake
Copy link
Contributor

Feature

We should store connections on a singleton-per-thread basis, I assume using thread-local storage but it's not really important how.

Feature description

Currently, dbt's connection/thread management works like this:

  • we ensure that each model is run in a thread
  • those threads get the name of their connection as a parameter to most methods, etc - either as a model_name parameter, handled by DBWrapper, as the model.alias attribute, or in special cases as the connection_name parameter.
  • we store connection objects in a map where the keys are the model name that's being executed
  • individual runners (one per thread!) manage connection state based on the connection name in use
  • connections are potentially re-used in kind of complicated ways involving passing them around and renaming them (and potentially, re-homing them into new threads).

So threads are supposed to have one connection per thread, and that's managed by having the threads keep track of a managed name and carefully passing it along everywhere, using that name to open/close connections, etc.

My proposal is to instead formally bind connection objects to threads, and then have threads manage the open/close/transaction state of "their connection" in a more model-agnostic way. Each thread owns a connection stored in thread-local storage (or a dict mapping thread name -> connection, if thread-local storage is hard to get right - not important as long as it's a per-thread connection). No more names, no more "in use connections" storage, etc.

Who will this benefit?

This will help resolve our numerous connection-leaking related issues, and I think also make transaction management a lot simpler. Note that the latter will always be hard because we have to keep dbt's idea of transaction state in sync with the remote system's, and that's tricky.

@beckjake
Copy link
Contributor Author

beckjake commented Mar 4, 2019

update/note after thinking about this a bit and poking around: thread-local storage isn't great for this problem, as we do need to access child thread connection objects from the main thread in order to do cancellation and cleanup. So it'll have to be a dict of thread IDs/names -> connections. I think the dict method also handles the test/hook running case more nicely and is less likely to have cross-platform weirdness, so it's probably for the best

@beckjake
Copy link
Contributor Author

beckjake commented Mar 8, 2019

Fixed in #1336

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant