Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-854] Cache authentication (token) for subsequent connections #201

Closed
joshuataylor opened this issue Jul 18, 2022 · 6 comments · Fixed by #428
Closed

[CT-854] Cache authentication (token) for subsequent connections #201

joshuataylor opened this issue Jul 18, 2022 · 6 comments · Fixed by #428
Labels
enhancement New feature or request help_wanted Extra attention is needed

Comments

@joshuataylor
Copy link
Contributor

Describe the bug

With Snowflake, logging in is slow (as it should be, given password hashing etc), so it should be avoided.

Every query that is executed causes a new login, as the connection is marked as closed. This should not happen, the connection should be marked as open so you're not forced to login again.

Steps To Reproduce

  1. Have multiple queries, or set the threads to a low limit
  2. Issue the queries
  3. See multiple logins, you can verify by:
select *
from table(information_schema.login_history_by_user())
order by event_timestamp desc;

Expected behavior

A single login happens, maybe by thread?

System information

Happens on MacOS/Linux, pretty sure it's not a OS issue..
Python 3.10.4, dbt 1.1.1

@joshuataylor joshuataylor added bug Something isn't working triage labels Jul 18, 2022
@github-actions github-actions bot changed the title Connection is closed after a query causing a login per query [CT-854] Connection is closed after a query causing a login per query Jul 18, 2022
@jtcohen6 jtcohen6 removed the triage label Jul 19, 2022
@jtcohen6
Copy link
Contributor

Thanks @joshuataylor!

I left a fuller response over in dbt-labs/dbt-core#5489 (comment)

I'm going to close this issue, for now, in favor of centralizing the conversation over there

@jtcohen6
Copy link
Contributor

Reopening as there might be a dbt-snowflake only way to go here, or at least a thread worth pulling on:

@joshuataylor
As another alternative, if we could solve this on the dbt-snowflake level in the interim this would be a big speed win. We could add the token when logged in to the connection, then add it to connection. This would involve updating the connection contract though, maybe adding a metadata or other key that connectors could use?

@jtcohen6
I don't have a good sense of whether there are any potential security risks with taking that approach. If it works, though, and substantially speeds up the process of opening new connections, that our current dbt-core approach (treating connections as a commodity) might hold muster for the foreseeable future.

@jtcohen6 jtcohen6 reopened this Jul 19, 2022
@jtcohen6 jtcohen6 added the help_wanted Extra attention is needed label Jul 19, 2022
@joshuataylor
Copy link
Contributor Author

It shouldn't be a security risk, as the connection details are passed in anyway when doing an open. So it should be fine.

My proof of concept was this, where we can check that:

  1. The connection state is closed (as there is no point doing it for any other state)
  2. We have the handle (we should really check that it's the snowflake handle)
  3. We have the token inside of the handle (via handle.rest.token). This is exposed via Snowflake, but I'm unsure how to check that token is still valid, as the token is only valid for a certain amount of time, I think ~4 hours by default.
        if connection.state == "closed" and connection.handle and connection.handle.rest.token:
            return connection

However this throws a RecursionError as we're trying to access the handle via the open. I think if we could alter the credentials with a token, that would work, but I'm unsure if that's threadsafe?

@leahwicz leahwicz added enhancement New feature or request and removed bug Something isn't working labels Jul 19, 2022
@jtcohen6 jtcohen6 changed the title [CT-854] Connection is closed after a query causing a login per query [CT-854] Cache authentication (token) for subsequent connections Jul 19, 2022
@jtcohen6
Copy link
Contributor

jtcohen6 commented Jul 19, 2022

the token is only valid for a certain amount of time, I think ~4 hours by default

This is a legitimate concern. We'd need some way to detect that the token is invalid — probably by trying to open a connection with it, catching the error, and falling back to re-authenticate.

However this throws a RecursionError as we're trying to access the handle via the open. I think if we could alter the credentials with a token, that would work, but I'm unsure if that's threadsafe?

It does seem like there'd be a risk around thread safety. The only object not reused / shared between threads is the connection itself. Could we use the Connection Manager's self.lock? (see here) This seems to be our pattern when getting / checking / clearing connections.

@joshuataylor
Copy link
Contributor Author

I've got a PR here which should solve the issue safely, as it won't release it: #203

This doesn't change open/close logic so you shouldn't see any problems around that.

@yingqiaozheng
Copy link

I ran into the same problem when I install dbt via brew. I uninstall dbt via brew then pip install from scratch , the problem is gone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help_wanted Extra attention is needed
Projects
None yet
4 participants