Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use psycopg2-binary dep for Postgres and Redshift when v2.8 comes out #1221

Closed
drewbanin opened this issue Jan 7, 2019 · 8 comments · Fixed by #1898
Closed

Use psycopg2-binary dep for Postgres and Redshift when v2.8 comes out #1221

drewbanin opened this issue Jan 7, 2019 · 8 comments · Fixed by #1898
Labels
dependencies Changes to the version of dbt dependencies redshift

Comments

@drewbanin
Copy link
Contributor

The psycopg2 dep currently ships as a wheel, but version 2.8 will revert psycopg2 back to being source-only. This won't be a problem for dbt, as we've set the upper bound of psycopg2 to 2.8.

When the 2.8 release eventually comes out, we should upgrade and switch dbt to use the psycopg2-binary dep instead.

Information about releases here.
And info about the binary release here.

@drewbanin drewbanin added redshift dependencies Changes to the version of dbt dependencies labels Jan 7, 2019
@drewbanin drewbanin added this to the 0.14.1 milestone Jul 15, 2019
@drewbanin
Copy link
Contributor Author

drewbanin commented Jul 15, 2019

Tentatively including this in the 0.14.1 milestone, but it may make sense to push this to a minor release since the upgrade path might be non-trivial.

It sounds like we'll need to run uninstall psycopg2 and then install psycopg2-binary -- does that mean that installation with pip will require the -I flag for upgrading users?

Last: it sounds like psycopg2-binary isn't intended for use in libraries. Let's figure out if using the -binary wheel is indeed a good idea for us to use here, and if so, what (if any) caveats exist.

@beckjake
Copy link
Contributor

I've done far more reading on this than I'd like, and what I think I want to do is require psycopg2-binary by default on all platforms, but allow users to override their install to use the non-binary form.

My reasoning is that on macos and windows especially, building psycopg2 locally requires a bunch of headers that the average user is realistically just never going to install. Since so far the wheels have been working fine for us, I don't think it's very reasonable to change that and require you to have a compiler toolchain + libpq headers.

On the other hand, on Linux it's apparently reasonably likely to get mystery segfaults when openssl versions change or whatever. I've never experienced that on our docker images but I'm willing to believe it - this poor guy wouldn't have gone through all this nonsense unless he had a compelling reason! So we probably need to at least provide some way to let those users get psycopg2 installed.

I'm not sure exactly how to do this but I do know that setup.py provides a few ways to pass flags along. The real painful part is that we have to pass those arguments through to our plugins, if provided. I think the easiest thing to do is probably to just use environment variables to decide if we should dynamically set the install_requires used in setup(). Something like this in plugins/postgres/setup.py:

install_requires=[
   '{} >=2.8.2, <2.9'.format(os.getenv('DBT_PSYCOPG2_PACKAGE', 'psycopg2-binary')),
  ...
]

Then if you know better, you can always set DBT_PSYCOPG2_PACKAGE="psycopg2" before you pip install.

There are two big downsides here:

  1. The upgrade path is awful. This is pip's fault, basically - it provides no meaningful facility to note that two pypi packages provide the same package. Maybe we can do some install-time shenanigans to decide if psycopg2 already exists, and if so check its version and either uninstall it (if it's psycopg2) or upgrade it (if it's psychopg2-binary). I'm sure pip folks will hate that, but they made this mess so I have no sympathy.

  2. If you have another package in your same virtualenv that requires the opposite package as the one you've selected, things will get very weird! This is fundamentally related to problem 1, but it's more of a user-level issue. Caveat emptor, I guess? Use a fresh virtualenv for dbt or make sure you set the environment variable to perfectly match the existing install!

The way I see it we don't have much choice but to default to psycopg2-binary - getting your average mac/windows user to install libpq development headers and a compiler toolchain is not a realistic option, and staying on 2.7.x obviously isn't an option on Windows given the concurrency issues w/OpenSSL 1.0.

@drewbanin
Copy link
Contributor Author

@beckjake do you think we should do this for 0.14.1? Or should we hold off for 0.15.0?

@beckjake
Copy link
Contributor

I feel pretty comfortable with it on 0.14.1 if we feel good about the pain of changing dependencies on a minor release. I've been using it locally since this issue got written and everything seems fine.

@drewbanin
Copy link
Contributor Author

Ok - I recall you mentioning that pip might do something unfortunate if users try to --upgrade their dbt version. Am I misremembering, or do you think special care will need to be taken when upgrading to 0.14.1 from a previous version?

@beckjake
Copy link
Contributor

We can test it out, but I would guess upgrades end in users who have both the old psycopg2 and the new psycopg2-binary, and a race condition over which path wins during import.

@drewbanin
Copy link
Contributor Author

@beckjake I think I want to kick this out of the 0.14.1 milestone. I think your ideas here are good ones, but I have so little faith in pip doing the right for us thing here, and it would be really unfortunate if we bricked a lot of dbt installations in weird ways in a bugfix/patch release.

Let's re-prioritize this for 0.15.0 -- that should give us some time to explore (and test) the approaches you've outlined in your comment above.

@drewbanin drewbanin modified the milestones: 0.14.1, Louisa May Alcott Aug 5, 2019
@drewbanin drewbanin removed this from the Louisa May Alcott milestone Aug 27, 2019
@beckjake
Copy link
Contributor

beckjake commented Nov 7, 2019

Ok, further progress on this:

  • psycopg2~=2.7 causes random crashes/segfaults for some people
  • psycopg2~=2.8 fixes this
  • psycopg2~=2.8 has binary wheels for many things, but not for any python 3.8, so people installing the latest python are told they need a compiler. Very annoying!
  • psycopg2-binary~=2.8 has binary wheels for even more versions, but there's a reason the developer switched to psycopg2: the segfaults!

The current plan is that we're going to install psycopg2-binary on python 3.8 and psycopg2 otherwise. In case this causes problems, we'll support an environment variable you can use to set your psycopg2 package name at install time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Changes to the version of dbt dependencies redshift
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants