Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Old libraries in setup.py causing dependency resolution to pull old transitive constraints (3 years+) #12120

Closed
mik-laj opened this issue Nov 5, 2020 · 16 comments
Labels
area:dependencies Issues related to dependencies problems kind:bug This is a clearly a bug security Security issues that must be fixed

Comments

@mik-laj
Copy link
Member

mik-laj commented Nov 5, 2020

Dear and Wonderful Citizens,

I started to look at what libraries we have defined in the constraints-*.txt file and I am a bit surprised because we have this constraints defined on very old libraries.
https:/apache/airflow/blob/053afe7/constraints-3.8.txt

Update (@potiuk): -> Just for clarity: constraints are automatically generated from setup.py so this is a matter of dependencies defined there. If we are to fix it, we will have to upgrade dependencies defined in setup.py NOT the constraints themselves.

Sometimes we have defined libraries that are over 3 years old, which can cause security problems. Old versions of the library may have vulnerabilities that have probably been fixed in newer versions.

I am most concerned about dependency conflicts. Old libraries are only compatible with old libraries, which can cause problems if the user wants to use a new version of the same library.

I think it's worth investigating where these limitations come from and why we can't use newer versions of these libraries.

You can see the list of libraries that need updating in the Jupyter interactive notebook.
https://colab.research.google.com/drive/1F5Lw8qNcxCvWaYUrGZ1x3W3v3080Dq0U#scrollTo=AfIBqzjo8UId

CC: @potiuk @ryw

@mik-laj mik-laj added kind:bug This is a clearly a bug security Security issues that must be fixed labels Nov 5, 2020
@mik-laj mik-laj changed the title Very old libraries in constraints.txt Very old libraries in constraints.txt (3 years+) Nov 5, 2020
@potiuk
Copy link
Member

potiuk commented Nov 6, 2020

I will change the title of the issue, because it is not the matter of constraints, but it's the matter of limitations in setup.py and transitional dependencies that we have for all those libraries.

There is nothing we can do on the constraint level. The constraints files are generated automatically by PIP dependency mechanisms - whatever PIP resolves using setup.py limits is automatically updated as new version of constraints.

So if we want to do anything about it (do we?) we should remove some of the limitations there and upgrade all the different providers/core dependencies we have to use the latest version of dependent libraries - basically for each provider separately. Even that will not help in some cases because the newest version of those libraries might transitively use some older versions of dependent libraries.

Does anyone have some proposal there? Should we somehow make an effort to upgrade those? Maybe there is someone who would like o lead that?

Note, that in order to do it, we likely need to have some system tests in place and implemented for those providers that we decide to bump to later version of dependent libraries (https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-4+Support+for+Automation+of+System+Tests+for+external+systems) because what we basically need to do is to upgrade the libraries that we already know usually that they have some compatibility issues (for example all the google providers will have to be sooner or later migrated to >2.0.0 python libraries and automated unit testing is not enough for those kinds of changes (those libraries are not backwards compatible).

WDYT others? Is it worth to make such a concerted effort? Are there some real benefits from that? Should we do it? For 2.0 or later?

@potiuk potiuk changed the title Very old libraries in constraints.txt (3 years+) Old libraries in setup.py causing dependency resolution to pull old transitive constraints (3 years+) Nov 6, 2020
@potiuk
Copy link
Member

potiuk commented Nov 6, 2020

BTW. If somebody want to see where the deps come from, it's easy to use pipdeptree:

pipdeptree.3.8.txt

@kaxil
Copy link
Member

kaxil commented Nov 6, 2020

I agree with Jarek, that some of the constraints come from dependencies of Airflow's dependencies. And since we have lots of dependencies it is not always straightforward to support all versions as these dependencies define there own requirements of versions.

So while having latest and greatest would be awesome, we need to make a concise effort that we don't make Airflow incompatible with other dependencies since PIP resolver is not going to allow it any longer

@pvcnt
Copy link

pvcnt commented Nov 26, 2020

There is even a conflict in Airflow 1.10.13 with importlib-metadata, because Airflow requires importlib-metadata~=2.0, while argcomplete (required at 1.12.0 by Airflow) requires importlib-metadata<2,>=0.23.

@turbaszek
Copy link
Member

turbaszek commented Nov 26, 2020

Related #12508 #12636 #12635

@turbaszek turbaszek added the area:dependencies Issues related to dependencies problems label Nov 26, 2020
@potiuk
Copy link
Member

potiuk commented Nov 26, 2020

I don't think we'd want to fix deps for 1.10.* Our focus in #12636 is to make them fixed (and non-breakable in the future) for Airflow 2.0.

@HaloKo4
Copy link

HaloKo4 commented Dec 20, 2020

Where airflow even use dnspython ?
I see it as dependency for mongo provider but I don't see where this package is actually being used in the code.

@mik-laj
Copy link
Member Author

mik-laj commented Dec 20, 2020

Airflow doesn't use this dependency directly, but other dependencies that require Airflow do require this library. Python doesn't have dependency isolation (for example, the node has), so we need to make sure other libraries don't restrict their dependency versions too much, otherwise users won't be able to use their favorite libraries in the latest versions. Users will also not be able to get security patches if the version restrictions are too restrictive.

pipdeptree -r -p dnspython

dnspython==1.16.0
  - email-validator==1.1.2 [requires: dnspython>=1.15.0]
    - Flask-AppBuilder==3.1.1 [requires: email-validator>=1.0.5,<2]
      - apache-airflow==2.0.0 [requires: flask-appbuilder~=3.1.1]
  - eventlet==0.29.1 [requires: dnspython>=1.15.0,<2.0.0]

In this case, it looks like we need to check why eventlet requires a version older than 2.0.

@HaloKo4
Copy link

HaloKo4 commented Dec 20, 2020

but this dependency exist only on mongo provider. what am I missing here?
https:/apache/airflow/blob/master/setup.py#L321

@mik-laj
Copy link
Member Author

mik-laj commented Dec 20, 2020

@HaloKo4 We generate one constraints file to check compatibility with all libraries and detect all conflicts, so even if mongo only requires an older version, most users will have the old version of this library.

Here is the official Airflow installation guide, which installs libraries always the latest tested versions.
http://apache-airflow-docs.s3-website.eu-central-1.amazonaws.com/docs/apache-airflow/latest/installation.html

If you try to install Airflow and dnspython according to the instructions, the latest version of the dnspython library will not be installed.

AIRFLOW_VERSION=master
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install \
    --constraint "${CONSTRAINT_URL}" \
    ".[postgres,google]" \
    dnspython

I found a ticket in the eventlet project and I hope that it will be solved.
eventlet/eventlet#619

@potiuk
Copy link
Member

potiuk commented Dec 21, 2020

Also @HaloKo4 -> those are not 'requirements' - those are 'constraint' files. This means that when you are installing airflow you can use those constraint files and they will "guarantee" that the installation will produce non-conflicting set of dependencies (as of 1.10.14 and 2.0.0). We have a process in place where during our CI builds we automatically upgrade to newer constraints where possible (and verify if airflow works and tests pass). Those constraints work in the 'worst' case when you install all "ci" extras "devel_ci" - which basically means "everything".

You are still free to upgrade to newer version of any dependency for example if you only installed several extras and the constraints are only applied at airflow installation time.

@eladkal
Copy link
Contributor

eladkal commented Sep 30, 2021

@mik-laj do we have further tasks on this issue?
I'm not able to run the notebook you shared to get updated list.

@potiuk
Copy link
Member

potiuk commented Oct 3, 2021

I am not sure we should keep it as "open issue" if there are no clear tasks to complete.

I think the scope of the issue is very generic and it has no chance to ever end if it is defined the way it is defined. I think we should not have (or have very little) of generic issues such us "continuous update of something" without having a list of tasks this leads to.

The current approach for dependency management is well estabilished when it comes to automated upgrades::

  • we try put no upper boundaries of dependencies
  • unless we have to for some reason and we document what the reason is in setup.py
  • the "constraint upgrade" mechanism is specifically designed to update the constrainst as the new versions are released

The limitations we have are in:

  • setup.cfg when they are imposed by airlfow
  • setup.py when they are imposed by one of the providers
  • Dockerfile.ci/Dockerfile if they are imposed by imperfect pip eager-upgrade mechanism that will prevent the combination of airflow + providers from working correctly when the eager-update mechanism is invoked.

However we have no process currently (and this issue is not helping with it) on how to get rid of the upper-bound limitations over time - this is done on an ad-hoc base.

I think most of those limitations are somewhaat documented (with URLs and explanation where they came from in the setup.py/.cfg/Dockerfiles (but possibly not all of them are). I believe we should have an explanation for every single upper-bound limit that we have. Also some of the reasons for those might be alreaady gone, but we do not know it, some of them might need some work to remove (for example recent celery 5 upgrade).

I think a good idea how to treat that one would be that someone ( might be a even a good first issue) reviews all the setup.py/setup.cfg/Dockerfile upper-bound limitations and verifies if the reasons are:

a) documented
b) still valid
c) maybe even we should create a separate issue type for such limitations

Then what we possibly need is some kind of periodic review of those "upper bound" - if we have list of issues for those, that might be actually possible to manage and contro and even periodically review them (I think we could even automate big parts of this when the new GitHub Issues are enabled for public projects).

So I think we should close that issue and create a bunch of issues (grouped together) for every single limitation we have now and figure out what we do with those - but it requires a non-trivial amount of checking/verifying the current limits (maybe some investigations as well on where they came from), documentting and possibly organising a bit the process of regular review.

Anyone :) ?

@eladkal
Copy link
Contributor

eladkal commented Feb 19, 2022

I agree with you Jarek and since the months passed without further comments I'm closing the issue.
If needed lets open new issues with defined scope

@eladkal eladkal closed this as completed Feb 19, 2022
@potiuk
Copy link
Member

potiuk commented Feb 19, 2022

Actually - we already agreed, merged and achieved lazy consensus to try the approach that addresses that issue:

#21356

We removed upper limits from most of the libraries and only really left those that we know why we are limiting them and what is the condition to remove those, which allows us to periodicallly (at minor relase time) to review whether the reasons for limiting are gone .

@potiuk
Copy link
Member

potiuk commented Feb 19, 2022

So it's more than 'month passed' :) . We actually implemented actions to address it :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dependencies Issues related to dependencies problems kind:bug This is a clearly a bug security Security issues that must be fixed
Projects
None yet
Development

No branches or pull requests

7 participants