-
Notifications
You must be signed in to change notification settings - Fork 14.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Old libraries in setup.py causing dependency resolution to pull old transitive constraints (3 years+) #12120
Comments
I will change the title of the issue, because it is not the matter of constraints, but it's the matter of limitations in setup.py and transitional dependencies that we have for all those libraries. There is nothing we can do on the constraint level. The constraints files are generated automatically by PIP dependency mechanisms - whatever PIP resolves using setup.py limits is automatically updated as new version of constraints. So if we want to do anything about it (do we?) we should remove some of the limitations there and upgrade all the different providers/core dependencies we have to use the latest version of dependent libraries - basically for each provider separately. Even that will not help in some cases because the newest version of those libraries might transitively use some older versions of dependent libraries. Does anyone have some proposal there? Should we somehow make an effort to upgrade those? Maybe there is someone who would like o lead that? Note, that in order to do it, we likely need to have some system tests in place and implemented for those providers that we decide to bump to later version of dependent libraries (https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-4+Support+for+Automation+of+System+Tests+for+external+systems) because what we basically need to do is to upgrade the libraries that we already know usually that they have some compatibility issues (for example all the google providers will have to be sooner or later migrated to >2.0.0 python libraries and automated unit testing is not enough for those kinds of changes (those libraries are not backwards compatible). WDYT others? Is it worth to make such a concerted effort? Are there some real benefits from that? Should we do it? For 2.0 or later? |
BTW. If somebody want to see where the deps come from, it's easy to use pipdeptree: |
I agree with Jarek, that some of the constraints come from dependencies of Airflow's dependencies. And since we have lots of dependencies it is not always straightforward to support all versions as these dependencies define there own requirements of versions. So while having latest and greatest would be awesome, we need to make a concise effort that we don't make Airflow incompatible with other dependencies since PIP resolver is not going to allow it any longer |
There is even a conflict in Airflow 1.10.13 with importlib-metadata, because Airflow requires |
I don't think we'd want to fix deps for 1.10.* Our focus in #12636 is to make them fixed (and non-breakable in the future) for Airflow 2.0. |
Where airflow even use dnspython ? |
Airflow doesn't use this dependency directly, but other dependencies that require Airflow do require this library. Python doesn't have dependency isolation (for example, the node has), so we need to make sure other libraries don't restrict their dependency versions too much, otherwise users won't be able to use their favorite libraries in the latest versions. Users will also not be able to get security patches if the version restrictions are too restrictive.
In this case, it looks like we need to check why eventlet requires a version older than 2.0. |
but this dependency exist only on mongo provider. what am I missing here? |
@HaloKo4 We generate one constraints file to check compatibility with all libraries and detect all conflicts, so even if mongo only requires an older version, most users will have the old version of this library. Here is the official Airflow installation guide, which installs libraries always the latest tested versions. If you try to install Airflow and dnspython according to the instructions, the latest version of the dnspython library will not be installed. AIRFLOW_VERSION=master
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install \
--constraint "${CONSTRAINT_URL}" \
".[postgres,google]" \
dnspython I found a ticket in the eventlet project and I hope that it will be solved. |
Also @HaloKo4 -> those are not 'requirements' - those are 'constraint' files. This means that when you are installing airflow you can use those constraint files and they will "guarantee" that the installation will produce non-conflicting set of dependencies (as of 1.10.14 and 2.0.0). We have a process in place where during our CI builds we automatically upgrade to newer constraints where possible (and verify if airflow works and tests pass). Those constraints work in the 'worst' case when you install all "ci" extras "devel_ci" - which basically means "everything". You are still free to upgrade to newer version of any dependency for example if you only installed several extras and the constraints are only applied at airflow installation time. |
@mik-laj do we have further tasks on this issue? |
I am not sure we should keep it as "open issue" if there are no clear tasks to complete. I think the scope of the issue is very generic and it has no chance to ever end if it is defined the way it is defined. I think we should not have (or have very little) of generic issues such us "continuous update of something" without having a list of tasks this leads to. The current approach for dependency management is well estabilished when it comes to automated upgrades::
The limitations we have are in:
However we have no process currently (and this issue is not helping with it) on how to get rid of the upper-bound limitations over time - this is done on an ad-hoc base. I think most of those limitations are somewhaat documented (with URLs and explanation where they came from in the setup.py/.cfg/Dockerfiles (but possibly not all of them are). I believe we should have an explanation for every single upper-bound limit that we have. Also some of the reasons for those might be alreaady gone, but we do not know it, some of them might need some work to remove (for example recent celery 5 upgrade). I think a good idea how to treat that one would be that someone ( might be a even a good first issue) reviews all the setup.py/setup.cfg/Dockerfile upper-bound limitations and verifies if the reasons are: a) documented Then what we possibly need is some kind of periodic review of those "upper bound" - if we have list of issues for those, that might be actually possible to manage and contro and even periodically review them (I think we could even automate big parts of this when the new GitHub Issues are enabled for public projects). So I think we should close that issue and create a bunch of issues (grouped together) for every single limitation we have now and figure out what we do with those - but it requires a non-trivial amount of checking/verifying the current limits (maybe some investigations as well on where they came from), documentting and possibly organising a bit the process of regular review. Anyone :) ? |
I agree with you Jarek and since the months passed without further comments I'm closing the issue. |
Actually - we already agreed, merged and achieved lazy consensus to try the approach that addresses that issue: We removed upper limits from most of the libraries and only really left those that we know why we are limiting them and what is the condition to remove those, which allows us to periodicallly (at minor relase time) to review whether the reasons for limiting are gone . |
So it's more than 'month passed' :) . We actually implemented actions to address it :) |
Dear and Wonderful Citizens,
I started to look at what libraries we have defined in the
constraints-*.txt
file and I am a bit surprised because we have this constraints defined on very old libraries.https:/apache/airflow/blob/053afe7/constraints-3.8.txt
Update (@potiuk): -> Just for clarity: constraints are automatically generated from setup.py so this is a matter of dependencies defined there. If we are to fix it, we will have to upgrade dependencies defined in setup.py NOT the constraints themselves.
Sometimes we have defined libraries that are over 3 years old, which can cause security problems. Old versions of the library may have vulnerabilities that have probably been fixed in newer versions.
I am most concerned about dependency conflicts. Old libraries are only compatible with old libraries, which can cause problems if the user wants to use a new version of the same library.
I think it's worth investigating where these limitations come from and why we can't use newer versions of these libraries.
You can see the list of libraries that need updating in the Jupyter interactive notebook.
https://colab.research.google.com/drive/1F5Lw8qNcxCvWaYUrGZ1x3W3v3080Dq0U#scrollTo=AfIBqzjo8UId
CC: @potiuk @ryw
The text was updated successfully, but these errors were encountered: