Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Resolver: Avoiding network access if the installed candidate is good enough #8023

Closed
uranusjr opened this issue Apr 11, 2020 · 9 comments · Fixed by #8932
Closed

New Resolver: Avoiding network access if the installed candidate is good enough #8023

uranusjr opened this issue Apr 11, 2020 · 9 comments · Fixed by #8932

Comments

@uranusjr
Copy link
Member

uranusjr commented Apr 11, 2020

Context: https:/pypa/pip/pull/8014/files#r407079100

I think it would be best to implement something like:

@dataclasses.dataclass()
class CandidateSequence(collections.abc.Sequence):
    before: Sequence[Candidate]
    found_iter: Iterator[Candidate]
    after: Sequence[Candidate]

    # Implement the sequence protocol and __reverse__()
    # https://docs.python.org/3.8/reference/datamodel.html#object.__reversed__


class Factory:
    def find_matches(self):
        def found_iter():
            found = self._factory.finder.find_best_candidate(...)
            for ican in found.iter_applicable():
                yield self._make_candidate_from_link(...)

    dist = self._installed_dists.get(name)
    if dist is not None:
        after = [dist]
    else:
        after = []
    return CandidateSequence(before=[], found_iter=found_iter(), after=after)

We would be able to skip network I/O if the installed version is good enough.

@pradyunsg
Copy link
Member

Hmm... Wouldn't the main change in resolvelib be dropping a reversed() call here? I understand that flipping the order would require changing tests in resolvelib, as well as changes in the implementation on pip's side, but it would result in less code to maintain overall IIUC. I'm not sure if I'm missing context or something else here.

What's the reasoning behind introducing this complexity into pip?

@uranusjr
Copy link
Member Author

uranusjr commented Apr 11, 2020

Maybe I was misunderstanding want you meant; I thought you were looking to eliminate the network call to fetch the candidate list from the index?

Flipping the list order wouldn’t work, since the provider does not know whether it needs to build that list before returning it to the resolver. The provider still needs to append those remote candidates after the installed one, even if they never got accessed by the resolver, because otherwise the resolver would straight up fail when it needs to backtrack but doesn’t have those candidates to work with.

From what I can see, the only way to avoid that network call, is to delay the decision whether to network until the resolver actually tries to access the list. (And that alone might not be enough in at lot of the situations either.)

@pradyunsg
Copy link
Member

pradyunsg commented Apr 11, 2020

Hmm... I hear you. I was thinking the provider could return a generator that resolvelib uses (w/ highest preference first; i.e. change the find_matches API).

I think this needs a higher bandwidth discussion, since we'd also need to look into changing our implementation of get_preference if we're going to do this.

@pradyunsg pradyunsg changed the title New Resolver: Implement a “Lazy” list to avoid network access if the installed candidate is good enough New Resolver: Avoiding network access if the installed candidate is good enough Apr 11, 2020
@uranusjr
Copy link
Member Author

Yeah, changing it to return an iterator makes sense to me. This can likely be applied to other parts of the ResolveLib to minimise prepare calls (e.g. in criteria it can use generators instead of eagerly filtering on the candidate list).

@pfmoore
Copy link
Member

pfmoore commented Apr 11, 2020

Agreed, a highest-preference-first iterator would be an easier thing for providers to implement. If that's a possible API change for resolvelib, I'd be in favour of doing it there.

@uranusjr
Copy link
Member Author

I’m anticipating the most difficulty would be occur when a parent package backtracks. Say both A 1.0 and 2.0 depend on B. If the resolver visited 2.0 but backtracked, it needs to re-visit the available B candidates when trying A 1.0. Either the resolver needs to call find_matches() for the same package again, or it needs to implement some sort of “reusable iterator” structure like the code snippet in the issue description.

@pradyunsg
Copy link
Member

@uranusjr Would the following API change be possible for resolvelib -- if find_matches returns a generator, use it as-is, otherwise reverse the order?

@uranusjr
Copy link
Member Author

No, the resolver much take something that can be iterated through multiple times, for backtracking.

@uranusjr
Copy link
Member Author

Also, I checked earlier, the current provider implementation requires significant changes to be able to return a lazily evaluated iterator.

bors bot referenced this issue in duckinator/emanate Oct 18, 2020
192: Update pip to 20.2.4 r=duckinator a=pyup-bot


This PR updates [pip](https://pypi.org/project/pip) from **20.2.3** to **20.2.4**.



<details>
  <summary>Changelog</summary>
  
  
   ### 20.2.4
   ```
   ===================

Deprecations and Removals
-------------------------

- Document that certain removals can be fast tracked. (`8417 &lt;https:/pypa/pip/issues/8417&gt;`_)
- Document that Python versions are generally supported until PyPI usage falls below 5%. (`8927 &lt;https:/pypa/pip/issues/8927&gt;`_)

Features
--------

- New resolver: Avoid accessing indexes when the installed candidate is preferred
  and considered good enough. (`8023 &lt;https:/pypa/pip/issues/8023&gt;`_)
- Improve error message friendliness when an environment has packages with
  corrupted metadata. (`8676 &lt;https:/pypa/pip/issues/8676&gt;`_)
- Cache package listings on index packages so they are guarenteed to stay stable
  during a pip command session. This also improves performance when a index page
  is accessed multiple times during the command session. (`8905 &lt;https:/pypa/pip/issues/8905&gt;`_)
- New resolver: Tweak resolution logic to improve user experience when
  user-supplied requirements conflict. (`8924 &lt;https:/pypa/pip/issues/8924&gt;`_)

Bug Fixes
---------

- New resolver: Correctly respect ``Requires-Python`` metadata to reject
  incompatible packages in ``--no-deps`` mode. (`8758 &lt;https:/pypa/pip/issues/8758&gt;`_)
- New resolver: Pick up hash declarations in constraints files and use them to
  filter available distributions. (`8792 &lt;https:/pypa/pip/issues/8792&gt;`_)
- New resolver: If a package appears multiple times in user specification with
  different ``--hash`` options, only hashes that present in all specifications
  should be allowed. (`8839 &lt;https:/pypa/pip/issues/8839&gt;`_)

Improved Documentation
----------------------

- Add ux documentation (`8807 &lt;https:/pypa/pip/issues/8807&gt;`_)
   ```
   
  
</details>


 

<details>
  <summary>Links</summary>
  
  - PyPI: https://pypi.org/project/pip
  - Changelog: https://pyup.io/changelogs/pip/
  - Homepage: https://pip.pypa.io/
</details>



Co-authored-by: pyup-bot <[email protected]>
@pradyunsg pradyunsg removed the S: needs triage Issues/PRs that need to be triaged label Feb 12, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants