Isolate, reuse PackageFinder best candidate logic #5971

Merged Apr 16, 2019 (5 commits)

Member

@uranusjr uranusjr commented Oct 30, 2018

My take on #5175. This turns out to be more complicated than I imagined. I decided to make the outdated check follow how pip finds versions for requirements passed by the user:

  • If allow_all_prereleases is True, consider prereleases. (not applicable for self version check)
  • Try to find the latest stable version.
  • If no stable versions are available, try to offer prereleases. (PEP 440)
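
A minimal sketch of the selection order listed above. It is not pip's implementation: `Candidate` and `pick_latest` are hypothetical names, and versions are modelled as plain integer tuples rather than parsed PEP 440 versions.

```python
from typing import List, NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    """Hypothetical stand-in for a candidate found on the index."""
    version: Tuple[int, ...]
    is_prerelease: bool


def pick_latest(
    candidates: List[Candidate],
    allow_all_prereleases: bool = False,
) -> Optional[Candidate]:
    """Prefer the newest stable release; fall back to prereleases."""
    if not allow_all_prereleases:
        stable = [c for c in candidates if not c.is_prerelease]
        # Only offer prereleases when no stable release exists at all.
        if stable:
            candidates = stable
    return max(candidates, key=lambda c: c.version, default=None)


found = [
    Candidate((18, 1), False),
    Candidate((19, 0), True),  # e.g. a beta of 19.0
]
print(pick_latest(found))                              # the stable 18.1
print(pick_latest(found, allow_all_prereleases=True))  # the 19.0 prerelease
```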

I also tweaked the code in find_requirement a little to refactor out some duplicate formatting.

Fix #5175, close #5928.

@uranusjr uranusjr force-pushed the pip-latest-version-prerelease branch 2 times, most recently from e5bfbdf to df68f3b Compare October 30, 2018 23:59
@cjerdonek
Member

Question / observation: to make this much more easily testable (and easier to understand, etc.), it looks like you might be able to convert both find_requirement() and find_best_match() into pure functions that accept the return value of find_all_candidates() (and some other simpler arguments). Then find_requirement() could be implemented as a call to find_all_candidates(), followed by a one-line call to the pure function.

Does that look right to you?
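
To make the suggestion concrete, here is a rough sketch of the shape being asked about. `best_match`, `SketchFinder`, the `fetch` callable and the `accepts` predicate are all hypothetical stand-ins, and versions are simplified to integer tuples.

```python
from typing import Callable, List, Optional, Tuple

Version = Tuple[int, ...]


def best_match(
    candidates: List[Version], accepts: Callable[[Version], bool]
) -> Optional[Version]:
    """Pure function: the result depends only on the arguments."""
    return max((v for v in candidates if accepts(v)), default=None)


class SketchFinder:
    def __init__(self, fetch: Callable[[str], List[Version]]) -> None:
        self._fetch = fetch  # hypothetical stand-in for index access

    def find_all_candidates(self, name: str) -> List[Version]:
        return self._fetch(name)

    def find_requirement(
        self, name: str, accepts: Callable[[Version], bool]
    ) -> Optional[Version]:
        # The I/O step, followed by the one-line call to the pure function.
        return best_match(self.find_all_candidates(name), accepts)


finder = SketchFinder(lambda name: [(1, 0), (2, 0), (3, 0)])
print(finder.find_requirement("example", lambda v: v < (3,)))
```

The pure function can then be tested with a plain list, without constructing a finder at all.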

@cjerdonek cjerdonek added T: bugfix C: finder PackageFinder and index related code labels Oct 31, 2018
@uranusjr
Member Author

uranusjr commented Nov 1, 2018

I considered that approach as well, but opted for the least-effort one in the end. I’ll try it instead. The call in pip_version_check would look a little awkward, but that is probably because of the convoluted state of PackageFinder. Hopefully I can untangle things more with future refactoring…

@uranusjr uranusjr force-pushed the pip-latest-version-prerelease branch from df68f3b to 9b20265 Compare November 1, 2018 07:06
@cjerdonek
Member

Coming back to this (I was busy with other things lately), I think your previous version might have been better. I don't know for sure because I can't see it any longer, but it might be because of the awkwardness you mentioned -- without a similar gain to offset it. My suggestion of passing the return value of find_all_candidates() was only if it would result in being able to factor out a pure function (which would be the gain). That's what I was asking about the possibility of. Looking at the code more closely myself after seeing your latest version, it looks like PackageFinder._candidate_sort_key() is preventing this from being done easily. Maybe you encountered that yourself when you tried.

@uranusjr
Member Author

uranusjr commented Nov 9, 2018

Yes, indeed _candidate_sort_key is the most problematic part. I have the previous version locally, but looking at it now, I feel both versions have some better parts. I’ll need to spend some time (maybe this weekend) merging the good parts together.

@uranusjr uranusjr force-pushed the pip-latest-version-prerelease branch 2 times, most recently from 1b7026f to 0f9f23a Compare November 9, 2018 07:10
@uranusjr
Member Author

uranusjr commented Nov 9, 2018

I’ve worked on this a bit, and have a sort of design problem in refactoring. An easy way to implement this would be to expose three methods:

  1. find_all_candidates(name) finds all candidates
  2. filter_applicable_candidates(candidates, specifier) takes the complete list of candidates, and returns only the ones matching the specifier.
  3. select_best_candidate(candidates) chooses the best candidate (with _candidate_sort_key) from the applicable candidates.

Now both find_requirement() and pip_version_check() can call these three methods to do what they want. But this is a very bad interface, and invites users to implement things incorrectly. What if someone forgets to call filter_applicable_candidates()? It would then select from the wrong candidate list, but it is extremely unlikely to be caught, and even more difficult to debug. The better interface here is to expose only two methods:

  1. find_all_candidates(name) (same)
  2. find_best_candidate(name, specifier) is the combination of the three previous methods

If you want all candidates, use the first one. If you only want to know what to install, use the second one. No way to go wrong. The problem is, however, find_requirement() relies on intermediate values (all_candidates and applicable_candidates) to do logging, and the combined interface does not allow that. (In the first version, I returned a three-tuple to contain all information, but I think that is terrible.)
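
The two-method interface could look roughly like the sketch below. This is an illustration only: `ToyFinder` is hypothetical, the index is a plain dict, and a `matches` predicate stands in for the specifier.

```python
from typing import Callable, Dict, List, Optional, Tuple

Version = Tuple[int, ...]


class ToyFinder:
    """Hypothetical finder backed by a dict instead of a package index."""

    def __init__(self, index: Dict[str, List[Version]]) -> None:
        self._index = index

    def find_all_candidates(self, name: str) -> List[Version]:
        return list(self._index.get(name, []))

    def find_best_candidate(
        self, name: str, matches: Callable[[Version], bool]
    ) -> Optional[Version]:
        # Filtering and selection always happen together, so a caller
        # cannot select from an unfiltered list by mistake.
        applicable = [v for v in self.find_all_candidates(name) if matches(v)]
        return max(applicable, default=None)


finder = ToyFinder({"pip": [(9, 0), (10, 0), (18, 1)]})
best = finder.find_best_candidate("pip", lambda v: v < (18,))
print(best)
```

Note that the caller never sees the applicable list here, which is exactly the logging limitation described above.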

The current version is what I consider a middle ground. The three methods are kept, but marked as internal (except find_all_candidates; it is already public and has users). The call sequence is implemented in find_best_candidate as the public interface, but find_requirement does not use that; instead it calls the individual functions to get the intermediate values it wants.

I don’t think this is a good implementation, but it is the correct design. If this can be merged, I will submit another purely refactoring PR to deal with the duplicate code, probably by introducing an internal structure to hold the three internal values.

@pradyunsg
Member

Looking at the current version, I think it does what's needed well enough; and @uranusjr's suggested plan moving forward sounds fair to me.

I'll defer to @cjerdonek since he's been more involved here than I have. :)

@uranusjr
Member Author

Soft ping :)

Member

@cjerdonek cjerdonek left a comment

Thanks for pinging / reviving this. I appreciate your work and thoughtful approaches.

Some comments.

news/5175.bugfix (outdated, resolved)
  if applicable_candidates:
-     best_candidate = max(applicable_candidates,
-                          key=self._candidate_sort_key)
+     best_candidate = self._select_best_candidate(applicable_candidates)
Member

I read and appreciate your comments describing your thinking on this. Thanks. While I think I understand why you did it, on balance it seems to me like it would be better to avoid copying the exact three lines right next to each other, even if this is something you plan to fix later. It seems like it would be simpler and clearer and less error-prone to have them both go through the same function.

Two easily implementable alternatives, both of which I'm sure you've already thought of, and either of which I'd be okay with--

  1. You could add a return_intermediate_values=False argument that find_requirement() could pass True for. Passing True would cause the return value to be a tuple rather than a single value. It's ugly, but would get the job done (and at least it wouldn't affect pip_version_check()'s invocation).

  2. Kind of like subprocess.run()'s API that returns a CompletedProcess object, you could return an object instead of a single value. You could even have two variants: analyze / plan / calculate_best_candidate() that returns an object containing auxiliary info, and a simpler wrapper function find_best_candidate() that pip_version_check() could use that returns just the final value. As for (1), the latter would make it so pip_version_check() would be unaffected by the extra info.
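
Option 2 might be sketched as follows. The names `CandidateSearch` and `compute_best_candidate` are hypothetical, and a `matches` predicate stands in for the specifier; this only illustrates the result-object idea, not the code that was eventually merged.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Version = Tuple[int, ...]


@dataclass(frozen=True)
class CandidateSearch:
    """Hypothetical result object, in the spirit of CompletedProcess."""
    all_candidates: List[Version]
    applicable_candidates: List[Version]
    best_candidate: Optional[Version]


def compute_best_candidate(
    candidates: List[Version], matches: Callable[[Version], bool]
) -> CandidateSearch:
    applicable = [v for v in candidates if matches(v)]
    return CandidateSearch(
        all_candidates=list(candidates),
        applicable_candidates=applicable,
        best_candidate=max(applicable, default=None),
    )


def find_best_candidate(
    candidates: List[Version], matches: Callable[[Version], bool]
) -> Optional[Version]:
    # Thin wrapper for callers that only need the final answer.
    return compute_best_candidate(candidates, matches).best_candidate


result = compute_best_candidate([(1, 0), (2, 0), (3, 0)], lambda v: v < (3,))
print(result.applicable_candidates, result.best_candidate)
```

A caller that logs intermediate values reads them off `result`, while a caller like the self version check uses only the wrapper.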

Member Author

The CompletedProcess model sounds like a good idea :D In retrospect it is not unlike the original tuple solution, but with a better interface.

all_candidates = finder.find_all_candidates("pip")
if not all_candidates:
try:
candidate = finder.find_best_candidate("pip", SpecifierSet())
Member

It seems like the caller shouldn't have to know about SpecifierSet objects to call find_best_candidate(). What about making that argument optional?

Member Author

I am more fond of being explicit :) Convenience is not as important here since the API is internal (anyone wanting to understand this line needs to know about specifiers anyway), and always requiring a SpecifierSet reduces the chance of forgetting to pass one when you should.

Would finder.find_best_candidate("pip", specifier=None) be an acceptable middle ground?

Member

I concur omitting the specifier should be okay, since finder.find_best_candidate("pip") is clear enough in terms of communicating what's happening.

I'm okay with all three options on the table, only slightly preferring the form I've stated here.
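
For comparison, the optional-argument form could be sketched like this. It is an assumption-laden illustration: the specifier is modelled as a plain predicate, and treating `None` as "accept everything" is a simplification of what an empty specifier set means.

```python
from typing import Callable, List, Optional, Tuple

Version = Tuple[int, ...]


def find_best_candidate(
    candidates: List[Version],
    specifier: Optional[Callable[[Version], bool]] = None,
) -> Optional[Version]:
    if specifier is None:
        # No constraint given: treat every version as acceptable
        # (a simplification for this sketch).
        applicable = list(candidates)
    else:
        applicable = [v for v in candidates if specifier(v)]
    return max(applicable, default=None)


versions = [(9, 0), (10, 0), (18, 1)]
print(find_best_candidate(versions))                      # no specifier
print(find_best_candidate(versions, lambda v: v < (18,)))  # constrained
```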

src/pip/_internal/utils/outdated.py (outdated, resolved)
@uranusjr uranusjr force-pushed the pip-latest-version-prerelease branch from 65cae6e to 1e85cf7 Compare April 2, 2019 06:56
@pypa-bot

pypa-bot commented Apr 2, 2019

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@pypa-bot pypa-bot added the needs rebase or merge PR has conflicts with current master label Apr 2, 2019
Split out how PackageFinder finds the best candidate, and reuse it in the
self version check, to avoid the latter duplicating (and incorrectly
implementing) the same logic.
@uranusjr uranusjr force-pushed the pip-latest-version-prerelease branch from 1e85cf7 to ea1d5ac Compare April 2, 2019 07:00
@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label Apr 2, 2019
@uranusjr
Member Author

uranusjr commented Apr 2, 2019

This looks even cleaner than I expected! Thanks so much @cjerdonek 😀

(Also rebased to keep the bot happy.)

@cjerdonek
Member

I have some comments after the latest changes.

Member

@cjerdonek cjerdonek left a comment

Lots of changes in this re-write of the PR, so some new comments.

src/pip/_internal/utils/outdated.py (outdated, resolved)
src/pip/_internal/index.py (outdated, resolved)
src/pip/_internal/index.py (outdated, resolved)
if best_installed:
    # We have an existing version, and its the best version
    logger.debug(
        'Installed version (%s) is most up-to-date (past versions: '
        '%s)',
        installed_version,
        ', '.join(sorted(compatible_versions, key=parse_version)) or
Member

It looks like parse_version got dropped in the new code.

Member Author

This is merged into the above variable. I have made this into a function instead, with additional comments to make the usage clearer (hopefully).
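
A helper of that kind might look roughly like the sketch below. It is not the function from the PR: `version_key` is a simplified parser that only handles dotted numeric versions, and the "none" fallback for an empty list is an assumption, since the snippet above is cut off before showing it.

```python
from typing import Iterable, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    # Simplified parser for dotted numeric versions only (an assumption);
    # real version strings need a PEP 440 parser.
    return tuple(int(part) for part in version.split("."))


def format_versions(versions: Iterable[str]) -> str:
    # De-duplicate, then sort numerically rather than lexically.
    ordered = sorted(set(versions), key=version_key)
    # "none" is an assumed placeholder for an empty list.
    return ", ".join(ordered) or "none"


print(format_versions(["10.0", "9.0.1", "9.0.1", "18.1"]))
```

Sorting by a parsed key matters because a plain string sort would place "10.0" before "9.0.1".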

src/pip/_internal/index.py (outdated, resolved)
def __init__(
    self,
    candidates,  # type: List[InstallationCandidate]
    specifier,  # type: specifiers.BaseSpecifier
Member

Again, I think specifier should be optional (defaulting to the empty SpecifierSet).

Member Author

I made the specifier argument optional in finder.find_candidates(), but kept it required in FoundCandidates constructor.

@uranusjr uranusjr force-pushed the pip-latest-version-prerelease branch from ce8a021 to 2882042 Compare April 8, 2019 08:27
@uranusjr
Member Author

Are there things I could do to push this forward? Should I change FoundCandidates’s specifier init argument to optional? I feel it is quite unnecessary since the class is only intended to be instantiated by PackageFinder, and default values only make things less obvious.

@cjerdonek
Member

Hi @uranusjr, sorry! I actually just spent time going over this again an hour or so ago. I had been wanting to go over it again for a while but only just now got to it. I had some further minor thoughts, but I think it looks good enough. I was planning to merge it first thing tomorrow. Thanks for your thought and work on this and for your patience.

@cjerdonek
Member

PS - if there were further tweaks you wanted to do, you could always do that in a later PR. (And if it's a smaller / easier PR, I promise reviews can happen way faster.) It's only when there's more going on in a PR that it slows things down.

@cjerdonek
Member

Regarding FoundCandidates's init method, I was actually realizing it would be simpler to pass the set of versions to filter by, instead of passing in specifier and prereleases and recomputing that set each time. For example, since the set is needed anyways, it seems like it shouldn't have to be computed more than once (including even just for formatting). But it would be better to discuss further improvements after merging what you already have IMO, as opposed to postponing further.
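
One way to read this suggestion is sketched below. `FoundCandidatesSketch` is a hypothetical class, candidates are modelled as `(version, link)` string pairs, and the caller is assumed to have computed the set of acceptable versions once, up front.

```python
from typing import FrozenSet, Iterator, List, Tuple

Candidate = Tuple[str, str]  # (version string, link); hypothetical shape


class FoundCandidatesSketch:
    def __init__(
        self, candidates: List[Candidate], versions: FrozenSet[str]
    ) -> None:
        self._candidates = candidates
        self._versions = versions  # computed once by the caller

    def iter_all(self) -> Iterator[Candidate]:
        return iter(self._candidates)

    def iter_applicable(self) -> Iterator[Candidate]:
        # Membership test against the precomputed set; no specifier or
        # prerelease flag is consulted here.
        return (c for c in self._candidates if c[0] in self._versions)


found = FoundCandidatesSketch(
    [("1.0", "a.whl"), ("2.0", "b.whl"), ("3.0b1", "c.whl")],
    frozenset({"1.0", "2.0"}),
)
print(list(found.iter_applicable()))
```

The same set can then be reused for filtering and for formatting log messages, instead of being recomputed from the specifier each time.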

@uranusjr
Member Author

Ah, I see what you mean, good point. I’ll work on improving that after this gets merged, then.

@cjerdonek cjerdonek merged commit 14cb4f4 into pypa:master Apr 16, 2019
@cjerdonek
Member

Thanks again for your work on this, @uranusjr! 👍 Can you double-check that the issues this closes are closed?

def _format_versions(cand_iter):
    # This repeated parse_version and str() conversion is needed to
    # handle different vendoring sources from pip and pkg_resources.
    # If we stop using the pkg_resources provided specifier.
Member

PS - also, here's a code comment that got chopped off that can be fixed in a subsequent commit.

@uranusjr
Member Author

I believe all related issues/PRs (at least those mentioned) are closed 🍿

@uranusjr uranusjr deleted the pip-latest-version-prerelease branch April 16, 2019 23:50
@lock

lock bot commented May 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label May 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators May 28, 2019
Labels
auto-locked Outdated issues that have been locked by automation C: finder PackageFinder and index related code
Development

Successfully merging this pull request may close these issues.

pip 9 offers upgrades to prereleases
4 participants