Simplify input requirement parsing #7019

chrahunt · 2019-09-14T23:04:54Z

What's the problem this feature will solve?

Currently pip accepts several types of input as "requirements":

name-based requirements (PEP 508)
direct references (PEP 440 - unsupported currently per pip fails to install remote dependency when it is a .whl and follows PEP 440 #6202, but should be accepted)
file paths (no PEP, just current behavior)
URLs (no PEP, just current behavior)

The parsing for these is ad-hoc and pretty complicated, with lots of code paths (see here). This makes it hard to understand:

the error that a user may see given some invalid input
the possible initial states of InstallRequirement given a user input

It is also impossible to re-use the current code to initialize any other kind of type than an InstallRequirement (so this is a prereq for some of the build refactoring).

Describe the solution you'd like

At a high level we need to map any arbitrary input to one of the 4 categories mentioned above. This is difficult to do unambiguously because we accept file paths, so I think we should make some assumptions and then users that want to use weird file paths can feel free to use an explicit file:// URL.

The primary standards-based constraints are:

a PEP 440 direct reference contains a @ followed by <scheme>:// followed optionally by ; and markers which can have any content
a name-based requirement will consist of non-@ characters followed optionally by extras and specifiers and then by ; and markers which can have any content

Simplifying assumptions:

A file path provided by the user must have at least one of ., /, or \ (on Windows), followed optionally by something that looks like extras and something that looks like markers
The URI_reference part of a direct reference will contain ://
URLs passed by the user will contain ://

That leads to the following rules for deciding how to process input:

if the input contains "@" followed eventually by "://" (with no preceding ";") then we treat it like a direct reference - pass it to Requirement and derive all fields of RequirementInfo from that
If the input contains "://" (with no preceding ";") then we treat it like a URL - we manually extract markers and optional package name and extras from and #egg= fragment, which are used to instantiate a Requirement if present. Any missing fields get derived from the Requirement if set.
If the input contains os.pathsep or os.altsep or starts with '.' then we treat it like a path, convert it to an absolute file URL and process the same as 2.
Otherwise, we treat it like a name-based requirement - pass it to Requirement and derive all fields of RequirementInfo from that

Other details:

The module to be added is pip._internal.req.parsing with a function parse_requirement_text that takes a string as would be input by a user or in dependency metadata and returns a RequirementInfo. RequirementInfo would contain:

markers: Set[Marker]
link: Optional[Link] - if None then it's a name-based requirement
requirement: Optional[Requirement] - if None then it's an "unnamed" requirement
extras: Set[str]

parse_requirement_text would do the steps as described above.

parse_requirement_text would not do any filesystem operations or logging and it should map any expected exceptions to RequirementParsingError with an indication of how we were trying to process the text (direct reference, url, path, or name-based).

Once implemented, we should refactor req.constructors.install_req_from_* to delegate parsing to parse_requirement_text and just do operations on the returned RequirementInfo.

Alternative Solutions

Refactor the existing code while preserving all possible existing behaviors. Having just tried that, it's a big pain and the result doesn't look very good.

Additional context

Reduce the number of paths through the code #5204

The text was updated successfully, but these errors were encountered:

cjerdonek · 2019-09-15T22:37:19Z

How does this relate to PR #6203?

cjerdonek · 2019-09-15T23:00:48Z

Also, in your write-up of the proposed rules, can you distinguish between choices that are forced by / follow logically from PEP’s, and rules that are more heuristics of your choosing? It seems like PR #6203 uses different heuristics (though I’m not certain). I think it would be helpful for people to know if / where there might be any ambiguity in interpreting and applying any of the PEP’s, and if we are making any choices here.

chrahunt · 2019-09-15T23:42:59Z

How does this relate to PR #6203?

Unrelated. The changes there are improvements on the existing code, but keep most of the logic locked up in install_req_from_line. The purpose here is to extract all of the logic for turning a "string that pip accepts" into the pieces of data needed to instantiate InstallRequirement (and different types in the future).

chrahunt · 2019-09-16T00:13:59Z

can you distinguish between choices that are forced by / follow logically from PEP’s, and rules that are more heuristics of your choosing?

I have updated the original issue to make it more explicitl.

cjerdonek · 2019-09-16T00:14:29Z

Unrelated.

Okay, I was asking because PR #7020, which is described as progressing this issue, also describes making behavior changes. So I wasn't sure if there was any overlap with the behavior changes in PR #6203.

chrahunt · 2019-09-16T00:22:31Z

I think that #7020 or a followup will likely supersede the functional change in #6203, but we would need to incorporate the tests.

cjerdonek · 2019-09-16T00:22:38Z

Also, PR #6203 introduces its own heuristics for resolving the ambiguities and deciding whether something should be interpreted as a path (something this issue uses different techniques for). It adds a _looks_like_path() function and also checks for existence on the file system for helping to make the determination. (So it feels like there is some overlap.)

cjerdonek · 2019-09-16T00:28:06Z

Okay, so it is related then (it would subsume it).. I think it would be worth comparing the heuristics used there with the approach proposed here then, as there was lots of discussion around that. (I think the reason that PR was never merged is because of the trickiness here and lack of clarity on the correct way to proceed.) This is another reason why I think it's worth separating the behavior changes from refactoring in the discussion. The behavior changes alone are subtle and a bit tricky.

chrahunt · 2019-09-16T00:33:02Z

In that PR only a single branch within install_req_from_line is being refactored, so the checks are in that context and very limited in scope (i.e. direct references are getting treated incorrectly, so what needs to be done so they are less likely to be treated incorrectly). The changes proposed here are much broader, but IMO easier to reason about since we are considering if from the perspective of what we would like to support as opposed to behavior that has accumulated over several years of changes.

cjerdonek · 2019-09-16T01:03:07Z

A couple other things that would help in the description of the proposed rules (the "leads to the following rules" part of the original issue comment) are distinguishing between the parts encoding pip's current behavior with the new logic being introduced. In other words, how much of this is new versus describing what pip already does. Something else that would help is to know if what's being proposed is backwards compatible or what, if anything, might break for people.

triage-new-issues bot added the S: needs triage Issues/PRs that need to be triaged label Sep 14, 2019

chrahunt added state: needs discussion This needs some more discussion type: refactor Refactoring code labels Sep 14, 2019

triage-new-issues bot removed the S: needs triage Issues/PRs that need to be triaged label Sep 14, 2019

chrahunt mentioned this issue Sep 15, 2019

WIP: Add requirement parsing module. #7020

Closed

chrahunt mentioned this issue Jan 24, 2020

Returning a RequirementSet from Resolver.resolve #7638

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simplify input requirement parsing #7019

Simplify input requirement parsing #7019

chrahunt commented Sep 14, 2019 •

edited

Loading

cjerdonek commented Sep 15, 2019

cjerdonek commented Sep 15, 2019

chrahunt commented Sep 15, 2019

chrahunt commented Sep 16, 2019

cjerdonek commented Sep 16, 2019

chrahunt commented Sep 16, 2019

cjerdonek commented Sep 16, 2019 •

edited

Loading

cjerdonek commented Sep 16, 2019

chrahunt commented Sep 16, 2019 •

edited

Loading

cjerdonek commented Sep 16, 2019

Simplify input requirement parsing #7019

Simplify input requirement parsing #7019

Comments

chrahunt commented Sep 14, 2019 • edited Loading

cjerdonek commented Sep 15, 2019

cjerdonek commented Sep 15, 2019

chrahunt commented Sep 15, 2019

chrahunt commented Sep 16, 2019

cjerdonek commented Sep 16, 2019

chrahunt commented Sep 16, 2019

cjerdonek commented Sep 16, 2019 • edited Loading

cjerdonek commented Sep 16, 2019

chrahunt commented Sep 16, 2019 • edited Loading

cjerdonek commented Sep 16, 2019

chrahunt commented Sep 14, 2019 •

edited

Loading

cjerdonek commented Sep 16, 2019 •

edited

Loading

chrahunt commented Sep 16, 2019 •

edited

Loading