Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up union simplification #12526

Open
JukkaL opened this issue Apr 5, 2022 · 6 comments
Open

Speed up union simplification #12526

JukkaL opened this issue Apr 5, 2022 · 6 comments
Labels
performance refactoring Changing mypy's internals

Comments

@JukkaL
Copy link
Collaborator

JukkaL commented Apr 5, 2022

Union simplification (make_simplified_union) has been causing multiple performance issues (at least #9169, #12408, #12225). It can make proper subtype checks of all union items against all other items, which is O(n**2) -- with certain O(n) fast paths that cover some (but not all) problematic scenarios. Union simplification is fairly performance-critical even when we don't hit worst-case scenarios.

Here are some ideas about what we might do to improve the situation:

  1. Somehow implement union simplification of multiple Instance types (at least simple ones) in close to linear time. I suspect that this is possible under some reasonable assumptions.
  2. Cache negative results of proper subtype checks. I think that currently we only cache positive results (in mypy.typestate). This might have some drawbacks, such as a possible explosion of cache sizes. I assume there's a reason why we aren't currently doing this. Union simplification tends to perform many proper subtype checks with negative results.
  3. Avoid doing full union simplification in some cases, perhaps based on some heuristics. Union simplification should never be semantically necessary.
  4. Add fast paths for the most common union simplification operations (e.g. single item, X | None).
@erictraut
Copy link

erictraut commented Apr 5, 2022

Every large union case that I've observed in real code involves large numbers of literals — either str or int literals. All of the problems cited above appear to follow this pattern.

In pyright, I mitigate this problem by maintaining maps of str and int literal values within a union. I use these two maps to avoid the O(n**2) behavior for union operations that involve literal values. So this combines aspects of your proposal 1 and 4.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Apr 6, 2022

Every large union case that I've observed in real code involves large numbers of literals — either str or int literals. All of the problems cited above appear to follow this pattern.

The colour-science package also uses medium-sized unions (e.g. ~10 items) without any literals, which are slow enough to cause trouble (#12225). The unions contain protocol types that are particularly slow to process by mypy. Caching negative proper subtype checks involving protocol types could already help quite a bit with colour-science.

In pyright, I mitigate this problem by maintaining maps of str and int literal values within a union.

Mypy does something a bit similar in a few places. However, we decompose the union types into literals vs non-literals separately on every operation. This sounds a bit different from what pyright does, and the pyright approach sounds more efficient. We could consider adopting this approach if smaller optimizations don't seem enough.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Apr 6, 2022

Here's one more idea:

  1. We could cache the results of make_simplified_union operations, since sometimes we repeatedly simplify the exact same types over and over again. We may need to flush items from the cache (e.g. using LRU caching) to avoid using lots of memory in worst case scenarios.

JukkaL added a commit that referenced this issue Apr 7, 2022
Mypyc is bad at compiling nested functions. In one use case
we were spending 7% of the mypy runtime just creating closure
objects for `check_argument`. Here I manually inline the nested
function to avoid this overhead.

Work on #12526.
JukkaL added a commit that referenced this issue Apr 7, 2022
Mypyc is bad at compiling nested functions. In one use case
we were spending 7% of the mypy runtime just creating closure
objects for `check_argument`. Here I manually inline the nested
function to avoid this overhead.

Work on #12526.
JukkaL added a commit that referenced this issue Apr 7, 2022
This shows up as a bottleneck in some use cases, based on
profiling.

This should help with union simplification (#12526).
97littleleaf11 pushed a commit that referenced this issue Apr 7, 2022
This shows up as a bottleneck in some use cases, based on
profiling.

This should help with union simplification (#12526).
JukkaL added a commit that referenced this issue Apr 7, 2022
It can be a bottleneck in some use cases.

Work on #12526.
JukkaL added a commit that referenced this issue Apr 7, 2022
This tweaks a change made in #12539 that may have slowed things
down. The behavior introduced in the PR was more correct, but it's not
worth a potential major performance regression, since union
simplification is not something we have to get always right for
correctness.

Work on #12526.
JukkaL added a commit that referenced this issue Apr 7, 2022
This is very performance critical. Implement a few micro-optimizations
to speed caching a bit. In particular, we use dict.get to reduce the
number of dict lookups required, and avoid tuple concatenation which
tends to be a bit slow, as it has to construct temporary objects.

It would probably be even better to avoid using tuples as keys
altogether. This could be a reasonable follow-up improvement.

Avoid caching if last known value is set, since it reduces the
likelihood of cache hits a lot, because the space of literal values
is big (essentially infinite).

Also make the global strict_optional attribute an instance-level
attribute for faster access, as we might now use it more frequently.

I extracted the cached subtype check code into a microbenchmark
and the new implementation seems about twice as fast (in an
artificial setting, though).

Work on #12526 (but should generally make things a little better).
JukkaL added a commit that referenced this issue Apr 7, 2022
This tweaks a change made in #12539 that may have slowed things
down. The behavior introduced in the PR was more correct, but it's not
worth a potential major performance regression, since union
simplification is not something we have to get always right for
correctness.

Work on #12526.
JukkaL added a commit that referenced this issue Apr 7, 2022
Mypyc is bad at compiling nested functions. In one use case
we were spending 7% of the mypy runtime just creating closure
objects for `check_argument`. Here I manually inline the nested
function to avoid this overhead.

Work on #12526.

Co-authored-by: Jelle Zijlstra <[email protected]>
huguesb pushed a commit to huguesb/mypy that referenced this issue Apr 23, 2022
make_simplified_union is used in a lot of places and therefore
accounts for a significant share to typechecking time. Based
on sample metrics gathered from a large real-world codebase
we can see that:
 1. the majority of inputs are already as simple as they're
    going to get, which means we can avoid allocation extra
    lists and return the input unchanged
 2. most of the cost of `make_simplified_union` comes from
    `is_proper_subtype`
 3. `is_proper_subtype` has some caching going on under the hood
    but it only applies to `Instance`, and cache hit rate is low
    in this particular case because, as per 1) above, items are
    in fact rarely subtypes of each other

To address 1, refactor `make_simplified_union` with an optimistic
fast path that avoid unnecessary allocations.

To address 2 & 3, introduce a cache to record the result of union
simplification.

These changes are observed to yield significant improvements in
a real-world codebase: a roughly 10-20% overall speedup, with
make_simplified_union/is_proper_subtype no longer showing up as
hotspots in the py-spy profile.

For python#12526
@huguesb
Copy link
Contributor

huguesb commented Apr 23, 2022

Here's one more idea:

5. We could cache the results of `make_simplified_union` operations, since sometimes we repeatedly simplify the exact same types over and over again. We may need to flush items from the cache (e.g. using LRU caching) to avoid using lots of memory in worst case scenarios.

I have a PR doing just that. I'm not too worried about the cache growing too large. I added some logging of calls to make_simplified_union(and their inputs) while running over a large codebase (>20k files) and used sort/uniq -c to get a sense of the potential caching benefits. I saw ~70k calls to make_simplified_union but only ~7k distinct inputs/outputs, which seems like a pretty reasonable size for a cache. We could keep that cache even smaller by special-casing inputs with only one or two items.

Overall performance improvement seems to be around 10-20% for this particular codebase. Fwiw, the bulk of the slowness comes from inputs with ~10 Type[]

huguesb pushed a commit to huguesb/mypy that referenced this issue Apr 23, 2022
make_simplified_union is used in a lot of places and therefore
accounts for a significant share to typechecking time. Based
on sample metrics gathered from a large real-world codebase
we can see that:
 1. the majority of inputs are already as simple as they're
    going to get, which means we can avoid allocation extra
    lists and return the input unchanged
 2. most of the cost of `make_simplified_union` comes from
    `is_proper_subtype`
 3. `is_proper_subtype` has some caching going on under the hood
    but it only applies to `Instance`, and cache hit rate is low
    in this particular case because, as per 1) above, items are
    in fact rarely subtypes of each other

To address 1, refactor `make_simplified_union` with an optimistic
fast path that avoid unnecessary allocations.

To address 2 & 3, introduce a cache to record the result of union
simplification.

These changes are observed to yield significant improvements in
a real-world codebase: a roughly 10-20% overall speedup, with
make_simplified_union/is_proper_subtype no longer showing up as
hotspots in the py-spy profile.

For python#12526
@JukkaL
Copy link
Collaborator Author

JukkaL commented Apr 25, 2022

I have a PR doing just that. I'm not too worried about the cache growing too large.

It seems like it would be trivial to just clear the cache if we happen to hit an unexpected scenario where the cache grows too large. Normally we wouldn't need this, but if we do, the cost of occasionally clearing the cache is probably still fairly minor.

Overall performance improvement seems to be around 10-20% for this particular codebase.

This is a great result! I did some profiling and I believe that non-trivial performance improvements should be fairly common (at least in the 1-10% range), and in some cases the impact could be even higher than 20%.

However, I also noticed that our current Type hashing implementation doesn't seem to work quite right for the caching (see #12659 (comment)). The deviations that might happen will probably be very rare, but if they happen, they could be extremely confusing, so I think it's important to address the correctness of hashing first.

From the above comment, these are main issues I've found:

  1. Some Type subclasses seem to have incorrect/missing/incomplete __eq__ and/or __hash__ methods. This should be pretty easy to fix and are just bugs. It would be good to also have some unit tests for these.
  2. The line and column attributes of Type are not using in type equality/hashing, but they might plausibly affect type checking results in edge cases (e.g. locations of errors).
  3. The can_be_true and can_be_false attributes are ignored in equality/hashing, but they might plausibly affect type checking.

(1) should be easy to fix. Somebody just needs to review all __eq__ and __hash__ methods and ensure that they exist for all concrete subclasses of Type and they cover all the important attributes.

(2) Is more tricky. One option might be to remove the uses of line and column attributes during type checking, i.e. they'd only be used during semantic analysis, when we shouldn't perform any union simplification. Another option would be remove line and column attributes from most Type subclasses, except the "syntactic" ones such as UnboundType where they are required for error reporting. I'm not sure how feasible this would be. We'd need to track the line numbers separately where they are needed.

(3) This could be fixed by just including can_be_true and can_be_false in all __eq__ and __hash__ methods. A better fix might to move these away from type objects altogether and store them in some other way, but this may be quite difficult to achieve, and potentially not worth the effort.

@huguesb
Copy link
Contributor

huguesb commented Apr 30, 2022

As mentioned in #12659

  1. Definitely needs fixing. Do you have more details on which classes are affected?

  2. Can easily be addressed, I think, by removing the line/column parameters of make_simplified_union. Most callers of make_simplified_union do not supply them, and updating those that do to omit those parameters results in a green CI run (including mypy_primer): see EXPERIMENT remove line/column param to make_simplified_union #12698
    If the callers of make_simplified_union do not care about line/column then neither should the cache...

  3. I'm not quite sure what to make of that right now. The semantics of those attributes are confusing me and I will need to take a closer look at them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance refactoring Changing mypy's internals
Projects
None yet
Development

No branches or pull requests

4 participants