[red-knot] simplify subtypes from unions #13401

Merged
merged 2 commits into from
Sep 19, 2024

carljm
Contributor

@carljm carljm commented Sep 18, 2024

Add Type::is_subtype_of method, and simplify subtypes out of unions.
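The sketch below shows what "simplify subtypes out of unions" means, using a deliberately tiny, hypothetical type model. The `Type` enum, its toy `is_subtype_of`, and `simplify_union` are illustrative assumptions, not red-knot's actual definitions. A new type is skipped if some existing element is a supertype of it, and existing elements that the new type subsumes are dropped.

```rust
/// Hypothetical, heavily simplified model of Python types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Type {
    Never,
    Object,
    Int,
    Str,
    IntLiteral(i64),
    StrLiteral(&'static str),
}

impl Type {
    /// Toy subtyping: reflexive, `Never` is a subtype of everything,
    /// everything is a subtype of `object`, and literals are subtypes of
    /// their fallback class.
    fn is_subtype_of(self, other: Type) -> bool {
        match (self, other) {
            _ if self == other => true,
            (Type::Never, _) | (_, Type::Object) => true,
            (Type::IntLiteral(_), Type::Int) => true,
            (Type::StrLiteral(_), Type::Str) => true,
            _ => false,
        }
    }
}

/// Builds a union from `types`, keeping only elements that are not
/// subtypes of another element.
fn simplify_union(types: &[Type]) -> Vec<Type> {
    let mut elements: Vec<Type> = Vec::new();
    for &ty in types {
        // Redundant: an existing element already covers `ty`.
        if elements.iter().any(|&existing| ty.is_subtype_of(existing)) {
            continue;
        }
        // `ty` may cover existing elements; drop those before adding it.
        elements.retain(|&existing| !existing.is_subtype_of(ty));
        elements.push(ty);
    }
    elements
}
```

With this model, `simplify_union(&[Type::IntLiteral(1), Type::Str, Type::StrLiteral("a"), Type::Int])` yields `[Type::Str, Type::Int]`: the string literal is absorbed by `str`, and `Literal[1]` is removed once `int` arrives.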

@carljm carljm added the red-knot Multi-file analysis & type inference label Sep 18, 2024
Contributor

github-actions bot commented Sep 18, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Base automatically changed from cjm/declared-vs-non to main September 19, 2024 04:47
@carljm carljm merged commit cf1e91b into main Sep 19, 2024
20 checks passed
@carljm carljm deleted the cjm/simplify-union-subtype branch September 19, 2024 05:06
@MichaReiser
Member

This change makes UnionBuilder::map O(N^2), but I don't think there's a way to avoid that.

@carljm
Contributor Author

carljm commented Sep 19, 2024

> This change makes UnionBuilder::map O(N^2), but I don't think there's a way to avoid that.

I assume you mean UnionBuilder::add? There is no UnionBuilder::map.

But yes, I agree; it's now O(n^2), and I don't think that's avoidable.

EDIT: oh, I'm guessing you actually meant UnionType::map. UnionBuilder::add is not really O(n^2); it is O(n*m) if you add a union to the builder.

@AlexWaygood
Member

If we switch to using a Vec for the elements of a union, that will make tests such as union.contains(Type::Any) O(n), right? Not sure how important that is to consider.

@carljm
Contributor Author

carljm commented Sep 19, 2024

> If we switch to using a Vec for the elements of a union, that will make tests such as union.contains(Type::Any) O(n), right? Not sure how important that is to consider.

I don't think this is an operation we will need often in general, for the same reason: the question will usually be about subtyping or assignability (or equivalence), not simple type equality.

It's possible we will have the specific case of needing to know if Any/Unknown is in the union, but if that's an issue, I think we could store an extra boolean flag on every union (and potentially even leave the actual Any/Unknown entry out of it) for less than the cost of the FxOrderSet.
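A minimal sketch of that flag idea, assuming a hypothetical `UnionType` that stores its elements in a boxed slice and records the presence of `Any`/`Unknown` in a boolean (here the variant of the idea that leaves the dynamic entries out of the slice). The type names and methods are illustrative, not red-knot's API. Asking "is there a dynamic type in this union?" becomes O(1), while a general membership test on the slice stays an O(n) scan.

```rust
/// Hypothetical subset of types, just enough for the illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Type {
    Any,
    Unknown,
    Int,
    Str,
}

/// Hypothetical union layout: a boxed slice plus a flag for `Any`/`Unknown`.
struct UnionType {
    elements: Box<[Type]>,
    has_dynamic: bool,
}

impl UnionType {
    fn from_types(types: &[Type]) -> Self {
        let is_dynamic = |t: &Type| matches!(t, Type::Any | Type::Unknown);
        UnionType {
            // Dynamic types are recorded only in the flag, not as elements.
            elements: types.iter().copied().filter(|t| !is_dynamic(t)).collect(),
            has_dynamic: types.iter().any(is_dynamic),
        }
    }

    /// O(1): read the flag.
    fn contains_dynamic(&self) -> bool {
        self.has_dynamic
    }

    /// O(n): linear scan over the slice.
    fn contains(&self, ty: Type) -> bool {
        self.elements.contains(&ty)
    }
}
```

The trade-off is one extra `bool` per union in exchange for dropping the hash-based set entirely.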

@MichaReiser
Member

> EDIT: oh, I'm guessing you actually meant UnionType::map. UnionBuilder::add is not really O(n^2); it is O(n*m) if you add a union to the builder.

Yes, sorry. I still think it is, because we loop over n elements and, for each one, loop over all elements that have been added up to that point. So it's probably n(n-1)/2.

@carljm
Contributor Author

carljm commented Sep 19, 2024

> I still think it is, because we loop over n elements and, for each one, loop over all elements that have been added up to that point. So it's probably n(n-1)/2.

Yes, I agree that this is accurate for UnionType::map.

I was correcting myself about UnionBuilder::add, which is itself only O(n), or O(n*m) if we are adding another union (which will never be the case when it is called from UnionType::map).
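The n(n-1)/2 figure can be checked with a small counting sketch. It assumes the worst case for a hypothetical builder loop: every element is unrelated to all the others, so nothing is simplified away, and each new element is compared against every element already present (one comparison per pair; checking both subtype directions would double the count but keep it quadratic).

```rust
/// Counts pairwise subtype comparisons when `n` mutually unrelated types
/// are added to a union one at a time (worst case: nothing is ever removed).
fn pairwise_comparisons(n: usize) -> usize {
    let mut present = 0; // elements already in the union
    let mut comparisons = 0;
    for _ in 0..n {
        comparisons += present; // compare the new element with each existing one
        present += 1;
    }
    comparisons
}
```

For `n = 5` this gives 0 + 1 + 2 + 3 + 4 = 10, which equals 5 * 4 / 2.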

carljm added a commit that referenced this pull request Sep 20, 2024
Avoid quadratic time in subsumed elements when adding a super-type of
existing union elements.

Reserve space in advance when adding multiple elements (from another
union) to a union.

Make union elements a `Box<[Type]>` instead of an `FxOrderSet`; the set
doesn't buy much since the rules of union uniqueness are defined in
terms of supertype/subtype, not in terms of simple type identity.

Move sealed-boolean handling out of a separate `UnionBuilder::simplify`
method and into `UnionBuilder::add`; now that `add` is iterating
existing elements anyway, this is more efficient.

Remove `UnionType::contains`, since it's now `O(n)` and we shouldn't
really need it; generally we care about subtype/supertype, not type
identity. (Right now it's used for `Type::Unbound`, which shouldn't even
be a type.)

Add support for `is_subtype_of` for the `object` type.

Addresses comments on #13401
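A minimal sketch of the builder shape this commit message describes, under a toy type model. All names here are hypothetical stand-ins, not the actual red-knot code. Elements live in a `Vec` while building and become a `Box<[Type]>` at the end; subsumed elements are removed with a single `retain` pass; space is reserved up front when adding another union's elements; and `Literal[True] | Literal[False]` collapses to `bool` inside `add`. For brevity it makes a few separate passes over the elements rather than folding everything into one loop.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Type {
    Never,
    Object,
    Int,
    Bool,
    BoolLiteral(bool),
    IntLiteral(i64),
}

impl Type {
    /// Toy subtyping, including `object` as the top type.
    fn is_subtype_of(self, other: Type) -> bool {
        match (self, other) {
            _ if self == other => true,
            (Type::Never, _) | (_, Type::Object) => true,
            // `bool` subclasses `int` in Python.
            (Type::Bool | Type::BoolLiteral(_), Type::Int) => true,
            (Type::BoolLiteral(_), Type::Bool) => true,
            (Type::IntLiteral(_), Type::Int) => true,
            _ => false,
        }
    }
}

#[derive(Default)]
struct UnionBuilder {
    elements: Vec<Type>,
}

impl UnionBuilder {
    fn add(mut self, ty: Type) -> Self {
        // `bool` is sealed: adding one bool literal while the other is
        // present replaces both with `bool`.
        if let Type::BoolLiteral(value) = ty {
            let other = Type::BoolLiteral(!value);
            if self.elements.contains(&other) {
                self.elements.retain(|&e| e != other);
                return self.add(Type::Bool);
            }
        }
        // Skip `ty` if an existing element is a supertype of it.
        if self.elements.iter().any(|&e| ty.is_subtype_of(e)) {
            return self;
        }
        // Drop every element `ty` subsumes in one pass (no repeated `Vec::remove`).
        self.elements.retain(|&e| !e.is_subtype_of(ty));
        self.elements.push(ty);
        self
    }

    fn add_union(mut self, other: &[Type]) -> Self {
        // Reserve once up front instead of growing repeatedly.
        self.elements.reserve(other.len());
        for &ty in other {
            self = self.add(ty);
        }
        self
    }

    fn build(self) -> Box<[Type]> {
        self.elements.into_boxed_slice()
    }
}
```

For example, adding `Literal[True]`, `Literal[1]`, then `Literal[False]` builds `[Literal[1], bool]`.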
```rust
// ...
}
Type::Never => {}
_ => {
    let mut remove = vec![];
    // ...
```
Contributor

@hauntsaninja hauntsaninja Sep 25, 2024

In equivalent mypy code, I had to add a special fast path for literals. You can do better than quadratic for unions with lots of literals of the same type, which turns out to be a thing in the wild.

Contributor Author

@carljm carljm Sep 25, 2024

Interesting, thanks for pointing this out! I took a look at that optimization in mypy.

I think the case this would optimize is where a union already contains e.g. str, and then we try to add lots of string literal types to it, every one of which is redundant because it's a subtype of str. Rather than going through all existing union members to check if each literal is a subtype of any of them, we can keep a hash-set of "types present in this union which have literal forms" and do an O(1) contains check against that set as the first step when adding a literal type to the union. Framed in more general terms, it's identifying that a certain set of common types have a single super-type that is most likely to rule them out of the union, and so we optimize checking for that most likely super-type by identity.

This makes sense; I'd prefer to wait to add this kind of optimization until we see it crop up in a real-world codebase and can evaluate the actual impact of the optimization in our case, but it's definitely a useful idea to keep in mind.
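A minimal sketch of the fast path described above, under a toy type model. `literal_fallback`, `fallbacks_present`, and `UnionBuilder` are hypothetical names for illustration only. Before scanning, a literal's fallback class is looked up in a hash set of non-literal elements that have literal forms; a hit means the literal is redundant without touching the element list.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Type {
    Int,
    Str,
    IntLiteral(i64),
    StrLiteral(&'static str),
}

impl Type {
    /// Hypothetical helper: the class whose literal forms include `self`.
    fn literal_fallback(self) -> Option<Type> {
        match self {
            Type::IntLiteral(_) => Some(Type::Int),
            Type::StrLiteral(_) => Some(Type::Str),
            _ => None,
        }
    }

    /// Toy subtyping: reflexive, plus literal <: its fallback.
    fn is_subtype_of(self, other: Type) -> bool {
        self == other || self.literal_fallback() == Some(other)
    }
}

#[derive(Default)]
struct UnionBuilder {
    elements: Vec<Type>,
    /// Non-literal elements with literal forms, e.g. `str`. This toy model
    /// has no `object`, so such elements are never removed later.
    fallbacks_present: HashSet<Type>,
}

impl UnionBuilder {
    fn add(&mut self, ty: Type) {
        // Fast path: O(1) check for the supertype most likely to subsume a literal.
        if let Some(fallback) = ty.literal_fallback() {
            if self.fallbacks_present.contains(&fallback) {
                return;
            }
        }
        // Slow path: the usual linear scan.
        if self.elements.iter().any(|&e| ty.is_subtype_of(e)) {
            return;
        }
        self.elements.retain(|&e| !e.is_subtype_of(ty));
        if matches!(ty, Type::Int | Type::Str) {
            self.fallbacks_present.insert(ty);
        }
        self.elements.push(ty);
    }
}
```

Once `int` is in the union, each further `IntLiteral` costs a single hash lookup instead of a scan over all elements.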

Member

I think it would be useful to add one (or more) benchmarks based on a real-world codebase that makes heavy use of large unions of literals (i.e., pydantic).

Contributor

That's not quite the right description; it's also useful when the union doesn't contain the supertype (i.e. str). For instance, if you were combining two unions that you knew consisted only of literal types, you could use a set union, which is linear. The mypy optimisation I added is basically that, but it also works when non-literal types are thrown in. Fair enough on waiting, though!

Contributor Author

Oh, thanks, yeah, I misread the code. The set is unduplicated_literal_fallbacks, not duplicated_literal_fallbacks. So it looks like it's optimizing only the case you described; the mirror image of the case I described.
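A sketch of that mirror-image optimization, loosely modelled on the description of mypy's `unduplicated_literal_fallbacks` above. It is not mypy's code; the toy types, `UnionBuilder`, and the invariant in the comments are assumptions for illustration. The builder tracks literal fallbacks known to have no supertype among the current elements. A later literal with one of those fallbacks can only be redundant as an exact duplicate, which a hash set catches in O(1), so the linear subtype scan is skipped. Merging unions made mostly of literals then mostly avoids the quadratic scan, even though the union never contains `str` or `int` itself.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Type {
    Object,
    Int,
    Str,
    IntLiteral(i64),
    StrLiteral(&'static str),
}

impl Type {
    /// Hypothetical helper: the class whose literal forms include `self`.
    fn literal_fallback(self) -> Option<Type> {
        match self {
            Type::IntLiteral(_) => Some(Type::Int),
            Type::StrLiteral(_) => Some(Type::Str),
            _ => None,
        }
    }

    /// Toy subtyping: reflexive, everything <: `object`, and a literal is a
    /// subtype of whatever its fallback is a subtype of.
    fn is_subtype_of(self, other: Type) -> bool {
        self == other
            || other == Type::Object
            || self.literal_fallback().is_some_and(|f| f.is_subtype_of(other))
    }
}

#[derive(Default)]
struct UnionBuilder {
    elements: Vec<Type>,
    /// Exact members, for O(1) duplicate detection.
    seen: HashSet<Type>,
    /// Invariant: no current element is a supertype of any fallback in here.
    unduplicated_literal_fallbacks: HashSet<Type>,
}

impl UnionBuilder {
    fn add(&mut self, ty: Type) {
        if self.seen.contains(&ty) {
            return; // exact duplicate
        }
        let fallback = ty.literal_fallback();
        let skip_scan =
            fallback.is_some_and(|f| self.unduplicated_literal_fallbacks.contains(&f));
        // When skipping, `ty` is a literal: it is not subsumed (by the invariant)
        // and it subsumes nothing except itself, so no removal is needed.
        if !skip_scan {
            if self.elements.iter().any(|&e| ty.is_subtype_of(e)) {
                return;
            }
            // Remove elements that `ty` subsumes, keeping `seen` in sync.
            let seen = &mut self.seen;
            self.elements.retain(|&e| {
                let keep = !e.is_subtype_of(ty);
                if !keep {
                    seen.remove(&e);
                }
                keep
            });
        }
        match fallback {
            // A surviving literal shows no element is a supertype of its fallback.
            Some(f) => {
                self.unduplicated_literal_fallbacks.insert(f);
            }
            // A new non-literal may be a supertype of some tracked fallbacks.
            None => self
                .unduplicated_literal_fallbacks
                .retain(|&f| !f.is_subtype_of(ty)),
        }
        self.seen.insert(ty);
        self.elements.push(ty);
    }
}
```

After the first `IntLiteral` has been scanned, the remaining ones in a run skip the scan entirely; adding `int` later invalidates the tracked fallback so subsequent int literals fall back to the full check.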

Member

I've created a new issue collating some of the perf issues mypy and pyright have encountered relating to unions: #13549

Contributor

@hauntsaninja hauntsaninja Sep 29, 2024

Nice!
I think there are some mypy PRs missing from the list, so if you're interested in the code, I'd make sure to look at main.
I'll also make sure that, if you're interested in real-world use cases, you only have to look at primer; it looks like there are 1-2 things I never actually added.

Labels
red-knot Multi-file analysis & type inference
4 participants