
Unexpected type binding when wrapping functions #12919

Closed
mailund opened this issue Jun 1, 2022 · 3 comments · Fixed by #15287
Labels: bug (mypy got something wrong)

Comments


mailund commented Jun 1, 2022

Bug Report

I admit that I do not know whether this is a bug or just my incomplete understanding of how the type system works, but I have run into some unexpected behaviour, and behaviour where mypy and pyre disagree, so it might be a bug.

I am writing code that lifts functions T -> R to Optional[T] -> Optional[R] in various ways, and when I write wrapper functions, type variables get bound in ways I didn't expect. It is particularly bad when I add protocols for the various types on top of it, but I can trigger the unexpected behaviour without them. Basically, if I write a function that turns a Callable[[T], R] into a Callable[[Optional[T]], Optional[R]] and then apply it to another generic function of type Callable[[T], R], the type variables are no longer generic and I cannot apply the wrapped function. This happens both when I implement the wrapper as a function and when I implement it as a Generic class.

To Reproduce

If I write a wrapper with a function, it might look like this:

from typing import (
    TypeVar, 
    Protocol, Generic, 
    Optional as Opt
)


_T = TypeVar('_T')

class _F(Protocol[_T]):
    def __call__(self, __x: _T) -> _T:
        ...

# Lifting with a function...
def lift(f: _F[_T]) -> _F[Opt[_T]]:
    # This function would do more, but I just return f for the
    # type checking example
    return f # type: ignore

I get the desired behaviour if I wrap a function with a concrete type

def f(x: int) -> int:
    return x

reveal_type(lift(f))  # _F[Opt[int]] -- as expected

but if I use a generic type, the wrapped function is not generic:

def g(x: _T) -> _T:
    return x

reveal_type(lift(g)) # _F[Union[_T`-1,None]] -- _T is bound (incorrectly?)
                     # pyre thinks this is _F[None] which is also odd...
lift(g)(1)  # so won't accept int

Expected Behavior

I was expecting the wrapped function to have the type Callable[[Opt[_T]], Opt[_T]] and in particular I would expect to be able to call it with an int argument.

To Reproduce

If I use a Generic class for the lifted function, it can look like this:

# Moving generic type to Generic...
class lifted(Generic[_T]):
    """Lift type _T to Opt[_T]."""

    def __init__(self, f: _F[_T]) -> None:
        self._f = f

    def __call__(self, x: Opt[_T], /) -> Opt[_T]:
        """Call lifted."""
        return self._f(x) # type: ignore

reveal_type(lifted[_T])  # _F[T?] -> lifted[_T?]

and it will work for a function with a concrete type, like f: int -> int above

reveal_type(lifted(f))          # lifted[int] -- perfect
reveal_type(lifted(f).__call__) # Opt[int] -> Opt[int] -- perfect
lifted(f)(1)                    # We can call with int
lifted(f)(None)                 # We can call with None

but things get crazy when I use a generic function like g: _T -> _T:

reveal_type(lifted(g))            # lifted[_T`1] -- _T is bound!
                                  # pyre thinks (correctly I think) it is lifted[_T]
reveal_type(lifted(g).__call__)   # Opt[_T`1] -> Opt[_T`1]
                                  # pyre says Any -> Any which I suppose is fine
lifted(g)(1)                      # incompatible type, int isn't an Opt[_T]
                                  # pyre seems to accept this one
lifted(g)(None)                   # but at least None is in Opt[_T]...

reveal_type(lifted[_T](g))          # lifted[_T?] -- Ok
reveal_type(lifted[_T](g).__call__) # def (None) ???
                                    # pyre: Opt[_T] -> Opt[_T] as expected

lifted[_T](g)(1)             # incompatible type "int"; expected "None"
lifted[_T](g)(None)          # but at least None works...
# Pyre doesn't like the [_T] here but accepts the calls without it...

Expected Behavior

Again, I would expect the Generic lifted class to accept types that match the wrapped function's type. This is more important when I need to add protocols to the types, but I would also expect it to work here.

Your Environment

I tested this in the mypy playground and the pyre playground.

mailund added the bug (mypy got something wrong) label on Jun 1, 2022
@erictraut

Type variables are "solved" in the context of a call expression that targets the constructor of a generic class or a generic function. Only the type variables in the scope of that class or function are solved. Once a type variable goes out of scope, it can no longer be solved.

The expression lifted(g) causes mypy to solve the TypeVar _T@lifted (i.e. _T in the scope of class lifted). The solution in this case is _T@g, which means that lifted(g) evaluates to type lifted[_T@g]. However, at this point, _T@g is no longer in scope. It's an unsolved TypeVar from the scope of function g. It is not associated with the lifted class or its __call__ method, which means it cannot be solved in subsequent call expressions like lifted(g)(1).

FWIW, pyright largely agrees with mypy in this case.

So I don't think this is a bug in mypy. It's expected behavior.


mailund commented Jun 1, 2022

OK, I think I understand what you are saying, @erictraut, but let me just check: if I use a generic function or class with a type variable, I bind it when I apply it, and it no longer functions as a type variable. If I want the wrapped function to work, the wrapper has to know the concrete type.

So, for example, with a curry function (as a Generic)

from typing import (
    TypeVar, ParamSpec,
    Callable as Fn,
    Optional as Opt,
    Generic,
)

_T = TypeVar('_T')

class curry(Generic[_T]):
    def __init__(self, f: Fn[[_T, _T], _T]) -> None:
        self._f = f
    def __call__(self, x: _T) -> Fn[[_T], _T]:
        return lambda y: self._f(x, y)

I can't do something like

@curry
def f(x: _T, __y: _T) -> _T:
    return x

because _T isn't free after wrapping, and I cannot do this either

@curry[_T]
def f(x: _T, __y: _T) -> _T:
    return x

because I can't use a type variable like this when it is unbound?

I can, however, use it with concrete types

def f(x: _T, __y: _T) -> _T:
    return x

fi = curry[int](f)
reveal_type(fi)                  # curry[int]
reveal_type(fi(1))               # def int -> int
reveal_type(fi(1)(2))            # int

fs = curry[str](f)
reveal_type(fs)                  # curry[str]
reveal_type(fs("foo"))           # str -> str
reveal_type(fs("foo")("bar"))    # str

or I can use _T inside another function that itself uses the type variable, because there it will be bound when I evaluate that function, and thus I get a concrete type there:

def bind(x: _T, f: Fn[[_T,_T], _T]) -> Fn[[_T], _T]:
    # ok here since _T is bound when we evaluate
    res = curry[_T](f)
    return res(x)
    
b = bind(12, f)
reveal_type(b)  # int -> int

b2 = bind("foo", f)
reveal_type(b2)  # str -> str

As long as I don't use an unbound type variable when instantiating the Generic here, it seems to work. Is that guaranteed, and is that the way to achieve what I tried above?

@sterliakov (Contributor)

Just adding my 2 cents here, because I've faced this problem before (and ended up with a very ugly pattern that violates DRY; see below).

Related: #1317

It is understandable why mypy "solves" this variable earlier than we want. But there is no way to define the following decorator: "given a function that takes a fixed set of arguments (say, f(x, y) implementing a min-like signature, f(x: _T, y: _T) -> _T), produce a function with the same signature but with the types Transformed[_T]", where Transformed is expressible with typing primitives and depends only on _T (propagating Optional is just one possible use case). It is important that _T is the same for the decorated function and the result, but we do not know the value of _T in advance.

I think the following (very rough) approach would be better:

Given the decorator:

from typing import Callable, SupportsInt, TypeVar

_T = TypeVar('_T')
_P = TypeVar('_P', bound=SupportsInt)

def allow_none(f: Callable[[_T], _T]) -> Callable[[_T | None], _T | None]:
    def inner(x: _T | None, /) -> _T | None:
        return None if x is None else f(x)
    return inner

@allow_none
def foo(x: _P) -> _P:
    if x is None:
        raise ValueError
    return x

After producing F[_P | None], the type checker should recognize that _P is used on both the LHS and the RHS of the Callable definition. When that happens, _P should go back into F's scope, allowing the expected substitution with respect to the SupportsInt bound.

The same should happen if the resulting type is a Generic or Protocol: instead of producing a generic class with an unsolvable type variable, mypy should keep it unresolved. Right now any function that performs a similar operation (f(x: G[_T]) -> G[_T | None], where G is any Protocol or Generic) is malformed: when applied to a generic with type variable G[_T], it produces G[_T | None] with _T out of scope, so only Any and None are compatible. That is definitely not the desired behavior.
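A minimal sketch of that Generic/Protocol case, using made-up names (G as a callback protocol, widen as the decorator), showing how mypy behaved at the time of this discussion:

from typing import Optional, Protocol, TypeVar

_T = TypeVar('_T')

class G(Protocol[_T]):
    def __call__(self, __x: _T) -> _T: ...

# The f(x: G[_T]) -> G[_T | None] shape described above.
def widen(f: G[_T]) -> G[Optional[_T]]:
    def inner(x: Optional[_T], /) -> Optional[_T]:
        return None if x is None else f(x)
    return inner

def identity(x: _T) -> _T:
    return x

wrapped = widen(identity)
# _T leaks from identity's scope, so the result is roughly G[Optional[_T`-1]]:
wrapped(None)  # accepted
wrapped(1)     # rejected, int is not compatible with the leaked Optional[_T]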

I suppose that no existing code should be broken by such a change. The resolution algorithm currently produces unusable functions (they don't accept the expected arguments), so usage of such decorators must be very rare and complex. When the decorated function uses concrete types, this approach changes nothing.

The only solution right now is to explicitly re-declare the type of the decorated function. But why are they compatible, if foo(1) is OK while allow_none(_foo)(1) throws the mypy error E: Argument 1 has incompatible type "int"; expected "Optional[_P]"? I don't know; it seems inconsistent, with the LHS type being wider, like in x: int = 0.1, isn't it?

...  # See example above

def _foo(x: _P) -> _P:
    if x is None:
        raise ValueError
    return x

foo: Callable[[_P | None], _P | None] = allow_none(_foo)

reveal_type(foo)  # N: Revealed type is "def [_P <: typing.SupportsInt] (Union[_P`-1, None]) -> Union[_P`-1, None]"
reveal_type(foo(None))  # N: Revealed type is "None"
reveal_type(foo(1))  # N: Revealed type is "Union[builtins.int, None]"

ilevkivskyi added a commit that referenced this issue on Jun 18, 2023:
Fixes #1317
Fixes #5738
Fixes #12919 
(also fixes a `FIX` comment that is more than 10 years old according to
git blame)

Note: although this PR fixes the most typical use cases for type inference
against generic functions, it is intentionally incomplete, and it is
made in a way that limits the implications to a small scope.

This PR has essentially three components (better infer, better solve,
better apply - all three are needed for this MVP to work):
* A "tiny" change to `constraints.py`: if the actual function is
generic, we unify it with template before inferring constraints. This
prevents leaking generic type variables of actual in the solutions
(which makes no sense), but also introduces new kind of constraints `T
<: F[S]`, where type variables we solve for appear in target type. These
are much harder to solve, but also it is a great opportunity to play
with them to prepare for single bin inference (if we will switch to it
in some form later). Note unifying is not the best solution, but a good
first approximation (see below on what is the best solution).
* A new, more sophisticated constraint solver in `solve.py`. The full
algorithm is outlined in the docstring for `solve_non_linear()`. It
looks like it should be able to solve arbitrary constraints that don't
(indirectly) contain "F-bounded" things like `T <: list[T]`. In short,
the idea is to compute the transitive closure, then organize constraints by
topologically sorted SCCs.
* Polymorphic type argument application in `checkexpr.py`. In cases
where the solver identifies that there are free variables (e.g. we have just one
constraint `S <: list[T]`, so `T` is free, and the solution for `S` is
`list[T]`), it will apply the solutions while creating new generic
functions. For example, if a function `def [S, T] (fn:
Callable[[S], T]) -> Callable[[S], T]` is applied to a function `def [U]
(x: U) -> U`, this will result in `def [T] (T) -> T` as the return type
(sketched just below).
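A minimal, self-contained sketch of that last example (the function names here are illustrative and the revealed types approximate):

```python
from typing import Callable, TypeVar

S = TypeVar("S")
T = TypeVar("T")
U = TypeVar("U")

def dec(fn: Callable[[S], T]) -> Callable[[S], T]:
    # A no-op wrapper, just to exercise inference against a generic argument.
    return fn

def ident(x: U) -> U:
    return x

# With polymorphic application, U is no longer leaked out of scope; instead
# a new generic callable is created, so the result can still be called with
# any argument type.
reveal_type(dec(ident))      # roughly: def [T] (T) -> T
reveal_type(dec(ident)(42))  # should be int once the result stays generic
```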

I want to put down some thoughts on the last ingredient here, since it may seem
mysterious, but it now seems to me that it is actually a very well-defined
procedure. The key point is thinking about generic functions as
infinite intersections or infinite overloads. If we reduce these
infinite overloads/intersections to finite ones, it is easy to understand
what is actually going on. For example, imagine we live in a world with
just two types, `int` and `str`, and we have two functions:
```python
T = TypeVar("T")
S = TypeVar("S")
U = TypeVar("U")

def dec(fn: Callable[[T], S]) -> Callable[[T], S]: ...
def id(x: U) -> U: ...
```
the first one can be seen as an overload over
```
((int) -> int) -> ((int) -> int)  # 1
((int) -> str) -> ((int) -> str)  # 2
((str) -> int) -> ((str) -> int)  # 3
((str) -> str) -> ((str) -> str)  # 4
```
and the second as an overload over
```
(int) -> int
(str) -> str
```
Now, what happens when I apply `dec(id)`? We need to choose an overload
that matches the argument (this is what we call type inference), but
here is the trick: in this case two overloads of `dec` match the argument
type. So (and by the way, I think we are missing this for real overloads) we
construct a new overload that is the intersection of the matching overloads
`# 1` and `# 4`. Generalizing this intuition, inference is the selection
of an (infinite) parametrized subset of the bigger parametrized set of
intersecting types. The only question is whether the resulting infinite
intersection is representable in our type system. For example,
`forall T. dict[T, T]` can make sense but is not representable, while
`forall T. (T) -> T` is a well-defined type. Finally, there is a very easy
way to find out whether a type is representable or not: we are already doing
this during semantic analysis. I use the same logic (which I used to view as
ad hoc because of the lack of good syntax for callables) to bind type
variables in the inferred type.
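To make the two-type intuition concrete, here is the finite analogue written with real `@overload` syntax (only an illustration; as noted above, mypy does not currently construct such intersections for real overloads):

```python
from typing import Any, Callable, overload

@overload
def dec_finite(fn: Callable[[int], int]) -> Callable[[int], int]: ...  # 1
@overload
def dec_finite(fn: Callable[[int], str]) -> Callable[[int], str]: ...  # 2
@overload
def dec_finite(fn: Callable[[str], int]) -> Callable[[str], int]: ...  # 3
@overload
def dec_finite(fn: Callable[[str], str]) -> Callable[[str], str]: ...  # 4
def dec_finite(fn: Callable[..., Any]) -> Callable[..., Any]:
    return fn

# For dec_finite(id), variants 1 and 4 both match, and the generic
# `def [T] (T) -> T` is exactly the representable intersection of those two
# matches; that is what the new polymorphic inference returns for the
# genuinely generic `dec`.
```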

OK, so here is the list of missing features, and some comments on them:
1. Instead of unifying the actual with the template, we should include the
actual's variables in the variable set we solve for, as explained in
#5738 (comment). Note,
however, that this will only work together with the next item.
2. We need to (iteratively) infer secondary constraints after linear
propagation, e.g. `Sequence[T] <: S <: Sequence[U] => T <: U`
3. Support `ParamSpec` (and probably `TypeVarTuple`). Current support
for applying callables with `ParamSpec` to generics is hacky, and kind
of a dead end. Although `(Callable[P, T]) -> Callable[P, List[T]]` works
when applied to `id`, even a slight variation like `(Callable[P,
List[T]]) -> Callable[P, T]` fails (see the sketch after this list). I think
it needs to be re-worked in the framework I propose (the tests I added are
just to be sure I don't break existing code).
4. Support actual types that are generic in type variables with upper
bounds or values (likely we just need to be careful when propagating
constraints and choosing the free variable within an SCC).
5. Add backtracking for the upper/lower bound choice. In general, in the
current "Hanoi Tower" inference scheme it is very hard to backtrack, but
for this specific choice in the new solver it should be entirely
possible to switch from the lower to the upper bound on a previous step if we
find no solution (or `<nothing>`/`object`).
6. After we polish it, we can use the new solver in more situations,
e.g. for return type context, and for unification during callable
subtyping.
7. Long term, we may want to allow instances to bind type variables, at
least for things like `LRUCache[[x: T], T]`. By the way, note that I
force-expand type aliases and callback protocols, since I can't
transform e.g. `A = Callable[[T], T]` into a generic callable without
getting the proper underlying type.
8. We need to figure out a solution for scenarios where non-linear
targets with free variables and constant targets mix without secondary
constraints, like `T <: List[int], T <: List[S]`.
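A hedged sketch of the `ParamSpec` limitation mentioned in item 3; the decorator names here are made up, and only the overall shapes match the PR description:

```python
from typing import Callable, List, TypeVar
from typing_extensions import ParamSpec

P = ParamSpec("P")
T = TypeVar("T")
U = TypeVar("U")

def listify(fn: Callable[P, T]) -> Callable[P, List[T]]:
    # The (Callable[P, T]) -> Callable[P, List[T]] shape that, per item 3,
    # works when applied to a generic identity function.
    def inner(*args: P.args, **kwargs: P.kwargs) -> List[T]:
        return [fn(*args, **kwargs)]
    return inner

def delistify(fn: Callable[P, List[T]]) -> Callable[P, T]:
    # The slight variation (Callable[P, List[T]]) -> Callable[P, T] that,
    # per item 3, currently fails when applied to a generic function.
    def inner(*args: P.args, **kwargs: P.kwargs) -> T:
        return fn(*args, **kwargs)[0]
    return inner

def ident(x: U) -> U:
    return x

def singleton(x: U) -> List[U]:
    return [x]

reveal_type(listify(ident))        # per item 3, this shape works
reveal_type(delistify(singleton))  # per item 3, this one still needs the re-work
```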

I am planning to address at least the majority of the above items, but I
think we should move slowly, since in my experience type inference is a
really fragile topic with hard-to-predict, long-reaching consequences.
Please play with this PR if you want to and have time, and please
suggest tests to add.