Bad performance with TypedDicts and unions #7617

roganov · 2024-04-04T17:28:47Z

Describe the bug
Pyright takes 10 seconds to type-check the code below, while for Mypy it's 0.4 seconds. This issue comes from #7612.

Code or Screenshots
The repro could probably be smaller, I'm sorry for this. Note that the more

x: NodeFile
y: Node = x

I add, the slower it becomes.

import typing


class NodeFile(typing.TypedDict, total=False):
    name: typing.Literal["FILE"]


class NodeDocument(typing.TypedDict, total=False):
    name: typing.Literal["DOCUMENTDEF"]
    children: list[typing.Union['NodeField', 'NodeGroup', 'NodeValidation', 'NodeFormat', 'NodeEnv', 'NodePreValidation', 'NodeImports']]


class NodeGroup(typing.TypedDict, total=False):
    name: typing.Literal["GROUPDEF"]
    children: list[typing.Union['NodeField', 'NodeGroup', 'NodeValidation', 'NodeHelp', 'NodeEnv', 'NodePreValidation']]


class NodeField(typing.TypedDict, total=False):
    name: typing.Literal["FIELDDEF"]
    children: list[typing.Union['NodeValidation', 'NodeHelp', 'NodeFormat']]


class NodeJava(typing.TypedDict, total=False):
    name: typing.Literal["java"]
    text: str


class NodeFieldFormat(typing.TypedDict, total=False):
    name: typing.Literal["FIELDFMT"]
    text: str


class NodeValidation(typing.TypedDict, total=False):
    name: typing.Literal["VALIDATION"]
    children: list['NodeJava']


class NodeImports(typing.TypedDict, total=False):
    name: typing.Literal["IMPORTS"]
    text: str


class NodeFormat(typing.TypedDict, total=False):
    name: typing.Literal["FORMAT"]
    children: list[typing.Union['NodeDelimiter', 'NodeFieldFormat']]


class NodePreValidation(typing.TypedDict, total=False):
    name: typing.Literal["PREVALIDATION"]
    children: list['NodeJava']


class NodeHelp(typing.TypedDict, total=False):
    name: typing.Literal["HELP"]
    text: str


class NodeEnv(typing.TypedDict, total=False):
    name: typing.Literal["ENVIRONMENT"]
    children: list['NodeJava']


class NodeDelimiter(typing.TypedDict, total=False):
    name: typing.Literal["DELIMITER"]
    text: str


Node = typing.Union['NodeFile', 'NodeDocument', 'NodeGroup', 'NodeField', 'NodeJava', 'NodeFieldFormat', 'NodeValidation', 'NodeImports', 'NodeFormat', 'NodePreValidation', 'NodeHelp', 'NodeEnv', 'NodeDelimiter']

x: NodeFile
y: Node = x

x: NodeFile
y: Node = x

x: NodeFile
y: Node = x

x: NodeFile
y: Node = x

x: NodeFile
y: Node = x

VS Code extension or command-line
Command line, 1.1.357

The text was updated successfully, but these errors were encountered:

erictraut · 2024-04-04T18:24:04Z

Thanks for the bug report. I'm able to repro the issue, and I'll investigate further.

Out of curiosity, in your real code (as opposed to this sample), do you have a bunch of unbound variables like x?

If I replace the last few lines with the following (which eliminates the unbound variables), the performance issue goes away.

def func1(x: NodeFile):
    y: Node = x

def func2(x: NodeFile):
    y: Node = x

def func3(x: NodeFile):
    y: Node = x

def func4(x: NodeFile):
    y: Node = x

def func5(x: NodeFile):
    y: Node = x

Similarly:

x1: NodeFile = {}
y1: Node = x1

x2: NodeFile = {}
y2: Node = x2

x3: NodeFile = {}
y3: Node = x3

x4: NodeFile = {}
y4: Node = x4

x5: NodeFile = {}
y5: Node = x5

So, on first analysis, this seems to be related to unbound variables. Those don't typically appear in working code, which might explain why such a perf issue has gone unnoticed and unreported until now.

In any case, I'll dig into it further.

roganov · 2024-04-04T18:46:54Z

No, the real code is more like this:

class My:
    def __init__(self, doc: NodeDocument):
        self.doc = doc

    def my(self):
        x: Node = self.doc

It still seems to reproduce, but the performance is not as bad.

I have a file with a class definition that contains two lines similar to x: Node = self.doc (read definitions of nodes are more complex). When I run pyright on that file, it takes 110 seconds and reports no errors. When I comment out these two lines, pyright returns in 16 seconds (which is still arguably slow). However, I think it may be a red herring because the x variable doesn't exist anymore and maybe pyright just doesn't analyze some other code which may be the real issue.

I admit that I do not fully understand what the issue is, but I'll keep digging.

…umstances (e.g. when comparing large unions of TypedDict types), this can save significant time. This addresses #7617.

erictraut · 2024-04-04T21:54:38Z

TypedDicts are more expensive than regular classes because they're structural, so type comparisons require the type checker to examine each field with the TypedDict. In your code sample, there were nested TypedDicts (i.e. TypedDict definitions that included fields whose types included other TypedDicts), and some of these involved unions of TypedDicts, some of which were partially overlapping. All of these factors combined to result in some very expensive type comparisons, which explains the performance issue you reported.

I've added a quick optimization that shortcuts the type compatibility check as soon as the first mismatching field is detected. (This optimization works only in cases where diagnostic information isn't required. When diagnostics are required, we need to evaluate all fields in the TypedDict so we can report each mismatch to the user in the diagnostic message.)

This optimization saves significant time and avoids the deeply-nested type comparisons in your code sample. On my computer, analysis times dropped from about 10,000ms down to about 35ms. This optimization will be included in the next release. I'm hopeful that it will fully address the perf issue that you're seeing, but there's a chance it won't.

TypedDicts are similar in some ways to protocols. Both are structural types. Pyright has extensive (and relatively complex) optimizations for protocol classes, but it doesn't have analogous optimizations for TypedDicts. If my quick optimization doesn't address your performance problem, I could go further (at the expense of additional code complexity and some memory usage). Let me know how the next version of pyright performs, and I'll decide whether additional optimization is needed.

roganov · 2024-04-05T13:36:44Z

Thank you, I'll try it out with the new release.

erictraut · 2024-04-10T02:39:03Z

This is addressed in pyright 1.1.358, which I just published.

roganov · 2024-04-10T13:51:21Z

The performance is much better now -- down to 20 seconds. Thank you!

erictraut · 2024-04-10T15:11:22Z

Thanks for letting me know. Out of curiosity, how does that compare to mypy's analysis time for the same code base? For most code bases, I find that mypy takes somewhere between 1x and 3x as long as pyright. I'm always on the lookout for cases where mypy is faster.

roganov · 2024-04-11T16:39:05Z

Mypy without cache (that is, I removed the .mypy_cache folder):

$ time mypy app
mypy app  23.02s user 1.55s system 97% cpu 25.133 total

Pyright:

$ time pyright app
pyright app  17.43s user 0.70s system 242% cpu 7.465 total

That is, Pyright is 4x faster. In practice, Mypy is snappier because of the cache, but Pyright has the --watch mode.

(I incorrectly said "down to 20 seconds" in my previous message - it's 7-8 seconds).

roganov added the bug Something isn't working label Apr 4, 2024

erictraut added a commit that referenced this issue Apr 4, 2024

Added optimization for TypedDict type assignments. Under certain circ…

aff916f

…umstances (e.g. when comparing large unions of TypedDict types), this can save significant time. This addresses #7617.

erictraut added the addressed in next version Issue is fixed and will appear in next published version label Apr 4, 2024

erictraut closed this as completed Apr 10, 2024

AlexWaygood mentioned this issue Sep 29, 2024

[red-knot] Add a benchmark involving realistic code that creates large unions astral-sh/ruff#13549

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bad performance with TypedDicts and unions #7617

Bad performance with TypedDicts and unions #7617

roganov commented Apr 4, 2024 •

edited

Loading

erictraut commented Apr 4, 2024

roganov commented Apr 4, 2024 •

edited

Loading

erictraut commented Apr 4, 2024

roganov commented Apr 5, 2024

erictraut commented Apr 10, 2024

roganov commented Apr 10, 2024

erictraut commented Apr 10, 2024

roganov commented Apr 11, 2024 •

edited

Loading

Bad performance with TypedDicts and unions #7617

Bad performance with TypedDicts and unions #7617

Comments

roganov commented Apr 4, 2024 • edited Loading

erictraut commented Apr 4, 2024

roganov commented Apr 4, 2024 • edited Loading

erictraut commented Apr 4, 2024

roganov commented Apr 5, 2024

erictraut commented Apr 10, 2024

roganov commented Apr 10, 2024

erictraut commented Apr 10, 2024

roganov commented Apr 11, 2024 • edited Loading

roganov commented Apr 4, 2024 •

edited

Loading

roganov commented Apr 4, 2024 •

edited

Loading

roganov commented Apr 11, 2024 •

edited

Loading