[performance] GraalPython is slow when running Cython #411

Open
da-woods opened this issue Jul 27, 2024 · 7 comments

da-woods commented Jul 27, 2024

I've been working on getting GraalPython tested on the Cython CI. It mostly works but it's really slow.

One aspect of this is the time spent running Cython itself. Note that this is pure Python code, so it doesn't involve any interaction with your C API emulation (which I know isn't considered a fast path). Cython has the option of compiling itself for speed, but I haven't done so here for the sake of the report.

For the sake of a demo I've just checked out the Cython repository from GitHub and run

time python cython.py Cython/Compiler/*.py

This just runs Cython on a bunch of its own files (only up to the C code generation stage; it doesn't invoke any C compiler).

Some results:

Python 3.11.9
-----------
real    1m3.896s
user    0m55.934s
sys     0m4.580s

GraalPython (from the file "graalpy-24.0.2-linux-amd64.tar.gz" from your releases page)
Python 3.10.13 (Thu Jul 04 12:42:45 UTC 2024)
[Graal, Oracle GraalVM, Java 22.0.2] on linux
--------------
real    8m2.008s
user    21m20.609s
sys     0m19.100s

PyPy (pypy3.10-v7.3.12-linux64)
---------------------------------------------
real    4m18.502s
user    4m10.389s
sys     0m0.938s

The upshot is that GraalPython is about 8 times slower than CPython (and also uses 3 cores of my CPU most of that time, while CPython is largely single-threaded).

I've included PyPy just as another data point. It's also slower than CPython for this case (although not quite as slow as GraalPython), so we're clearly doing something that isn't JIT-friendly.
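As a quick check of the arithmetic behind these comparisons, here is a minimal sketch that converts the `time` output above to seconds and derives the wall-clock slowdown relative to CPython, plus the user/real ratio as a rough proxy for how many cores were busy. The helper name `parse_time` and the dictionary labels are illustrative, not from the report.

```python
import re

def parse_time(text: str) -> float:
    """Convert a `time`-style duration such as '1m3.896s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# (real, user) pairs copied from the measurements above.
runs = {
    "CPython 3.11.9": ("1m3.896s", "0m55.934s"),
    "GraalPy 24.0.2": ("8m2.008s", "21m20.609s"),
    "PyPy 7.3.12": ("4m18.502s", "4m10.389s"),
}

baseline = parse_time(runs["CPython 3.11.9"][0])
for name, (real, user) in runs.items():
    real_s, user_s = parse_time(real), parse_time(user)
    slowdown = real_s / baseline  # wall-clock time relative to CPython
    cores = user_s / real_s       # user CPU time per second of wall-clock time
    print(f"{name:16s} slowdown {slowdown:4.1f}x, ~{cores:.1f} cores busy")
```

By wall-clock time this gives roughly 7.5x for GraalPython and roughly 4x for PyPy, with GraalPython averaging between two and three busy cores.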

I haven't done any profiling beyond this basic measurement (yet).


I do realise this is essentially an enormous code-dump with the complaint "it's slow", which is never a style of bug report that I'm very impressed with when I'm on the receiving end.

@da-woods (Author)

Profiling didn't reveal too much. It's spending a large chunk of time in _visitchildren in TreeVisitor in Visitor.py, but that's not unexpected.
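For readers unfamiliar with the pattern, the following is a toy sketch of a name-dispatched tree visitor. It is not Cython's actual TreeVisitor code; the classes `Node`, `Leaf`, `Add`, `ToyVisitor` and `Evaluator` are hypothetical, and only the `child_attrs` attribute name is taken from the thread. It shows why a child-visiting helper naturally sits at the top of a profile: every node in the tree passes through it, and each call does attribute lookups by name.

```python
class Node:
    child_attrs = ()  # names of the attributes that hold child nodes

class Leaf(Node):
    def __init__(self, value):
        self.value = value

class Add(Node):
    child_attrs = ("left", "right")

    def __init__(self, left, right):
        self.left, self.right = left, right

class ToyVisitor:
    def __init__(self):
        self._handlers = {}  # cache: node class -> bound handler method

    def visit(self, node):
        cls = type(node)
        handler = self._handlers.get(cls)
        if handler is None:
            # Walk the MRO for the first matching visit_<ClassName> method.
            for base in cls.__mro__:
                handler = getattr(self, "visit_" + base.__name__, None)
                if handler is not None:
                    break
            else:
                raise TypeError(f"no handler for {cls.__name__}")
            self._handlers[cls] = handler
        return handler(node)

    def visitchildren(self, parent):
        # Dynamic getattr by name for every child of every node.
        return [self.visit(getattr(parent, name)) for name in parent.child_attrs]

class Evaluator(ToyVisitor):
    def visit_Leaf(self, node):
        return node.value

    def visit_Add(self, node):
        left, right = self.visitchildren(node)
        return left + right

tree = Add(Leaf(1), Add(Leaf(2), Leaf(3)))
print(Evaluator().visit(tree))
```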

In one place we use

child_attrs = property(fget=operator.attrgetter('subexprs'))
# instead of
# @property
# def child_attrs(self):
#     return self.subexprs

Changing that made things a bit faster, but not dramatically so, and that's as far as I got.
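A minimal sketch for comparing the two spellings in isolation, using `timeit`. The classes `ViaAttrgetter` and `ViaMethod` are hypothetical stand-ins for a node class; only `child_attrs` and `subexprs` come from the snippet above. A micro-benchmark like this says nothing definitive about behaviour inside the full compiler run, but it can be executed under each interpreter to see whether the difference is reproducible on its own.

```python
import operator
import timeit

class ViaAttrgetter:
    # Hypothetical stand-in for a node class.
    subexprs = ["a", "b"]
    child_attrs = property(fget=operator.attrgetter("subexprs"))

class ViaMethod:
    subexprs = ["a", "b"]

    @property
    def child_attrs(self):
        return self.subexprs

def bench(cls, number=200_000):
    """Time `number` reads of the child_attrs property on one instance."""
    node = cls()
    return timeit.timeit(lambda: node.child_attrs, number=number)

if __name__ == "__main__":
    # Both spellings must return the same value before timing them.
    assert ViaAttrgetter().child_attrs == ViaMethod().child_attrs
    for cls in (ViaAttrgetter, ViaMethod):
        print(f"{cls.__name__:14s} {bench(cls):.3f}s")
```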


scoder commented Jul 28, 2024

GraalVM seems to have an option --cpusampler to produce profiles, including flame graphs. Maybe that can bring up some hints?
https://www.graalvm.org/latest/tools/profiling/

@da-woods (Author)

> GraalVM seems to have an option --cpusampler to produce profiles, including flame graphs. Maybe that can bring up some hints?

Yes, I gave those a quick go; they were what pointed out operator.attrgetter. That was the only thing that really stood out as unexpected.

I've attached some example output though:

graalcpusample.txt
flamegraph.svg

@da-woods (Author)

I've improved things on our CI by turning off the JIT with the options --experimental-options --engine.Compilation=false, which seems to make things both faster, and single-core.

But we're clearly doing something that doesn't agree with how GraalPython optimizes things.
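To reproduce the comparison locally, a small harness along these lines could time the same run with and without the two flags quoted above. This is a sketch under assumptions: the function names are illustrative, and the interpreter name passed in (for example a `graalpy` launcher on PATH) is not taken from the thread.

```python
import shlex
import subprocess
import time

# Flags quoted in the comment above; they turn off JIT compilation.
NO_JIT_FLAGS = ["--experimental-options", "--engine.Compilation=false"]

def build_command(interpreter, script_args, jit=True):
    """Return the argv list for one run; interpreter flags go before the script."""
    flags = [] if jit else NO_JIT_FLAGS
    return [interpreter, *flags, *script_args]

def timed_run(argv):
    """Run the command to completion and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start

def compare(interpreter, script_args):
    """Time the same script with the JIT enabled and then disabled."""
    for jit in (True, False):
        argv = build_command(interpreter, script_args, jit=jit)
        print(shlex.join(argv), f"-> {timed_run(argv):.1f}s")
```

From a Cython checkout, something like `compare("graalpy", ["cython.py", *sorted(glob.glob("Cython/Compiler/*.py"))])` (after `import glob`) would mirror the command used in the original report.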

@msimacek (Contributor)

If turning off the JIT helps, then it sounds like a deoptimization loop bug (in graalpy). You're most likely doing nothing wrong (unless you're constantly generating new code and evaling it). I'll try to investigate.

msimacek self-assigned this Jul 29, 2024

@da-woods (Author)

Thanks. I don't think it's eval/exec - we use them but very infrequently and the parts they're in don't show up on the profile.

Quick warning - if you do pip install cython I think it will compile itself. This report is just about running it without compiling it. That's easiest to get just by cloning the git repo but NO_CYTHON_COMPILE=true pip install cython also works.
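One way to confirm which variant is actually installed is to check whether a module was loaded from a compiled extension file or from a `.py` source file. The helper below is a generic sketch (the name `loaded_from_extension` is illustrative); it relies only on the module's `__file__` attribute and the interpreter's list of extension-module suffixes.

```python
import importlib
import importlib.machinery

def loaded_from_extension(module_name: str) -> bool:
    """Return True if the named module was loaded from an extension file.

    Modules without a __file__ (for example built-in modules) report False.
    """
    module = importlib.import_module(module_name)
    path = getattr(module, "__file__", None) or ""
    return path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))

# A pure-Python standard library module is loaded from a .py file.
print(loaded_from_extension("json"))
```

With Cython importable, calling `loaded_from_extension("Cython.Compiler.Visitor")` should return False for the uncompiled setup this report is about.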


scoder commented Jul 29, 2024

> if you do pip install cython I think it will compile itself

It should actually use the Python-any wheel that we distribute on PyPI, i.e. not try to build anything locally.
