
This issue was moved to a discussion.


Replace C-code generation and compilation backend #312

Closed
brandonwillard opened this issue Mar 2, 2021 · 5 comments
Labels
C-backend enhancement New feature or request help wanted Extra attention is needed important question Further information is requested refactor This issue involves refactoring

Comments

@brandonwillard
Member

brandonwillard commented Mar 2, 2021

The text-based C-code generation and compilation backend in Aesara is difficult to use, debug, maintain, and extend. We need to fix that ASAP.

Cython is a well-established Python-to-C transpiler that provides a much cleaner, automatic means of generating the same kind of Python C API code that's written by hand in Aesara. Here are some possible benefits of replacing our current C implementations with Cython-generated C code:

  • make more of our logic transparent to pure-Python readers
  • automatically benefit from updates and new features provided by Cython over time (e.g. Python C API version and capability updates)
  • use Cython's build and code caching features, which could have much better support for different platforms and environments
  • attempt automatic conversion of Python-only Op implementations, resulting in more C-only code (i.e. fewer calls to and from Python/C during graph evaluation)
  • simplify C-level interactions with NumPy, which are much easier in Cython; for example, we might be able to generate C code for all the Subtensor*/indexing operations with a little bit of Cython (instead of our very limited C implementations, which only cover certain types of indexing)

This general idea has been brought up in numerous different locations, so I'm creating this issue as a means of collecting all the relevant details, ideas, discussions, requirements, etc., into one place.

Related issues:

@brandonwillard brandonwillard added enhancement New feature or request help wanted Extra attention is needed question Further information is requested important refactor This issue involves refactoring C-backend labels Mar 2, 2021
@brandonwillard brandonwillard pinned this issue Mar 2, 2021
@twiecki
Contributor

twiecki commented Mar 2, 2021

Totally agree. I suppose we first need some functionality to enable usage of Cython.

@brandonwillard brandonwillard changed the title Replace C-code generation and processing backend Replace C-code generation and compilation backend Mar 2, 2021
@brandonwillard
Member Author

brandonwillard commented Mar 2, 2021

It's not actually that difficult to use Cython-generated code in Aesara right now. For instance, an Op can create its own thunk, via Op.make_thunk, that calls out to a Cython-generated extension. This is what the Scan Op does, and it's the approach used by the old example referenced in pymc-devs/pytensor#10.
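For readers unfamiliar with that route, here is a minimal sketch of the shape such a custom thunk takes. It assumes the `make_thunk(node, storage_map, compute_map, no_recycling)` argument order and that each variable maps to a one-element storage cell; `self` is dropped, `compiled_square` is a hypothetical stand-in for a function exported by a Cython-built extension, and `SimpleNamespace` stands in for an Apply node, so this runs without Aesara and is not Aesara's actual implementation.

```python
from types import SimpleNamespace


def compiled_square(x):
    # Hypothetical stand-in for a function exported by a Cython-built
    # extension module; any callable works for this sketch.
    return x * x


def make_thunk(node, storage_map, compute_map, no_recycling):
    """Sketch of a custom make_thunk: look up the storage cells for the
    node's inputs/outputs and return a zero-argument closure over them."""
    input_cells = [storage_map[v] for v in node.inputs]
    output_cells = [storage_map[v] for v in node.outputs]

    def thunk():
        # Read from the input cells, write into the output cell in place.
        output_cells[0][0] = compiled_square(input_cells[0][0])
        compute_map[node.outputs[0]][0] = True  # mark the output as computed

    # Attributes assumed to be what a VM inspects on a Python thunk.
    thunk.inputs = input_cells
    thunk.outputs = output_cells
    thunk.lazy = False
    return thunk


node = SimpleNamespace(inputs=["x"], outputs=["y"])  # stand-in for an Apply
storage_map = {"x": [4], "y": [None]}
compute_map = {"x": [True], "y": [False]}
thunk = make_thunk(node, storage_map, compute_map, no_recycling=[])
thunk()
print(storage_map["y"][0], compute_map["y"][0])  # 16 True
```

Because the thunk only sees storage cells, whatever it calls internally (Python, Cython, or anything else) is invisible to the code that evaluates it.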

My impression is that this approach isn't the best because it doesn't use Aesara's C-based thunk machinery. This machinery is presumably faster than the corresponding pure-Python machinery, perhaps due to reduced Python-to-C and C-to-Python call overhead, among other things.

Aesara graph evaluation primer

For anyone who's not familiar with the idea of a "thunk" in Aesara, this section might help.

Simply put, a "thunk" is an argumentless function that calls an Op's implementation code (either C or Python) with Aesara's input and output storage arrays (i.e. plain lists with entries for each graph node/Apply's inputs and outputs).

Here's a simple example:

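```python
from aesara.graph.op import Op

inputs = [1, 2]
outputs = [None]

class SomeOp(Op):
    # Simplified for illustration: Aesara's actual `Op.perform` also
    # receives the `node`, and each entry of its output storage is a
    # one-element list (a storage "cell") rather than a bare value.
    def perform(self, inputs, outputs):
        outputs[0] = inputs[0] + inputs[1]

def a_thunk(inputs=inputs, outputs=outputs):
    SomeOp().perform(inputs, outputs)

a_thunk()

# `outputs` should contain `3`
```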

Those storage arrays make up the graph's memory model, and they're stored inside each thunk function's closure. When a thunk is evaluated, its output arrays are populated with the computed values. A thunk is created for each node/Apply in a graph, and, when a node's output is used as the input to another node, the output storage array of the first node is used as the input storage array of the second.

Continuing from the previous example:

```python
other_outputs = [None]

class SomeOtherOp(Op):
    def perform(self, inputs, outputs):
        outputs[0] = inputs[0]**2

def another_thunk(inputs=outputs, outputs=other_outputs):
    SomeOtherOp().perform(inputs, outputs)

# This thunk depends on the output of the previous thunk
another_thunk()

# `other_outputs` should contain `9`
```

This allows Linkers to create a thunk for each Op in a graph, and those thunks can be evaluated very easily by the VM classes. Then, by returning the contents of the output storage arrays that correspond to the desired outputs of a graph, we get the kind of results produced by aesara.function.

Here's what aesara.function produces, in a nutshell:

```python
# Using the example thunks above, we can create a function
# that computes the graph for `(a + b)**2`
def compiled_graph_fn(a, b):
    inputs[0] = a
    inputs[1] = b
    for thunk in [a_thunk, another_thunk]:
        thunk()
    return other_outputs[0]
```

The for-loop in that example function is the job of the VMs, and the Linkers walk a graph and create the thunks. aesara.function creates a Function object that simply orchestrates the use of those two.
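That division of labor can be sketched without Aesara at all. In this hypothetical, stripped-down version, a node is just a `(function, input_names, output_name)` tuple, the node list is assumed to already be topologically sorted, `link` plays the Linker's role (allocate one storage cell per variable and build one thunk per node), and the loop inside `compiled` plays the VM's role. None of these names are Aesara APIs.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for an Apply node: (function, input names, output name).
Node = Tuple[Callable, Sequence[str], str]


def link(nodes: Sequence[Node], input_names: Sequence[str]):
    """Linker role: one shared one-element storage cell per variable, and
    one zero-argument thunk per node that reads/writes those cells."""
    storage: Dict[str, List] = {name: [None] for name in input_names}
    for _, _, out in nodes:
        storage[out] = [None]

    def make_thunk(fn, ins, out):
        in_cells = [storage[name] for name in ins]
        out_cell = storage[out]

        def thunk():
            out_cell[0] = fn(*(cell[0] for cell in in_cells))

        return thunk

    thunks = [make_thunk(fn, ins, out) for fn, ins, out in nodes]
    return storage, thunks


def function(nodes: Sequence[Node], input_names: Sequence[str], output_name: str):
    """`aesara.function` role: fill the input cells, run the thunks, read the output."""
    storage, thunks = link(nodes, input_names)

    def compiled(*args):
        for name, value in zip(input_names, args):
            storage[name][0] = value
        for thunk in thunks:  # VM role: evaluate thunks in (topological) order
            thunk()
        return storage[output_name][0]

    return compiled


# The `(a + b)**2` graph from above, expressed as two nodes.
f = function(
    [(lambda a, b: a + b, ["a", "b"], "c"), (lambda c: c ** 2, ["c"], "d")],
    ["a", "b"],
    "d",
)
print(f(1, 2))  # 9
```

Swapping a Python thunk for a C one would only change what `make_thunk` returns; the storage cells and the evaluation loop stay the same.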

How compiled C code is used

There are a few places where the C and Python thunks are clearly distinguished. In the CVM (aka lazylinker_c.CLazyLinker on the C extension side), which is generally used whenever the C toolchain is available, C thunks are treated specially (see here): a variable is set that signals the use of a special CVM.c_call. There doesn't seem to be much to it: just a pointer to the thunk's C function and that function's data/arguments.

From the Python side (e.g. when graph evaluation is performed using the pure-Python VM Stack), there's a _CThunk class whose _CThunk.__call__ appears to do the same thing as CVM.c_call. It uses the run_cthunk function that's implemented in C here and exposed to Python via the cutils_ext extension. Ultimately, run_cthunk uses the same pointers in the same way as CVM.c_call.

From what I can tell, _CThunks are exclusively created by the CLinker, which COp.make_c_thunk (called from the standard entry point Op.make_thunk) briefly uses to make its thunks. Aside from the questionable need for an entirely distinct CLinker class and/or object in this situation, it seems like the whole process could be as simple as obtaining, and using, those pointers.

Regarding those thunk pointers, they seem to come from CLinker.cthunks_factory, which kicks off the C-code compilation process, i.e. the same process that we're considering replacing here (e.g. with Cython, or at least with some use of distutils's compilation code). In Python, those thunk pointers are PyCapsule objects, and they can be easily created and accessed in Cython.
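As a rough illustration of the "function pointer plus its data" pairing, the sketch below wraps a C-callable function pointer in a PyCapsule, stores a data pointer as the capsule's context, and then unpacks and calls it, loosely following the way run_cthunk is described above. It uses the standard library's ctypes to reach CPython's capsule API (`PyCapsule_New`, `PyCapsule_SetContext`, `PyCapsule_GetPointer`, `PyCapsule_GetContext`) instead of Cython. The `int (*)(void *)` thunk signature, keeping the data pointer in the capsule's context, and the `add_thunk` callback (a Python stand-in for compiled code) are all assumptions made for this sketch.

```python
import ctypes

# Assumed thunk signature: int thunk(void *data); 0 means success.
THUNK_FN = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_void_p)

# Declare the CPython capsule functions we call through ctypes.
api = ctypes.pythonapi
api.PyCapsule_New.restype = ctypes.py_object
api.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
api.PyCapsule_SetContext.restype = ctypes.c_int
api.PyCapsule_SetContext.argtypes = [ctypes.py_object, ctypes.c_void_p]
api.PyCapsule_GetPointer.restype = ctypes.c_void_p
api.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyCapsule_GetContext.restype = ctypes.c_void_p
api.PyCapsule_GetContext.argtypes = [ctypes.py_object]


# Hypothetical stand-in for a compiled thunk: treats `data` as three ints,
# reads the first two ("inputs") and writes their sum into the third ("output").
@THUNK_FN
def add_thunk(data):
    cells = (ctypes.c_int * 3).from_address(data)
    cells[2] = cells[0] + cells[1]
    return 0


def make_capsule(fn, data):
    """Wrap `fn`'s address in an unnamed capsule with `data`'s address as
    context. The caller must keep both `fn` and `data` alive."""
    fn_addr = ctypes.cast(fn, ctypes.c_void_p).value
    capsule = api.PyCapsule_New(fn_addr, None, None)
    api.PyCapsule_SetContext(capsule, ctypes.addressof(data))
    return capsule


def run_capsule(capsule):
    """Fetch the function pointer and its data from the capsule, then call it."""
    fn = THUNK_FN(api.PyCapsule_GetPointer(capsule, None))
    return fn(api.PyCapsule_GetContext(capsule))


storage = (ctypes.c_int * 3)(1, 2, 0)
cthunk = make_capsule(add_thunk, storage)
failure = run_capsule(cthunk)
print(type(cthunk).__name__, failure, storage[2])  # PyCapsule 0 3
```

In a real replacement, the function pointer would come from compiled Cython/C code rather than a ctypes callback, but the capsule would be consumed the same way.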

@brandonwillard
Member Author

brandonwillard commented Mar 9, 2021

For anyone who wants to try this (e.g. @aseyboldt for #327), take a look at how COp creates C thunks. I think the relevant parts start here (i.e. CLinker.make_thunk), so we might need to jump into whatever happens there.

@twiecki
Contributor

twiecki commented Mar 9, 2021

@brandonwillard
Member Author

brandonwillard commented Mar 9, 2021

Actually, it looks like it might be as simple as creating an aesara.link.c.basic._CThunk object. In order to do that, we'll need valid _CThunk arguments for Cython/Numba-generated functions.

The self.__compile__(...) step is how we normally generate those arguments, but it goes through the process of compiling str-derived C code and creating extensions, which is irrelevant here. Regardless, the cthunk value returned by CLinker.__compile__ is a PyCapsule object, module is a module-type object, in_storage and out_storage are lists of aesara.link.basic.Containers, and error_storage is just a list of Nones.

The first two values (i.e. the PyCapsule and module objects) seem obtainable from Cython/Numba, so it looks like we'll only need to reproduce the storage and error array creation steps.
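Reproducing those steps might look roughly like the sketch below. `Container` here is a hypothetical minimal stand-in for aesara.link.basic.Container (the real class has a different constructor and also handles type checking), `make_cthunk_args` is a hypothetical helper, and the three-slot error list assumes the slots hold an exception's type, value, and traceback. The main point is that each container wraps the same one-element storage cell that the compiled thunk reads or writes.

```python
import types
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Container:
    """Hypothetical stand-in for aesara.link.basic.Container: a named
    wrapper around a shared one-element storage cell."""
    name: str
    storage: List[Any]

    @property
    def value(self) -> Any:
        return self.storage[0]


def make_cthunk_args(cthunk, module, input_cells, output_cells):
    """Assemble the five values described above, given externally built
    `cthunk` (a PyCapsule) and `module` objects."""
    in_storage = [Container(f"in{i}", c) for i, c in enumerate(input_cells)]
    out_storage = [Container(f"out{i}", c) for i, c in enumerate(output_cells)]
    # Assumed layout: (exception type, value, traceback), filled on failure.
    error_storage = [None, None, None]
    return cthunk, module, in_storage, out_storage, error_storage


input_cells, output_cells = [[1], [2]], [[None]]
module = types.ModuleType("hypothetical_cython_ext")  # placeholder module
cthunk, module, in_storage, out_storage, error_storage = make_cthunk_args(
    object(), module, input_cells, output_cells  # object() stands in for a PyCapsule
)
output_cells[0][0] = 3  # what a compiled thunk would do on success
print(out_storage[0].value)  # 3
```

Because the containers share cells rather than copying values, whatever the compiled code writes is immediately visible through `out_storage`.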

@aesara-devs aesara-devs locked and limited conversation to collaborators Apr 16, 2021
@brandonwillard brandonwillard unpinned this issue Apr 16, 2021


2 participants