This is the second post of a series covering some of the more difficult implementation details encountered while supporting Dynamo in Python 3.11.
The first post can be read here: Supporting Dynamo in Python 3.11 - NULL.
This post covers challenges resulting from internal CPython changes to frame evaluation.
How are Dynamo-compiled frames evaluated?
Dynamo currently works only because PEP 523 exists. Interestingly, and perhaps concerningly, parts of PEP 523 would have been removed had we not intervened.
At a high level, PEP 523 allows third-party code to intercept and evaluate Python frames. Dynamo takes advantage of this by intercepting frames immediately before evaluation; analyzing the frame’s bytecode, which produces custom Python bytecode (this part is implemented in Python); and evaluating the resulting custom bytecode with CPython’s default frame evaluation.
The Python part of Dynamo (Python-Dynamo) expects as input a CPython evaluation frame, consisting of bytecode and an evaluation context (locals, globals, builtins, etc.). Python-Dynamo then simulates the bytecode from the frame with the evaluation context, extracting an “FX graph”, a computation graph of torch operations. It then passes the FX graph to a compiler, which produces a Python binding to optimized triton/C++/etc. code. Python-Dynamo then wraps the result with some additional bytecode (called “modified bytecode”) that calls the optimized binding and handles error cases, such as when the bytecode simulation encounters an unsupported operation (“graph break”, see the first post for more details). It is this modified bytecode that is returned by Python-Dynamo. (Technically, Python-Dynamo also returns a guard function so that the resulting modified bytecode can be cached, but we omit the details here.)
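To make "bytecode and an evaluation context" concrete, here is a minimal sketch that inspects an ordinary Python frame object using only the standard library. It is not Dynamo code: the function name `describe_current_frame` is made up for illustration, and it only shows the kind of information (code object, locals, globals, builtins) that a frame exposes to Python.

```python
import dis
import inspect


def describe_current_frame(x):
    # inspect.currentframe() returns the frame object for this call.
    frame = inspect.currentframe()
    try:
        return {
            # The code object holds the bytecode being evaluated.
            "code_name": frame.f_code.co_name,
            # The evaluation context: locals, globals, builtins.
            "local_names": sorted(frame.f_locals),
            "globals_is_dict": isinstance(frame.f_globals, dict),
            "has_len_builtin": "len" in frame.f_builtins,
            # The bytecode itself, decoded into instruction names.
            "opnames": [ins.opname for ins in dis.get_instructions(frame.f_code)],
        }
    finally:
        del frame  # avoid keeping a frame <-> locals reference cycle alive


info = describe_current_frame(41)
print(info["code_name"], info["local_names"])
```

The exact instruction names in `opnames` differ between CPython versions, which is part of why bytecode-level tools need per-version work.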
The main entrypoint for enabling Python-Dynamo modified bytecode to be run by the CPython interpreter is torch/csrc/dynamo/eval_frame.c:_custom_eval_frame. It takes as input the Python frame to be evaluated and returns the result of evaluating that frame. In particular, _custom_eval_frame does the following (with some details omitted, such as guards and skips):
- Call Python-Dynamo with the original frame to get modified bytecode (result is cached).
- Create a new Python evaluation frame object using the modified bytecode (called the “shadow frame”). We cannot simply overwrite the original frame since Python-Dynamo produces bytecode that has more locals, which requires changing the size of the frame.
- Copy over parts of the original frame’s evaluation context to the new frame (i.e. locals, free variables, cell variables).
- Evaluate the shadow frame using CPython’s default frame evaluation API, _PyEval_EvalFrameDefault.
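The four steps above can be sketched as a toy, pure-Python model. This is not the real implementation, which is written in C against CPython internals. `ToyCode`, `ToyFrame`, `toy_custom_eval_frame`, `compile_fn` and `default_eval` are all hypothetical stand-ins, and guards and skips are left out, as in the list.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCode:
    name: str
    nlocals: int  # number of local-variable slots this code needs


@dataclass
class ToyFrame:
    code: ToyCode
    localsplus: list = field(default_factory=list)


def toy_custom_eval_frame(frame, compile_fn, default_eval, cache):
    # Step 1: get modified code from the (hypothetical) compiler, cached by name.
    modified = cache.get(frame.code.name)
    if modified is None:
        modified = compile_fn(frame)
        cache[frame.code.name] = modified

    # Step 2: build a shadow frame sized for the modified code, which may
    # need more local slots than the original frame has.
    shadow = ToyFrame(modified, [None] * modified.nlocals)

    # Step 3: copy the original frame's values into the shadow frame.
    n = len(frame.localsplus)
    shadow.localsplus[:n] = frame.localsplus

    # Step 4: hand the shadow frame to the default evaluator.
    return default_eval(shadow)


compile_calls = []


def compile_fn(frame):
    # Stand-in for Python-Dynamo: returns code that needs two extra locals.
    compile_calls.append(frame.code.name)
    return ToyCode(frame.code.name + "_modified", frame.code.nlocals + 2)


def default_eval(frame):
    # Stand-in for the default evaluator: just report what it was given.
    return frame.code.name, frame.localsplus


cache = {}
original = ToyFrame(ToyCode("f", 2), [10, 20])
result = toy_custom_eval_frame(original, compile_fn, default_eval, cache)
again = toy_custom_eval_frame(original, compile_fn, default_eval, cache)
```

In this model the second call reuses the cached modified code, and the original frame is never resized. Only the shadow frame gets the extra slots.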
Workarounds in Python 3.11
The first three steps of _custom_eval_frame above were affected by Python 3.11 changes. Here, we record our necessary workarounds.
Step 1
The first step (calling Python-Dynamo with the original evaluation frame) was affected because the internal representation of the CPython evaluation frame changed. In Python 3.10, CPython used PyFrameObjects for frame evaluation. These are PyObjects, so they could be passed to Python-Dynamo without issue. In Python 3.11, CPython uses the internal C struct _PyInterpreterFrame, which cannot be passed to Python-Dynamo, since it is not a PyObject.
PyFrameObject still exists though, and it even holds a reference to a _PyInterpreterFrame (and vice versa). However, PyFrameObjects are now lazily instantiated and are used mostly for debugging purposes. Given a _PyInterpreterFrame, we attempted to create its corresponding PyFrameObject before evaluation, in order to pass it to Python-Dynamo. However, this leaves the CPython interpreter in an unexpected state, leading to assertion failures and memory errors.
The solution is to create a new PyObject type that holds a _PyInterpreterFrame and provides getters and setters to mimic the old PyFrameObject behavior (relevant code).
A potential future issue arises if Python-Dynamo requires a value in PyFrameObject that cannot be computed from _PyInterpreterFrame. That would be a problem because we would need to create a PyFrameObject for the _PyInterpreterFrame before evaluation, which, as described above, cannot be done.
Step 2
The second step (creating the shadow frame) was affected because the way to create new evaluation frames changed. In Python 3.10, we used PyFrame_New, which returned a PyFrameObject. In Python 3.11, PyFrame_New became a legacy API function. It still returns a PyFrameObject, but as detailed earlier, using PyFrameObject results in errors.
In order to create shadow frames, we mimic how CPython creates _PyInterpreterFrames when evaluating functions (relevant entrypoint). A lot of the functions that we need to call are not exposed by CPython, so we had to copy several of them over (relevant code). Thankfully, we did not have to copy over an extensive amount of code, and there were no instances where we had to copy the implementation of an internal object with changing state.
Step 3
The third step (copying over the evaluation context) was affected because the memory layout of _PyInterpreterFrames changed. More specifically, the layout of _PyInterpreterFrame’s localsplus changed. Before, the locals and the cell/free variables were completely separated in localsplus. In 3.11, these can now overlap: in particular, a variable can be both a local and a cell variable. Our fix required writing a new method to copy over the localsplus values (relevant code).
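A variable that is both a local and a cell variable is easy to produce from ordinary Python: any function argument captured by a nested function qualifies. The sketch below shows the overlap only at the level of names on the code object. It does not inspect the in-memory localsplus layout, which is not visible from Python, and `make_adder` is an illustrative name.

```python
def make_adder(n):
    # `n` is an argument (a local) that the nested function also captures,
    # so the compiler has to treat it as a cell variable as well.
    def add(x):
        return x + n

    return add


code = make_adder.__code__
print(code.co_varnames)  # arguments and plain locals of make_adder
print(code.co_cellvars)  # names captured by nested functions

# Names that are simultaneously a local and a cell variable.
shared = set(code.co_varnames) & set(code.co_cellvars)
print(shared)
```

Any routine that copies values between frames has to handle such a name once, not once as a local and again as a cell.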