What is the relationship requirement among original bytecode, transformed bytecode, and bytecode returned by hooks in Dynamo?

In torch.compile, the original bytecode of a user-defined function is transformed by Dynamo. The pull request adds an API, torch._dynamo.convert_frame.register_bytecode_hook, to PyTorch, which enables hooks to log additional information about the bytecode and optionally to replace the transformed bytecode. Currently, the major use case of this API is the depyf project, which helps users understand and adapt to torch.compile; there may be more use cases in the future.
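To make the hook mechanism concrete, here is a hypothetical, torch-free sketch of how such a registry could behave. The names register_bytecode_hook and run_bytecode_hooks below are illustrative only; the real API lives in torch._dynamo.convert_frame and its exact signature may differ.

```python
from types import CodeType
from typing import Callable, Optional

# A hook sees (original_code, transformed_code) and may return a replacement
# code object, or None to keep the transformed bytecode unchanged.
Hook = Callable[[CodeType, CodeType], Optional[CodeType]]
_bytecode_hooks: list[Hook] = []

def register_bytecode_hook(hook: Hook) -> None:
    _bytecode_hooks.append(hook)

def run_bytecode_hooks(original_code: CodeType, transformed_code: CodeType) -> CodeType:
    result = transformed_code
    for hook in _bytecode_hooks:
        replacement = hook(original_code, result)
        if replacement is not None:
            result = replacement  # a hook chose to replace the bytecode
    return result
```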

Replacing bytecode is very error-prone in Python, as it involves many implementation details of the CPython interpreter.

This discussion thread aims to clarify the relationship requirements among these code objects. To be specific, we are talking about three code objects:

  1. The original code object, from the user-defined function, dubbed original_code.
  2. The transformed code object, generated by Dynamo, dubbed transformed_code.
  3. The replaced code object, returned by bytecode hooks, dubbed replaced_code.

The requirement is with regard to the following code attributes:

co_argcount
co_cellvars
co_code
co_consts
co_filename
co_firstlineno
co_flags
co_freevars
co_kwonlyargcount
co_lines
co_linetable
co_lnotab
co_name
co_names
co_nlocals
co_posonlyargcount
co_stacksize
co_varnames
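All of these attributes can be inspected directly from Python, which is handy when debugging a mismatch between code objects. A minimal example:

```python
def f(x, y=1):
    z = x + y
    return z

code = f.__code__
print(code.co_argcount)   # 2: x and y
print(code.co_varnames)   # ('x', 'y', 'z'): arguments first, then locals
print(code.co_nlocals)    # 3
print(code.co_name)       # 'f'
```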

I suppose we don’t need exactly the same attributes, as that would basically require the same code object.

By inspecting the commit history of https://github.com/pytorch/pytorch/blob/main/torch/csrc/dynamo/eval_frame.c , I suppose @jansel @anijain2305 @williamwen42 might be relevant to this discussion.

Some related discussion:

Discussion on this PR seems to indicate that new code objects must have more local variables than the original code objects (i.e., more names in co_varnames).

I once ran into a strange test failure in PyTorch under Python 3.11. After I made sure the replaced code objects have more local variables than the original code objects, the error disappeared.

The Python interpreter seems to work as long as the new bytecode does not access a name that does not exist:

def f():
    a = 1
    b = 2
    return a + b

def g():
    a = 1
    b = 2
    c = 3
    return a + b + c

g.__code__ = f.__code__
print(g()) # outputs 3

In this example, we replace g’s bytecode with f’s bytecode, and the frame of g can execute f’s bytecode without any problem, even though the replacement has fewer local variables than the original.

f_localsplus of original_code needs to be a prefix of both transformed_code’s f_localsplus and replaced_code’s f_localsplus, since we copy f_localsplus from the old frame to the new frame.

This means:

  1. The new code must have more locals than the old code
  2. And, they must be in the same order

f_localsplus contains: args, free/cell vars, local variables.

Technically the order of locals can change (since they are all just NULL when the function is called), but args/free/cell vars must be in the exact same positions as the original function.
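The split between arguments, plain locals, and cell/free variables described above can be observed from Python:

```python
def outer():
    z = 10                 # cell variable in outer: closed over by inner
    def inner(a, b=2):
        c = a + b          # plain local variable
        return c + z       # z is a free variable inside inner
    return inner

fn = outer()
print(fn.__code__.co_varnames)     # ('a', 'b', 'c'): args first, then locals
print(fn.__code__.co_freevars)     # ('z',)
print(outer.__code__.co_cellvars)  # ('z',)
```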

Thanks, Jason!

According to your response and my understanding, I summarize the situation as follows:

To be clear, the sole purpose is to make sure new_frame can execute the transformed or replaced bytecode correctly. Therefore, ideally, we need to look up necessary variables used in new_frame, and copy them from old_frame.

There are four kinds of variables we need to consider:

  • arguments, passed by the caller function.
  • local variables, created during execution and destroyed after the frame finishes execution. They are NULL when the frame starts execution.
  • free variables, variables that are referenced by this frame but come from outer functions.
  • cell variables, variables that are inside this frame but referenced by inner functions. I suppose they are initially NULL and set by STORE_DEREF later.

Ideally, new_frame should look up the variables it requires from old_frame. The current implementation copies all the variables from old_frame to new_frame, which is why @jansel says there are two constraints:

  1. The new code must have more locals than the old code
  2. And, they must be in the same order

A more natural solution would be:

  name_to_idx = {}
  for i, name in enumerate(old_frame_names):
      name_to_idx[name] = i
  for i, name in enumerate(new_frame_names):
      if name not in name_to_idx:
          continue
      fastlocals_new[i] = fastlocals_old[name_to_idx[name]]

This would remove those constraints, but it incurs a higher runtime cost.

  • cell variables, variables that are inside this frame but referenced by inner functions. I suppose they are initially NULL and set by STORE_DEREF later.

Cell variables add a layer of indirection, so they won’t be NULL; they will be a cell containing NULL.
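This indirection can be observed from pure Python: before STORE_DEREF runs, the cell object already exists, but reading its contents raises ValueError, which is the Python-level view of "a cell containing NULL". A small demonstration:

```python
def outer():
    def inner():
        return x           # closing over x makes x a cell variable of outer
    cell = inner.__closure__[0]
    try:
        cell.cell_contents # the cell already exists, but is still empty
        was_empty = False
    except ValueError:     # CPython raises ValueError for an empty cell
        was_empty = True
    x = 42                 # STORE_DEREF fills the cell
    return was_empty, cell.cell_contents

print(outer())  # (True, 42)
```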


While I agree that adding a name_to_idx mapping would make things more general, it is unnecessary because the order of vars in the old frame is a prefix of the new frame by construction in Dynamo (since Dynamo creates the new frame by copying and modifying the old frame).

This code is on the hot path (every function call), so we care a lot about performance here.


Agreed, performance is critical here. I created a PR to skip copying local variables, which not only brings a speedup but also relaxes these two constraints.

And I will try to follow these constraints to remain compatible with previous PyTorch versions. This discussion thread is left for future reference in case anyone runs into the problem.


Update: after this PR, the only constraint is that original_code/transformed_code/replaced_code have the same args/cell variables/free variables. There are no constraints on newly introduced local variables.
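Under the post-PR rule, a compatibility check only needs to compare the argument/cell/free structure of the two code objects. The function check_constraints below is my own illustrative name, not a PyTorch API, and it ignores *args/**kwargs slots (encoded in co_flags) for brevity:

```python
from types import CodeType

def check_constraints(original: CodeType, new: CodeType) -> bool:
    # Arguments (including their names and order), cell variables, and free
    # variables must match exactly; extra local variables in `new` are fine.
    n_args = original.co_argcount + original.co_kwonlyargcount
    return (
        original.co_argcount == new.co_argcount
        and original.co_kwonlyargcount == new.co_kwonlyargcount
        and original.co_posonlyargcount == new.co_posonlyargcount
        and original.co_varnames[:n_args] == new.co_varnames[:n_args]
        and original.co_cellvars == new.co_cellvars
        and original.co_freevars == new.co_freevars
    )

def f(a, b):
    return a + b

def g(a, b):
    c = a + b   # a newly introduced local is allowed
    return c

print(check_constraints(f.__code__, g.__code__))  # True
```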

However, PyTorch versions prior to this PR have these constraints:

  1. The new code must have more locals than the old code
  2. And, they must be in the same order
  3. args/cell variables/free variables must be exactly the same.