With respect to PyTorch frontend APIs, I can come up with three sets of APIs:
nn.Module API, such as torch.nn.Dropout
nn.functional API, such as torch.nn.functional.dropout
aten ops, such as torch.ops.aten.dropout.default
What is the relationship among these APIs?
Here is what I think, and please correct me if I’m wrong:
aten ops are the implementation of kernels
nn.functional APIs are stateless: they wrap the aten ops and add some sanity checks on the arguments
nn.Module APIs hold state (e.g. the dropout probability) and pass it on to the stateless functional APIs (a small sketch calling dropout at each level follows below)
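To make my mental model concrete, here is a small sketch calling dropout at each of the three levels (the aten call follows the schema aten::dropout(Tensor input, float p, bool train), if I read it correctly):

import torch

x = torch.randn(4, 4)

# stateful module API: the module holds p and the training flag
m = torch.nn.Dropout(p=0.5)
y1 = m(x)

# functional API: the state is passed in explicitly
y2 = torch.nn.functional.dropout(x, p=0.5, training=True)

# raw aten op through the dispatcher binding
y3 = torch.ops.aten.dropout.default(x, 0.5, True)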
When we talk about the PyTorch compiler, the contrast is usually with "eager mode". Which of these levels count as eager mode?
And when we switch from eager mode to the compiler with torch.compile(backend="eager"), which level of API does it trace? Does it trace the wrapper logic in nn.Module and nn.functional?
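One way I inspect what Dynamo actually records (a sketch on my side, assuming torch.compile accepts a custom backend callable, which recent releases do) is to pass a backend that prints the captured FX graph and then runs it unchanged:

import torch

def print_graph_backend(gm, example_inputs):
    # gm is the torch.fx.GraphModule that Dynamo captured
    print(gm.graph)
    return gm.forward

@torch.compile(backend=print_graph_backend)
def f(x):
    return torch.nn.functional.dropout(x, p=0.5, training=True)

f(torch.randn(4, 4))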
Furthermore, I find that even the meaning of aten ops is not clear.
I can get all the aten ops and overloads by:
import torch

aten_ops = []
aten_ops_with_overloads = []
for k in dir(torch.ops.aten):
    possible_op = getattr(torch.ops.aten, k)
    if hasattr(possible_op, "overloads"):
        aten_ops.append(possible_op)
        # each OpOverloadPacket lists its overloads by name
        for ovl in possible_op.overloads():
            aten_ops_with_overloads.append(getattr(possible_op, ovl))
Clearly there must be another layer of logic that dispatches an operator call to one of its overloads. Does Dynamo also deal with this dispatch logic?
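For reference, this is how the overload layer looks from Python (a small sketch; the exact overload names and schemas can differ between PyTorch versions):

import torch

# torch.ops.aten.add is an OpOverloadPacket that groups several overloads
packet = torch.ops.aten.add
print(packet.overloads())   # e.g. ['Tensor', 'Scalar', 'out', ...]

# each overload is a concrete schema that can be called directly
add_tensor = torch.ops.aten.add.Tensor
print(add_tensor._schema)
print(add_tensor(torch.ones(2), torch.ones(2)))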
I was expecting that, as we go from the eager backend to aot_eager and further to inductor, Dynamo would introduce more and more guards, but I find that the guards for eager are the same as those for aot_eager. That might mean the dispatch logic is not pushed into Dynamo guards.
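For completeness, this is roughly how I compared the guards (a sketch, assuming a PyTorch recent enough to ship torch._logging; running with the environment variable TORCH_LOGS=guards should be equivalent):

import torch

def f(x):
    return torch.nn.functional.dropout(x, p=0.5, training=True)

x = torch.randn(4, 4)

# ask Dynamo to print the guards it installs for each compiled frame
torch._logging.set_logs(guards=True)

torch.compile(f, backend="eager")(x)
torch._dynamo.reset()                      # drop the cache so the next backend recompiles from scratch
torch.compile(f, backend="aot_eager")(x)   # as noted above, the printed guards come out the same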
I think you’re actually missing a 4th layer: torch.dropout in your example above.
In order from lowest level to highest level:
torch.ops.* is a raw binding to our C++ dispatcher-based API. It contains all the "native" ops, including the ones dynamically registered via the torch.library API (both from Python and C++). All features working at the dispatcher level will see these ops (or a particular subset of them): __torch_dispatch__ classes and modes, AOTAutograd, Inductor, etc. (see the sketch after this list).
torch.* is our main Python API; it is either written in Python or a direct binding to some C++ ops. This is where __torch_function__ happens, as well as Dynamo tracing and a lot of the FX tracing.
torch.nn.functional.* is the functional API for nn; it is usually logic wrapping a call to torch.*.
torch.nn.Module is the stateful module: it holds onto state (parameters, buffers) and calls into the functional or torch.* functions.
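A minimal way to see this layering from user code (my own sketch, not part of the answer above; the exact ops that get logged may vary by PyTorch version): a TorchFunctionMode observes calls at the torch.* / Python API layer, while a TorchDispatchMode observes the aten ops that actually reach the dispatcher.

import torch
from torch.overrides import TorchFunctionMode
from torch.utils._python_dispatch import TorchDispatchMode

class FunctionLogger(TorchFunctionMode):
    # logs each call that goes through __torch_function__ (the torch.* / Python API layer)
    def __torch_function__(self, func, types, args=(), kwargs=None):
        print("torch_function:", func)
        return func(*args, **(kwargs or {}))

class DispatchLogger(TorchDispatchMode):
    # logs each aten op that reaches the dispatcher
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        print("torch_dispatch:", func)
        return func(*args, **(kwargs or {}))

x = torch.randn(4, 4)

with FunctionLogger():
    torch.nn.functional.dropout(x, p=0.5, training=True)

with DispatchLogger():
    torch.nn.functional.dropout(x, p=0.5, training=True)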
It's strange that torch.dropout has no docstring, but torch.nn.functional.dropout does, which makes torch.nn.functional.dropout look more formal and user-facing. That's why I ignored torch.dropout.
With respect to Dynamo tracing, it seems it just records what it sees.
Nothing special; it's most likely an oversight from when dropout was added as a native op, before we added the CI that ensures all functions are properly documented.