Native-ops DSL support in PyTorch core
tl;dr: We have merged support for writing Python- and DSL-based native ops in PyTorch. You can now start using your favorite DSLs to write new implementations of native operators for inclusion in core.
Why?
DSLs like Triton, Helion, cuteDSL, and others have become powerful tools for writing performant GPU (and CPU) code without going all the way to hand-tuned C++ or CUDA. They provide a great developer experience without sacrificing either code readability or performance.
Notably, Flash-Attention-v4, a key optimization for modern LLMs, is implemented in cuteDSL. After its successful integration into core, we decided to formalize a) what’s expected from DSL-based native ops, b) how they should be added to core, and c) what expectations we have of them and how they interact with other PyTorch systems.
We strongly suspect that some of the most performant implementations of important operators will start to be written in DSLs, and these changes give us the option, and a clear path if desired, to include them in PyTorch core.
How?
PR 176280 adds the torch/_native directory, containing general utilities shared across all DSL runtimes (e.g., runtime-availability and versioning checks), along with DSL-specific utilities, triton_utils.py and cutedsl_utils.py. These provide APIs to:
- Determine whether the DSL runtime is a) installed, and b) at a supported version
- Register implementations to override native ops when appropriate, either for a subset of functionality or for all cases
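To illustrate the first point, a runtime-availability and version check along the lines of what triton_utils.py provides could be written with the standard library alone. The helper names and the minimum version below are hypothetical, not the actual torch/_native API:

```python
import importlib.metadata
import importlib.util

# Hypothetical minimum version, for illustration only; the real
# torch/_native utilities define their own version policy.
MIN_TRITON_VERSION = (3, 0)

def runtime_installed(module_name: str) -> bool:
    """Return True if the DSL runtime module can be imported."""
    return importlib.util.find_spec(module_name) is not None

def runtime_version_ok(dist_name: str, minimum: tuple[int, ...]) -> bool:
    """Return True if the installed distribution meets the minimum version."""
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return False
    parsed = tuple(
        int(part) for part in version.split(".")[: len(minimum)] if part.isdigit()
    )
    return parsed >= minimum

def triton_available() -> bool:
    """Both checks must pass before a Triton-based op may be registered."""
    return runtime_installed("triton") and runtime_version_ok(
        "triton", MIN_TRITON_VERSION
    )
```

Keeping these checks in one shared place is the point of torch/_native: each DSL op asks "is my runtime usable?" once, rather than re-implementing the probe per operator.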
We are enforcing that all Python native ops are registered through PyTorch’s dispatcher. This means ops are immediately tied into all dispatch logic, notably (but not limited to!) autograd, without requiring further developer intervention.
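The pattern, in spirit, is conditional registration: the DSL implementation is registered only when its runtime checks pass, and otherwise dispatch falls through to the existing native kernel. The toy pure-Python registry below sketches that idea only; the real mechanism is PyTorch's dispatcher, not this hand-rolled dictionary:

```python
# Toy sketch of conditional override registration. Illustrative only:
# PyTorch uses its dispatcher (torch.library) for this, not a dict.
from typing import Callable

_registry: dict[str, Callable] = {}

def register_op(name: str, fn: Callable, *, condition: bool = True) -> None:
    """Register fn as the implementation of `name` only if `condition`
    holds, mirroring how a DSL kernel is registered only when its
    runtime is installed and at a usable version."""
    if condition:
        _registry[name] = fn

def dispatch(name: str, *args):
    """Call whichever implementation is currently registered for `name`."""
    return _registry[name](*args)

# The default (reference) implementation is always registered.
register_op("mul", lambda a, b: a * b)

# A "DSL" override is registered only when its runtime is available;
# the availability flag is hard-coded here for illustration.
dsl_runtime_available = False
register_op("mul", lambda a, b: "dsl result", condition=dsl_runtime_available)
```

Because the override above is gated on an unavailable runtime, `dispatch("mul", 2, 3)` still hits the reference implementation. In real code the gating condition would be the availability check from the shared utilities, and the registration call would go through the dispatcher so autograd and the rest of dispatch see the op automatically.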
There are a few smaller restrictions (see link), but beyond those, PyTorch is now ready for DSLs to start contributing to native operators.
Further, the torch/_native/ops directory has been created to hold all our DSL ops, and it will soon fill with the DSL-based code that has already been written but didn’t have a path to living within core.
Further details and examples are located in torch/_native/README.md.
What’s Next?
Start using it! If you’re thinking of updating or adding an op, you now have a new way to do it, if the pros are worth it to you!
There are a few things immediately on our list:
- Operators
- Start writing implementations using the framework.
- Support more DSLs
- Helion / cuTile / anything else
- AOT Compilation
- Compiled objects for python-less environments
- Custom Ops
- Don’t override an existing op; create an entirely new one. Currently hampered by performance concerns