I love the structured kernel approach and the separation (and larger availability) of the meta information!
One half-related question is the vision for backward.
Would we also use structured kernels for backward ops? I ask with an eye to the future: I imagine a world where autodiff + JIT could lower a generated backward to in-place ops wherever it doesn’t need the intermediates.
One step further might be to also have meta information for the backward (will it be native or from derivatives, which outputs will have gradients, etc.), but this is likely out of scope here.
Both of these may be far in the future, but as an implementor I would like a “yes” or “no” on whether structured kernels will be used for backward ops.