Will the decomposition be done in the framework before the ops are dispatched to the backend, or should the backend handle it, perhaps via a torch-dispatch-based decomposition?
There is also a potential performance impact: with torch.compile, the decomposition happens only at graph compile time, whereas in the eager flow it would happen on every op execution.
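
To make the eager-side cost concrete, here is a rough sketch of what a torch-dispatch-based decomposition could look like. It is only an illustration under stated assumptions, not a proposal for the actual implementation: `EagerDecomposeMode` is a hypothetical name, `aten.hardswish` is just an arbitrary op that happens to have a registered decomposition, and `torch._decomp` and `torch.utils._python_dispatch` are private modules whose APIs may change between releases.

```python
import torch
from torch._decomp import get_decompositions
from torch.utils._python_dispatch import TorchDispatchMode

aten = torch.ops.aten


class EagerDecomposeMode(TorchDispatchMode):
    """Hypothetical mode: decompose selected ops at dispatch time in eager."""

    def __init__(self, decomposition_table):
        super().__init__()
        self.decomposition_table = decomposition_table
        self.decompose_calls = 0  # how many times a decomposition was run

    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        decomp = self.decomposition_table.get(func)
        if decomp is not None:
            # The table lookup and the Python decomposition run on every
            # call of the op; nothing is cached across calls.
            self.decompose_calls += 1
            # The mode is disabled inside this handler, so the ops emitted
            # by the decomposition go straight to the regular kernels.
            return decomp(*args, **kwargs)
        return func(*args, **kwargs)


x = torch.linspace(-4.0, 4.0, 9)
expected = torch.nn.functional.hardswish(x)  # reference result, no mode active

mode = EagerDecomposeMode(get_decompositions([aten.hardswish]))
with mode:
    for _ in range(3):
        y = torch.nn.functional.hardswish(x)

print("decompositions run:", mode.decompose_calls)
```

The counter grows with the number of calls, which is the per-op overhead I am worried about in eager. In the torch.compile path the same decomposition table would be applied while tracing the graph, so the compiled graph already contains the decomposed ops.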