a) torchao quantization can support backend-specific int8 kernels. You can expose them through a "layout", which selects a particular packing format for the quantized data. An example is the CPU layout for int4 weight-only quantization: see ao/test/integration/test_integration.py at commit f38c2722d953ea9352268f0f43f0889041423f27 in pytorch/ao on GitHub. The Quantization Overview page in the torchao 0.9 documentation explains this in more detail.
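To illustrate what a "layout" means, here is a minimal, library-free sketch. It assumes symmetric per-tensor int4 quantization and shows the same quantized values stored in two packing formats: a plain one (one int per value) and a packed one (two 4-bit nibbles per byte). All function names here are hypothetical and this is not torchao's actual packing format. In torchao the format is chosen by passing a layout object (for example `Int4CPULayout` from `torchao.dtypes`) to the quantization config; check the linked test for the exact call.

```python
from typing import List, Tuple


def quantize_int4_symmetric(weights: List[float]) -> Tuple[List[int], float]:
    """Hypothetical helper: symmetric per-tensor int4 quantization to [-8, 7]."""
    max_abs = max((abs(w) for w in weights), default=0.0) or 1.0
    scale = max_abs / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def pack_plain(q: List[int]) -> List[int]:
    """'Plain' layout: one int per value, no packing."""
    return list(q)


def pack_nibbles(q: List[int]) -> bytes:
    """'Packed' layout: two int4 values per byte, low nibble first."""
    padded = q + [0] if len(q) % 2 else q
    out = bytearray()
    for lo, hi in zip(padded[0::2], padded[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_nibbles(packed: bytes, n: int) -> List[int]:
    """Inverse of pack_nibbles: sign-extend each 4-bit nibble."""
    vals: List[int] = []
    for byte in packed:
        for nib in (byte & 0xF, byte >> 4):
            vals.append(nib - 16 if nib >= 8 else nib)
    return vals[:n]


def dequantize(q: List[int], scale: float) -> List[float]:
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.7, -0.35, 0.0, 0.1, -0.7]
    q, scale = quantize_int4_symmetric(weights)
    packed = pack_nibbles(q)
    # Both layouts hold the same information; only the storage format differs.
    print(pack_plain(q), packed.hex(), unpack_nibbles(packed, len(q)))
```

The point of the design is that kernels for a given backend can assume one packing format, while the rest of the quantization flow (scales, dequantization semantics) stays the same across layouts.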
b) Yes, it can be extended to other ops as we work more on optimizations; ideally that work is driven by specific important models or use cases. Let me know if you think any model is bottlenecked by these ops and we can take a look. One op I have in mind is SDPA, and maybe MoE next.