Clarification of PyTorch Quantization Flow Support (in pytorch and torchao)

jerryzh168 · March 14, 2025, 9:25pm

a) torchao quant can support backend-specific int8 kernels; you can expose them through a “layout” (one layout per packing format). An example is the CPU layout for int4 weight-only quantization: ao/test/integration/test_integration.py at f38c2722d953ea9352268f0f43f0889041423f27 · pytorch/ao · GitHub. See Quantization Overview — torchao 0.9 documentation for a more detailed explanation.
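To make the “layout” idea concrete, here is a minimal pure-Python sketch, not torchao's actual API: the same int4 quantized values are stored in two different packing formats, and a layout object selects which one is used. The class names `PlainLayout` and `PackedNibbleLayout` and the helper functions are hypothetical and exist only for illustration. A real backend layout would pack into whatever format its kernel expects.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical layout markers (NOT torchao classes): each names a packing format.
@dataclass(frozen=True)
class PlainLayout:
    """One int4 code per byte, no packing."""


@dataclass(frozen=True)
class PackedNibbleLayout:
    """Two int4 codes per byte, low nibble first."""


def quantize_int4(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to the signed int4 range [-8, 7]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def pack(q: List[int], layout) -> bytes:
    """Store the quantized values in the format named by `layout`."""
    codes = [v + 8 for v in q]  # shift to unsigned codes 0..15
    if isinstance(layout, PlainLayout):
        return bytes(codes)
    if isinstance(layout, PackedNibbleLayout):
        if len(codes) % 2:
            codes.append(8)  # pad with the code for zero
        return bytes(lo | (hi << 4) for lo, hi in zip(codes[0::2], codes[1::2]))
    raise TypeError(f"unsupported layout: {layout!r}")


def unpack(data: bytes, n: int, layout) -> List[int]:
    """Recover the first `n` signed int4 values from packed storage."""
    if isinstance(layout, PlainLayout):
        codes = list(data)
    elif isinstance(layout, PackedNibbleLayout):
        codes = []
        for b in data:
            codes.extend((b & 0xF, b >> 4))
    else:
        raise TypeError(f"unsupported layout: {layout!r}")
    return [c - 8 for c in codes[:n]]


if __name__ == "__main__":
    weights = [0.7, -0.35, 0.1, 0.0, -0.7]
    q, scale = quantize_int4(weights)
    for layout in (PlainLayout(), PackedNibbleLayout()):
        data = pack(q, layout)
        restored = [v * scale for v in unpack(data, len(q), layout)]
        print(type(layout).__name__, len(data), "bytes", restored)
```

Both layouts round-trip to the same quantized values; only the storage format differs, which is why a backend-specific kernel can be plugged in by choosing a layout without changing the quantization math.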

b) Yeah, it can be extended to other ops as we work more on optimizations; ideally it’s driven by specific important models / use cases. Let me know if you feel any model is bottlenecked by these ops and we can take a look. One op I have in mind is SDPA, and maybe MoE next.
