Clarification of PyTorch Quantization Flow Support (in pytorch and torchao)

Achiirua · February 28, 2025, 2:59pm

Hi @jerryzh168, thanks for the update!

As a Torch user in the embedded space, I'd like to ask a couple of things. Do you plan to restrict the export feature in torchao quantization? As far as I understand, you recommend leaving export out of the torchao flow for speed reasons. However, exporting models quantized with advanced techniques is still valuable for deploying them to custom backends.

Additionally, it seems that when exporting with torch.export(), the ops in the generated IR depend on the package we use. For example, we get prims ops when we export with torchao, but ATen (and Core ATen) ops with pt2e. I've read some discussions stating that it's possible to control how far an op is decomposed, but today this is rather opaque to users. Do you intend to expose the degree of decomposition somehow, so that we could, for example, decompose the ATen ops generated with pt2e down to prims ops, or vice versa?

I’m happy to help!
