What's the focus of each?
The post is trying to clarify exactly that. To summarize: pt2e is mostly for static quantization use cases, and torchao covers the others.
Here's my guess: is it true that if I want to deploy on an edge device for inference, I will need to export the model, which makes pt2e the better choice? And for LLM training, torchao is better because of its more modern features?
It's true that edge use cases mostly go through pt2e quant so far, but we also have edge + LLM use cases that use the torchao (quantize_) API, so it depends on the type of quantization you'd like to do, e.g. static vs. dynamic, weight-only, or more advanced ones like AWQ, GPTQ, SmoothQuant, etc.
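To make the weight-only case mentioned above concrete, here is a minimal, self-contained sketch of symmetric per-tensor int8 weight-only quantization. This is an illustration of the general idea only, not the torchao implementation (in torchao you would call `quantize_` with a weight-only config instead); the function names here are hypothetical.

```python
# Hypothetical sketch of symmetric int8 weight-only quantization.
# Weights are mapped to int8 values in [-127, 127] with one
# per-tensor scale; activations would stay in float.

def quantize_int8_symmetric(weights):
    """Quantize a list of float weights to int8 with a shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [qi * scale for qi in q]

w = [0.5, -1.27, 0.03]
q, scale = quantize_int8_symmetric(w)
w_hat = dequantize(q, scale)
# each element of w_hat is within scale/2 of the original weight
```

Static quantization differs in that activations are also quantized, using scales calibrated ahead of time, which is why it pairs naturally with the export-based pt2e flow.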
And thanks for your pt2e example! It looks like it has more features than torch.ao. So no matter whether I choose a pt2e or a torchao quant method, I can consider torch.ao retired?
That's true, we are deprecating torch.ao.quantization; more details are in the Torch.ao.quantization Migration Plan.