Goal
The goal of this doc is to lay out the plan for deprecating and migrating quantization flows in torch.ao.quantization.
Note: This is a follow-up to Clarification of PyTorch Quantization Flow Support (in pytorch and torchao), clarifying our migration plan for torch.ao.quantization.
What is in torch.ao.quantization
Flow | Release Status | Features | Backends | Note |
---|---|---|---|---|
Eager Mode Quantization | beta | Post-training static, dynamic, and weight-only quantization; quantization-aware training (for static quantization); numeric debugging tool | x86 (fbgemm) and ARM CPU (qnnpack) | Quantized operators use the C++ quantized Tensor, which we plan to deprecate |
TorchScript Graph Mode Quantization | prototype | Post-training static and dynamic quantization | x86 (fbgemm) and ARM CPU (qnnpack) | Quantized operators use the C++ quantized Tensor, which we plan to deprecate |
FX Graph Mode Quantization | prototype | Post-training static, dynamic, weight-only, QAT, Numeric Suite | x86 (fbgemm/onednn) and ARM CPU (qnnpack/xnnpack) | Quantized operators use the C++ quantized Tensor, which we plan to deprecate |
PT2E Quantization | prototype | Post-training static, dynamic, weight-only, QAT, numeric debugger | x86 (onednn), ARM CPU (xnnpack), and many other mobile devices (boltnn, qualcomm, apple, turing, jarvis, etc.) | Uses PyTorch native Tensors |
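For concreteness, below is a minimal sketch of the PT2E flow from the last row, assuming an XNNPACK (ARM CPU) target; the export entry point has changed across releases, so the exact call may need adjusting for your PyTorch version.

```python
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)


class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 8)

    def forward(self, x):
        return self.linear(x)


example_inputs = (torch.randn(2, 16),)

# Capture an ATen-level graph. The entry point differs by release
# (capture_pre_autograd_graph earlier, export_for_training later).
exported = torch.export.export_for_training(Model().eval(), example_inputs).module()

# Pick a backend-specific quantizer; XNNPACKQuantizer targets ARM CPU here.
quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())

prepared = prepare_pt2e(exported, quantizer)   # insert observers
prepared(*example_inputs)                      # calibrate with representative data
quantized = convert_pt2e(prepared)             # lower to quantize/dequantize ops on plain Tensors
```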
Flow Support in 2024
Some data points on support in 2024:
- We fixed the following issues for QAT and PTQ. QAT is onboarding new customers in H1 2024, so most of the QAT fixes support new use cases; the PTQ fixes are mostly bug fixes or generalizations.
  - QAT
  - PTQ
    - [quant] Add error check for input_edge annotation by jerryzh168 · Pull Request #121536 · pytorch/pytorch · GitHub
    - [quant][pt2e] Add support for conv transpose + bn + {relu} weights fusion in PTQ by jerryzh168 · Pull Request #122046 · pytorch/pytorch · GitHub
    - Fix attr check for quantization spec by jerryzh168 · Pull Request #135736 · pytorch/pytorch · GitHub
- We did not receive or fix any issues related to FX, eager mode, or TorchScript quantization, as far as I know.
Proposed Support Status
Overall, I think we can have the following two support statuses:
- Long Term Support
  - We commit to supporting the flow long term
  - We commit to fulfilling important feature requests from other teams
  - We commit to bug fixes
- Phasing Out
  - We won't add new features
  - We only commit to critical bug fixes
Proposed Action Items
For PT2E Quantization, I think it would be better if we move the implementation to torchao.
For the other workflows, I think we can keep them in pytorch for now; we can revisit the plan for deleting code if usage drops below a certain point.
In terms of how we do the migration, what we agreed on in the torchao meeting is the following:
- For code that is shared with eager and FX mode quantization, such as observers and fake_quant modules, we can keep it in pytorch/pytorch and import it from torchao (a minimal re-export sketch follows this list)
- [1-2 weeks] For PT2E-flow-related code, we plan to duplicate it in the torchao repository; new development will happen in the torchao repository, and the older code in pytorch is kept for BC purposes
- After we replicate the PT2E flow code in torchao, we'll also ask people to migrate to the torchao APIs
- [2 weeks] Internally, the torchao team will take care of changing API imports in fbcode
- [2 weeks] Externally, we can add a warning in torch.ao.quantization saying these APIs will be deprecated soon, and potentially delete the PT2E code in 1-2 releases; we'll also add a deprecation warning for all other workflows that have the “Phasing Out” support status (a minimal warning sketch appears at the end of this section)
- [not related to migration] We can also have new development, such as adding a groupwise observer, in torchao after we have duplicated the PT2E flow code there
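For the first item above (shared observer / fake_quant code), a minimal sketch of what the torchao side could look like; the torchao module path is hypothetical, and only the torch.ao.quantization imports are existing classes.

```python
# Hypothetical torchao-side shim (module path illustrative, not decided):
# the implementations stay in pytorch/pytorch, torchao only re-exports them
# so downstream code can switch its imports without any behavior change.
from torch.ao.quantization.observer import (
    HistogramObserver,
    MinMaxObserver,
    MovingAverageMinMaxObserver,
    PerChannelMinMaxObserver,
)
from torch.ao.quantization.fake_quantize import (
    FakeQuantize,
    FusedMovingAvgObsFakeQuantize,
)

__all__ = [
    "HistogramObserver",
    "MinMaxObserver",
    "MovingAverageMinMaxObserver",
    "PerChannelMinMaxObserver",
    "FakeQuantize",
    "FusedMovingAvgObsFakeQuantize",
]
```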
We can target the above to be done by the end of H1 2025.
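For the external deprecation warning mentioned in the action items, a minimal sketch of what could go at the top of the PT2E modules in torch.ao.quantization; the exact wording, the torchao destination, and the removal timeline are assumptions.

```python
# Hypothetical warning to add in the torch.ao.quantization PT2E modules;
# the torchao destination and removal timeline below are placeholders.
import warnings

warnings.warn(
    "The PT2E quantization APIs in torch.ao.quantization are being migrated "
    "to torchao; please update your imports. The copy in torch.ao.quantization "
    "is planned for removal in a future release.",
    DeprecationWarning,
    stacklevel=2,
)
```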