Clarification of PyTorch Quantization Flow Support (in pytorch and torchao)

Hi @jerryzh168, thanks for the update!

As a Torch user in the embedded space, I’d like to ask a couple of things. Do you have plans to constrain the export feature in torchao quantization? As far as I understand, you recommend excluding export from the torchao flow for speed reasons. However, it would still be valuable to export models quantized with advanced techniques so they can be deployed on custom backends.

Additionally, it seems that when exporting with torch.export(), the ops in the generated IR depend on the package used. For example, we get prims ops when we export with torchao, but ATen (and Core ATen) ops with pt2e. I’ve read some discussions stating that it’s possible to control how far an op is decomposed, but today this is a bit opaque for users. Do you intend to expose the degree of decomposition somehow, for example to decompose the ATen ops generated with pt2e into prims ops, or vice versa?

I’m happy to help!