I recently started looking at quantisation in PyTorch. I'm interested in it because I want to quantise an LLM (like Llama) without using external libraries like GGML or AutoGPTQ, simply because they do not seem stable enough to be included in a production stack.
I've read the docs (Quantization — PyTorch 2.1 documentation) and then tried to follow this tutorial (Dynamic Quantization — PyTorch Tutorials 2.1.1+cu121 documentation), but it does not work on my system: I get an `Illegal instruction (core dumped)` error. I'm using
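For context, this is roughly the kind of minimal dynamic quantisation snippet I'm trying to run, following the tutorial (the model here is a toy `nn.Sequential` of my own, not the tutorial's LSTM; `quantize_dynamic` converts the `nn.Linear` weights to int8 and quantises activations on the fly at inference time):

```python
import torch
import torch.nn as nn

# Toy float model standing in for a real network.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))

# Dynamically quantise all nn.Linear modules to int8 weights.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
out = quantized(x)
print(out.shape)  # same output shape as the float model
```

It's during the quantised forward pass of code like this that the process dies with the illegal-instruction crash.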
I'm writing here because I'm interested in the general state of quantisation in PyTorch. This seems to be a very hot topic right now, and many people would be interested in using it. Is anyone actively working on it, and if so, what's the current scope? This is something I'm strongly interested in and could contribute to: I have an applied maths background, so I think I would be a good fit.