I am looking into compiling libtorch into a "forward-only" version, i.e. dropping all the CUDA backward kernels. This would result in lower binary sizes for inference. Has anybody tried something like this before?
Hey!
I'm afraid this is not something that exists, nor would it be easy to do.
The libtorch binary is very much monolithic and was not designed for selective builds from the ground up.
ExecuTorch, on the other hand, is built for that from the ground up and has very good selective-build support if you can export your model. The binary size will be 1000x smaller as well!
Cheers,
Alban