TL;DR:
CUDAPluggableAllocator does no caching of its own and does not support torch.cuda.memory_stats().
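For reference, a pluggable allocator is registered roughly like this (a minimal sketch following the PyTorch docs; the file name `alloc.so` and the exported symbols `my_malloc`/`my_free` are placeholders for your own compiled library):

```python
import torch

# alloc.so must export two C functions (signatures per the PyTorch docs):
#   void* my_malloc(ssize_t size, int device, cudaStream_t stream);
#   void  my_free(void* ptr, ssize_t size, int device, cudaStream_t stream);
allocator = torch.cuda.memory.CUDAPluggableAllocator(
    "alloc.so", "my_malloc", "my_free"
)

# Replace the default caching allocator. This must happen before any
# CUDA memory is allocated in the process and cannot be undone afterwards.
torch.cuda.memory.change_current_allocator(allocator)
```

Once the allocator is swapped, every cudaMalloc/cudaFree goes through your functions directly, which is why the caching and statistics described below are lost.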
The default caching allocator's behavior:
- When an allocation fails, PyTorch first frees its cached, unused blocks and then retries the allocation.
- When memory is freed, PyTorch does not return it to CUDA; it keeps the block reserved (cached) for later allocations (see the sketch after this list).
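This caching behavior is easy to observe with the built-in memory counters (a minimal sketch; the tensor size is arbitrary):

```python
import torch

x = torch.empty(64 * 1024 * 1024, device="cuda")  # ~256 MB of float32
print(torch.cuda.memory_allocated())  # bytes held by live tensors
print(torch.cuda.memory_reserved())   # bytes the caching allocator holds from CUDA

del x
# The tensor is gone, but the allocator keeps the block cached:
print(torch.cuda.memory_allocated())  # drops to ~0
print(torch.cuda.memory_reserved())   # stays high

torch.cuda.empty_cache()              # explicitly return cached blocks to CUDA
print(torch.cuda.memory_reserved())   # now drops as well
```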