TL;DR: `CUDAPluggableAllocator` does not implement caching and does not support `torch.cuda.memory_stats()`.
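
For context, a pluggable allocator is loaded from a shared library that exports raw malloc/free functions, and it forwards every request straight to those functions with no caching layer in between. A minimal sketch of the Python side, assuming an `alloc.so` that exports functions named `my_malloc` and `my_free` (the library path and function names here are illustrative, not part of PyTorch):

```python
import torch

# Load a custom allocator from a compiled shared library.
# "alloc.so", "my_malloc", and "my_free" are placeholder names for a
# library exporting:  void* my_malloc(ssize_t, int, cudaStream_t)
#                     void  my_free(void*, ssize_t, int, cudaStream_t)
new_alloc = torch.cuda.memory.CUDAPluggableAllocator(
    "alloc.so", "my_malloc", "my_free"
)

# Swap in the custom allocator; this must happen before any CUDA allocation.
torch.cuda.memory.change_current_allocator(new_alloc)

x = torch.zeros(10, device="cuda")  # now served by my_malloc, uncached

# Per the TL;DR above, torch.cuda.memory_stats() is not supported
# once this allocator is active.
```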
The default caching allocator behaves as follows (see the sketch after this list):
- When an allocation fails, PyTorch releases its cached, unused blocks back to CUDA and retries the allocation.
- When a tensor is freed, PyTorch does not return the memory to CUDA; it keeps the block reserved in its cache for future allocations.
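
Both points can be observed with PyTorch's built-in memory introspection APIs. A small sketch of the second point (the tensor size is illustrative):

```python
import torch

x = torch.empty(1024, 1024, device="cuda")  # ~4 MiB allocation
print(torch.cuda.memory_allocated())  # bytes currently used by tensors
print(torch.cuda.memory_reserved())   # bytes held by the caching allocator

del x  # the tensor is freed...
print(torch.cuda.memory_allocated())  # ...so this drops back toward 0,
print(torch.cuda.memory_reserved())   # but this stays up: the block is cached

torch.cuda.empty_cache()              # explicitly return cached blocks to CUDA
print(torch.cuda.memory_reserved())   # now this drops as well
```

This caching is why `nvidia-smi` can report high memory usage even when few tensors are alive: the reserved (cached) memory still belongs to the process until `torch.cuda.empty_cache()` is called or an allocation failure forces a release.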