Understanding the difference in caching behavior between the CUDA caching allocator and a pluggable allocator

TL;DR:

CUDAPluggableAllocator does not cache memory and does not support torch.cuda.memory_stats().
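
For context, here is a minimal sketch of how a pluggable allocator is swapped in. The shared-library path (alloc.so) and the exported symbols (my_malloc / my_free) are placeholders for your own compiled implementation, not part of the original note:

```python
import torch

# Load a custom allocator from a compiled shared library and make it the
# current allocator. "alloc.so", "my_malloc", and "my_free" are placeholders
# for your own C/C++ allocation functions.
new_alloc = torch.cuda.memory.CUDAPluggableAllocator(
    "alloc.so", "my_malloc", "my_free"
)
torch.cuda.memory.change_current_allocator(new_alloc)

# Subsequent CUDA allocations go through my_malloc/my_free directly:
# there is no caching layer, and torch.cuda.memory_stats() is not populated.
x = torch.zeros(1024, device="cuda")
```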

The behavior of the default caching allocator:

  • When an allocation fails, PyTorch first releases its cached but unused blocks back to CUDA and then retries the allocation.
  • When memory is freed, PyTorch does not return it to CUDA; it keeps the memory reserved in its cache so later allocations can reuse it (see the sketch after this list).
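
A minimal sketch of this reserve-and-reuse behavior, using the standard memory introspection helpers (the tensor size here is only illustrative):

```python
import torch

# Allocate a 256 MiB tensor; the caching allocator requests memory from CUDA.
x = torch.empty(256 * 1024 * 1024, dtype=torch.uint8, device="cuda")
print(torch.cuda.memory_allocated())  # ~256 MiB actively in use
print(torch.cuda.memory_reserved())   # ~256 MiB held by the caching allocator

# Freeing the tensor drops "allocated", but "reserved" stays the same:
# the block is kept in the cache rather than returned to CUDA.
del x
print(torch.cuda.memory_allocated())  # ~0
print(torch.cuda.memory_reserved())   # still ~256 MiB

# empty_cache() releases unused cached blocks back to the driver; the allocator
# does the same automatically when an allocation would otherwise fail.
torch.cuda.empty_cache()
print(torch.cuda.memory_reserved())   # back toward 0
```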