CUDACachingAllocator rounding pattern

david-macleod · June 3, 2023, 8:39pm

According to get_allocation_size, there are three size tiers, which determine how much the requested size is rounded up before calling cudaMalloc.

The comments suggest that the reason for rounding up to 20 MB for 1 MB <= size < 10 MB is to help reduce fragmentation (makes sense), but for size >= 10 MB we fall back to a different pattern, such that a request for 11 MB results in a smaller cudaMalloc allocation than a request for 9 MB (see plot below).

I was just curious what the reason for this is.
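
To make the non-monotonic step concrete, here is a minimal Python sketch of the three tiers as I understand them. The constant values (1 MiB, 2 MiB, 10 MiB, 20 MiB) and the exact boundary comparisons are my recollection of CUDACachingAllocator.cpp, not something confirmed in this thread, so treat them as assumptions and check them against your PyTorch version.

```python
MIB = 1024 * 1024

# Assumed constants, recalled from CUDACachingAllocator.cpp (verify for your version).
K_SMALL_SIZE = 1 * MIB        # largest request treated as "small"
K_SMALL_BUFFER = 2 * MIB      # small requests get a 2 MiB block
K_MIN_LARGE_ALLOC = 10 * MIB  # below this, non-small requests get a 20 MiB block
K_LARGE_BUFFER = 20 * MIB
K_ROUND_LARGE = 2 * MIB       # larger requests round up to a multiple of 2 MiB


def get_allocation_size(size: int) -> int:
    """Return the number of bytes that would be requested from cudaMalloc."""
    if size <= K_SMALL_SIZE:
        return K_SMALL_BUFFER
    if size < K_MIN_LARGE_ALLOC:
        return K_LARGE_BUFFER
    # Round up to the next multiple of K_ROUND_LARGE.
    return K_ROUND_LARGE * ((size + K_ROUND_LARGE - 1) // K_ROUND_LARGE)


# Print the allocation size for a few request sizes around the tier boundaries.
for request_mib in (1, 9, 10, 11, 25):
    allocated = get_allocation_size(request_mib * MIB)
    print(f"request {request_mib:>2} MiB -> cudaMalloc {allocated // MIB} MiB")
```

Under these assumed constants, a 9 MiB request maps to a 20 MiB block while an 11 MiB request maps to 12 MiB, which is the drop described above.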
