CUDACachingAllocator rounding pattern

According to get_allocation_size, there are three size tiers, which determine how much the request size is rounded up before calling cudaMalloc.

The comments suggest that requests between 1 MB and 10 MB are rounded up to 20 MB to help reduce fragmentation, which makes sense. For size >= 10 MB, however, we fall back to a different pattern (rounding up to a multiple of 2 MB, as far as I can tell), so a request for 11 MB results in a smaller cudaMalloc allocation than a request for 9 MB (see plot below).
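To make the pattern concrete, here is a minimal Python sketch of the tiers as I understand them. The constant values (1 MiB, 2 MiB, 10 MiB, 20 MiB) and the exact boundary comparisons are my reading of CUDACachingAllocator.cpp and should be treated as assumptions, since they may differ between PyTorch versions. It is only an illustration of the rounding, not the allocator's actual code.

```python
# Sketch of the three size tiers. All constants below are assumptions
# based on my reading of CUDACachingAllocator.cpp, not authoritative values.
MiB = 1024 * 1024
K_SMALL_SIZE = 1 * MiB        # requests up to this size use the small buffer
K_SMALL_BUFFER = 2 * MiB      # segment size for small requests
K_MIN_LARGE_ALLOC = 10 * MiB  # requests below this size use the large buffer
K_LARGE_BUFFER = 20 * MiB     # segment size for mid-sized requests
K_ROUND_LARGE = 2 * MiB       # rounding granularity for everything else


def get_allocation_size(size: int) -> int:
    """Return the number of bytes that would be requested from cudaMalloc."""
    if size <= K_SMALL_SIZE:
        return K_SMALL_BUFFER
    if size < K_MIN_LARGE_ALLOC:
        return K_LARGE_BUFFER
    # Round up to the next multiple of K_ROUND_LARGE.
    return K_ROUND_LARGE * ((size + K_ROUND_LARGE - 1) // K_ROUND_LARGE)


if __name__ == "__main__":
    for mb in (1, 9, 10, 11):
        allocated = get_allocation_size(mb * MiB) // MiB
        print(f"{mb:>2} MiB request -> {allocated} MiB from cudaMalloc")
```

Under these assumptions a 9 MiB request maps to a 20 MiB allocation, while an 11 MiB request maps to only 12 MiB, which is the discontinuity I am asking about.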

I was just curious what the reason for this is.