The "Ideal" PyTorch FLOP Counter (with __torch_dispatch__)

Interesting. Do you measure memory bandwidth as well?

Some operations like convolution or GEMM are mostly FLOP-bound, but many operations are actually memory-bandwidth-bound, for example batch normalization, activations, etc. I noticed that sometimes a GPU like the 2060S, which has fewer FLOPS than a 1080, can run ResNet faster due to the large difference in memory speed.
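
To make the bandwidth point concrete, here is a rough back-of-the-envelope sketch of arithmetic intensity (FLOPs per byte of memory traffic); the helper names and the simplified FLOP/byte counts are illustrative assumptions, not anything measured in the post:

```python
# Rough arithmetic-intensity sketch: FLOPs per byte of DRAM traffic.
# Counts are simplified assumptions (fp32, ideal reuse), for illustration only.

def gemm_intensity(m, n, k, bytes_per_elem=4):
    flops = 2 * m * n * k                                   # one multiply-add per inner-loop step
    bytes_moved = (m * k + k * n + m * n) * bytes_per_elem  # read A and B, write C (ideal case)
    return flops / bytes_moved

def batchnorm_intensity(numel, bytes_per_elem=4):
    flops = 2 * numel                                       # roughly a scale and a shift per element
    bytes_moved = 2 * numel * bytes_per_elem                # read input, write output
    return flops / bytes_moved

print(gemm_intensity(4096, 4096, 4096))   # ~680 FLOPs/byte -> compute-bound
print(batchnorm_intensity(4096 * 4096))   # 0.25 FLOPs/byte -> bandwidth-bound
```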

Regarding convolution, do you take into account that running, for example, a Winograd or FFT convolution can actually compute the result in fewer FLOPs than a "direct"/GEMM-based convolution?
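
For context on the Winograd point: the textbook F(2,3) algorithm computes two outputs of a 3-tap filter with 4 multiplications instead of 6 (and F(2x2, 3x3) uses 16 instead of 36). A small NumPy sketch using the standard transform matrices, just to show the classic algorithm; this is not part of the post's FLOP counter:

```python
import numpy as np

# F(2,3) 1-D Winograd: 2 outputs of a 3-tap correlation with 4 multiplies instead of 6.
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)   # input transform
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])                 # filter transform
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)    # output transform

d = np.array([1.0, 2.0, 3.0, 4.0])   # 4 input samples
g = np.array([0.5, 1.0, -1.0])       # 3-tap filter

winograd = A_T @ ((G @ g) * (B_T @ d))          # 4 elementwise multiplies
direct = np.array([d[0:3] @ g, d[1:4] @ g])     # 6 multiplies
print(np.allclose(winograd, direct))            # True
```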