How to properly apply scaling to an NHWC tensor at the kernel level?


I am a new contributor to PyTorch. I recently pushed the pull request "Add NHWC support for group normalization" (by ZelboK, Pull Request #126635, pytorch/pytorch on GitHub), and I am not sure how I would implement this with tensor iterators and GPU kernels rather than writing a custom kernel.

(Hopefully I got the implementation mostly right?) I'm fairly sure the mean, variance, and fusion are correct, but I'm not sure about the actual normalization step.
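
For concreteness, here is a minimal reference sketch of the normalization step in question, written in plain Python rather than CUDA so the index arithmetic is easy to follow. It is not the code from the PR. The function name `group_norm_nhwc`, its parameters, and the assumption that "fusion" means folding mean, reciprocal standard deviation, `gamma`, and `beta` into a per-(n, c) scale `a` and shift `b` are all illustrative assumptions.

```python
import math

def group_norm_nhwc(x, N, H, W, C, G, gamma, beta, eps=1e-5):
    """Reference group norm over a flat NHWC buffer (hypothetical helper).

    x is a flat list of length N*H*W*C with the channel index varying fastest.
    """
    D = C // G   # channels per group (assumes C is divisible by G)
    HW = H * W

    # Pass 1: mean and reciprocal std for every (n, g) pair.
    mean = [0.0] * (N * G)
    rstd = [0.0] * (N * G)
    for n in range(N):
        for g in range(G):
            s = 0.0
            ss = 0.0
            for hw in range(HW):
                # In NHWC, a group's channels are contiguous within one pixel.
                base = (n * HW + hw) * C + g * D
                for d in range(D):
                    v = x[base + d]
                    s += v
                    ss += v * v
            count = HW * D
            m = s / count
            var = max(ss / count - m * m, 0.0)  # biased variance
            mean[n * G + g] = m
            rstd[n * G + g] = 1.0 / math.sqrt(var + eps)

    # Pass 2: fold everything into y = a * x + b, one (a, b) per (n, c).
    a = [0.0] * (N * C)
    b = [0.0] * (N * C)
    for n in range(N):
        for c in range(C):
            g = c // D
            a[n * C + c] = rstd[n * G + g] * gamma[c]
            b[n * C + c] = beta[c] - mean[n * G + g] * a[n * C + c]

    # Pass 3: elementwise scaling. For NHWC, c = i % C and n = i // (HW * C).
    y = [0.0] * len(x)
    for i in range(len(x)):
        c = i % C
        n = i // (HW * C)
        y[i] = a[n * C + c] * x[i] + b[n * C + c]
    return y
```

The third pass is the part the question is about: once `a` and `b` exist per (n, c), the normalization is a purely elementwise operation, and the only layout-specific detail is how `n` and `c` are recovered from the flat index.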