A pure Python implementation of roi_align that looks just like its CUDA kernel

The final, optimized kernel can be found in pytorch/vision Pull Request #7587, "Add deterministic, pure-Python roi_align implementation" by ezyang.
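To show what "looks just like its CUDA kernel" can mean, here is a minimal sketch. It is not the code from the PR, which works on torch tensors. This version uses plain nested Python lists so it runs with no dependencies. A single loop walks a flat output index the same way the CUDA kernel gives each thread one output element, and it decomposes that index into `(n, c, ph, pw)`. The sampling follows torchvision's `roi_align` semantics: ROIs are `[batch_index, x1, y1, x2, y2]`, with `spatial_scale`, `sampling_ratio`, and `aligned` parameters. The function names and the nested-list layout are illustrative assumptions.

```python
import math
from typing import List, Sequence

# Illustrative nested-list layout (assumption: the real PR uses torch tensors).
Image = List[List[float]]        # [H][W]
Batch = List[List[Image]]        # [N][C][H][W]


def bilinear_interpolate(img: Image, height: int, width: int, y: float, x: float) -> float:
    # Samples more than one pixel outside the feature map contribute 0.
    if y < -1.0 or y > height or x < -1.0 or x > width:
        return 0.0
    y, x = max(y, 0.0), max(x, 0.0)
    y_low, x_low = int(y), int(x)
    # Clamp to the last row/column, mirroring the CUDA kernel.
    if y_low >= height - 1:
        y_high = y_low = height - 1
        y = float(y_low)
    else:
        y_high = y_low + 1
    if x_low >= width - 1:
        x_high = x_low = width - 1
        x = float(x_low)
    else:
        x_high = x_low + 1
    ly, lx = y - y_low, x - x_low
    hy, hx = 1.0 - ly, 1.0 - lx
    return (hy * hx * img[y_low][x_low] + hy * lx * img[y_low][x_high]
            + ly * hx * img[y_high][x_low] + ly * lx * img[y_high][x_high])


def roi_align(inp: Batch, rois: Sequence[Sequence[float]], spatial_scale: float,
              pooled_height: int, pooled_width: int, sampling_ratio: int,
              aligned: bool) -> List[List[List[List[float]]]]:
    num_rois, channels = len(rois), len(inp[0])
    height, width = len(inp[0][0]), len(inp[0][0][0])
    out = [[[[0.0] * pooled_width for _ in range(pooled_height)]
            for _ in range(channels)] for _ in range(num_rois)]
    # One iteration per CUDA thread: decompose the flat index into (n, c, ph, pw).
    for index in range(num_rois * channels * pooled_height * pooled_width):
        pw = index % pooled_width
        ph = (index // pooled_width) % pooled_height
        c = (index // pooled_width // pooled_height) % channels
        n = index // pooled_width // pooled_height // channels

        batch_idx, x1, y1, x2, y2 = rois[n]
        offset = 0.5 if aligned else 0.0
        roi_start_w = x1 * spatial_scale - offset
        roi_start_h = y1 * spatial_scale - offset
        roi_width = x2 * spatial_scale - offset - roi_start_w
        roi_height = y2 * spatial_scale - offset - roi_start_h
        if not aligned:
            # Legacy behavior: force malformed ROIs to be at least 1x1.
            roi_width, roi_height = max(roi_width, 1.0), max(roi_height, 1.0)
        bin_size_h = roi_height / pooled_height
        bin_size_w = roi_width / pooled_width

        # Number of sample points per bin (adaptive when sampling_ratio <= 0).
        grid_h = sampling_ratio if sampling_ratio > 0 else math.ceil(roi_height / pooled_height)
        grid_w = sampling_ratio if sampling_ratio > 0 else math.ceil(roi_width / pooled_width)
        count = max(grid_h * grid_w, 1)

        img = inp[int(batch_idx)][c]
        total = 0.0
        for iy in range(grid_h):
            y = roi_start_h + ph * bin_size_h + (iy + 0.5) * bin_size_h / grid_h
            for ix in range(grid_w):
                x = roi_start_w + pw * bin_size_w + (ix + 0.5) * bin_size_w / grid_w
                total += bilinear_interpolate(img, height, width, y, x)
        out[n][c][ph][pw] = total / count
    return out
```

Writing the loop over a flat index, rather than as nested loops over `n`, `c`, `ph`, `pw`, keeps the correspondence with the CUDA kernel one-to-one. Each iteration reads only its own ROI and writes exactly one output element, just as a single thread would.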