CUDA loops case study: code generation vs templates

It’s a great idea, and we should do it!