Implementing OpenCL backend for pytorch

They both want to create vendor lock-in on their GPUs.

Yes and no. I assume both Intel and AMD are trying to get the most valuable solution for minimal money.

AMD went down the road of implementing HIP as a CUDA replacement (huge mistake IMHO), but at least they released an OpenCL version of MIOpen. Unfortunately it isn't really open-source, since some kernels ship only in binary format. Additionally, it is limited to Linux/ROCm only.

Intel has their oneDNN, and at some point I wanted to integrate/use it in my backend, but I discovered that the kernels I had written myself actually run faster than theirs.

Why? Because they decided to optimize their code for channels-last only, while PyTorch, ONNX and many other frameworks actually default to channels-first: Channel First Convolution Performance on OpenCL driver · Issue #1194 · oneapi-src/oneDNN · GitHub
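For context, here is a minimal sketch of the layout difference using plain PyTorch API (nothing backend-specific): tensors are channels-first (NCHW) contiguous by default, and channels-last (NHWC) is an explicit opt-in via memory_format that only changes the strides, not the shape.

```python
import torch

# PyTorch defaults to channels-first (NCHW): a batch of 8 RGB 224x224 images.
x = torch.randn(8, 3, 224, 224)
print(x.is_contiguous())                                   # True  -> NCHW layout
print(x.is_contiguous(memory_format=torch.channels_last))  # False

# Channels-last (NHWC) is opt-in; the logical shape stays NCHW, only strides change.
y = x.to(memory_format=torch.channels_last)
print(y.shape)    # torch.Size([8, 3, 224, 224])
print(y.stride()) # (150528, 1, 672, 3) -- channel stride is 1, i.e. NHWC in memory
```

So convolution kernels tuned only for the NHWC stride pattern end up on a slow path for the NCHW tensors that PyTorch produces by default.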

I don't know how much of it is ignorance and how much is trying to do what's possible with very limited resources (and yes, neither AMD nor Intel has a team or investment even comparable to what nVidia does).

And most of the industry just does not care and lives with what nVidia has to offer, because the alternatives are far from competitive.

I hope the community can join the effort, because building it all is really tough: too many cases, kernels and other things.