Implementing OpenCL backend for pytorch

artyom-beilis · July 27, 2021, 11:49am

Thanks. I’ll read the blog posts. At first glance they look very interesting (and complicated).

My first DL framework was Caffe, which I still like a lot for its highly readable, easy-to-modify C++ code (and, of course, its OpenCL support).

In any case, the dispatcher and the other technical parts are complicated as a system, but actually simple in comparison to optimized DL kernels.

For example, I hadn’t found a single open-source, general-purpose implementation of the Winograd algorithm in either CUDA or OpenCL (ROCm’s are actually binary blobs), and Intel’s are tightly coupled to the Intel architecture. Finally I found a paper in 2020 that described how a GPU implementation of Winograd should look.
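To make the Winograd idea concrete, here is a minimal sketch of the smallest 1D case, F(2,3), in plain Python: two outputs of a 3-tap filter computed with 4 multiplications instead of 6. This is only the textbook arithmetic, not the GPU implementation described in the paper, and the function names are hypothetical.

```python
def direct_f23(d, g):
    """Reference: two outputs of a 3-tap filter over 4 inputs (6 multiplies)."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return (y0, y1)


def winograd_f23(d, g):
    """Winograd F(2,3): same two outputs using 4 element-wise multiplies.

    The filter-side terms (g0 + g1 + g2) / 2 and (g0 - g1 + g2) / 2 depend
    only on the filter, so a real kernel would precompute them once.
    """
    # Filter transform (can be done once per filter).
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform.
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]

    # The 4 multiplications.
    m1, m2, m3, m4 = v0 * u0, v1 * u1, v2 * u2, v3 * u3

    # Output transform.
    return (m1 + m2 + m3, m2 - m3 - m4)
```

The hard part on a GPU is not this arithmetic but tiling the 2D version so that the transforms and the batched multiplications stay in fast memory, which is where the optimized kernels get complicated.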

Even GEMM-based convolutions aren’t very good: CLBlast implements one, but its performance is very poor (and it implements only forward propagation).
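For readers who haven’t seen it, a GEMM-based convolution roughly works like this: unfold the input patches into a matrix (im2col) and multiply by the flattened filters. Below is a minimal pure-Python sketch of the forward pass, assuming stride 1, no padding, and nested lists as tensors. It is not CLBlast’s code, and all function names are hypothetical.

```python
def im2col(x, kh, kw):
    """Unfold x[C][H][W] into a matrix with C*kh*kw rows and OH*OW columns."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    oh, ow = h - kh + 1, w - kw + 1
    cols = []
    for ch in range(c):
        for i in range(kh):
            for j in range(kw):
                # One row per (channel, kernel row, kernel col) offset.
                cols.append([x[ch][r + i][q + j]
                             for r in range(oh) for q in range(ow)])
    return cols, oh, ow


def matmul(a, b):
    """Naive GEMM: a is M x K, b is K x N, result is M x N."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def conv2d_gemm(x, weights):
    """Forward convolution (cross-correlation, as DL frameworks define it).

    x: [C][H][W], weights: [K][C][kh][kw] -> output: [K][OH][OW].
    """
    kh, kw = len(weights[0][0]), len(weights[0][0][0])
    cols, oh, ow = im2col(x, kh, kw)
    # Flatten each filter in the same (channel, row, col) order as im2col.
    wmat = [[v for ch in f for row in ch for v in row] for f in weights]
    out = matmul(wmat, cols)
    # Reshape each output row of length OH*OW back to OH x OW.
    return [[row[r * ow:(r + 1) * ow] for r in range(oh)] for row in out]
```

The unfolded matrix is kh*kw times larger than the input, so a naive version spends a lot of memory bandwidth just building it; that is one plausible reason such implementations end up slow.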

So complexity is a relative thing.
