Implementing OpenCL backend for pytorch

Small update: I implemented GPU memory caching and asynchronous execution, and performance is now virtually identical to my standalone static-graph dlprimitives execution.
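The caching idea can be sketched roughly like this (an illustrative sketch only, not dlprimitives' actual allocator; `raw_alloc`/`raw_free` stand in for whatever driver calls a backend uses, e.g. OpenCL buffer creation/release): instead of returning freed device buffers to the driver, which is slow and may synchronize, keep them in per-size free lists and reuse them on the next allocation of the same size.

```python
# Illustrative sketch -- not the actual dlprimitives implementation.
from collections import defaultdict

class CachingAllocator:
    def __init__(self, raw_alloc, raw_free):
        self._alloc = raw_alloc          # hypothetical driver alloc call
        self._free = raw_free            # hypothetical driver free call
        self._pool = defaultdict(list)   # size -> list of cached buffers

    def allocate(self, size):
        cached = self._pool[size]
        if cached:
            return cached.pop()          # reuse: no driver call at all
        return self._alloc(size)         # cold path: real allocation

    def release(self, buf, size):
        # Cache the buffer instead of freeing it back to the driver.
        self._pool[size].append(buf)
```

Since deep-learning workloads allocate the same tensor sizes every iteration, after warm-up nearly every allocation hits the cache.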

It now works efficiently on all the GPUs I tested: AMD RX 6600 XT, NVIDIA GTX 960, and Intel HD 530.
I also fixed the PyTorch benchmark, which had accidentally omitted the copy-to-GPU time; run times on the 960 are now ~15 ms with PyTorch CUDA/cuDNN and ~22 ms with dlprimitives.
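The benchmark fix comes down to where the timer starts and stops. A generic sketch of the corrected shape (the `copy_to_gpu`/`run_model`/`sync` callables are placeholders for whatever the framework provides, not actual benchmark code from this project): start the clock before the host-to-device copy, and synchronize before reading it, since GPU execution is asynchronous.

```python
import time

def benchmark(copy_to_gpu, run_model, sync, iters=100):
    # Drain any pending async work so we time only our own iterations.
    sync()
    start = time.perf_counter()
    for _ in range(iters):
        copy_to_gpu()   # must be inside the timed region
        run_model()     # enqueues work asynchronously
    sync()              # wait for the GPU before stopping the clock
    return (time.perf_counter() - start) / iters
```

Timing only `run_model()` without the copy (or without the final sync) is exactly the kind of error that made the original numbers look better than they were.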