And now some more progress:
And performance:
## Benchmarks
All benchmarks were run on a GTX 960 (4 GB) in order to compare against native CUDA speed; lower numbers are better.
### Test

The test includes copying the data to/from the device and the forward calculations; a minimal timing sketch follows the table.
| Framework      | alexnet | resnet18 | resnet50 | vgg16   | mobilenet |
|----------------|---------|----------|----------|---------|-----------|
| pytorch/cuda   | 15.253  | 38.745   | 114.348  | 169.038 | 46.110    |
| pytorch/opencl | 22.989  | 50.272   | 167.050  | 258.751 | 82.044    |
| dlprimitives   | 22.688  | 49.193   | 158.789  | 238.802 | 82.080    |
| keras/tf2-cuda | 29.104  | 74.215   | 161.704  | 158.084 | 88.851    |
| keras/plaidml  | 43.004  | 91.533   | -        | -       | 45.693    |
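For reference, here is a minimal sketch, in PyTorch, of how this kind of forward test can be timed. The model (resnet18), batch size (16), iteration count, and the CUDA device are illustrative assumptions, not the exact configuration behind the numbers above; the OpenCL backend would use its own device type and synchronization call.

```python
import time

import torch
import torchvision.models as models

# Minimal sketch of the forward test: host->device copy, forward pass,
# device->host copy of the result. Model, batch size, and iteration
# count are illustrative assumptions, not the exact benchmark setup.
device = torch.device("cuda")
model = models.resnet18().to(device).eval()
batch = torch.randn(16, 3, 224, 224)    # input kept on the host

with torch.no_grad():
    model(batch.to(device)).cpu()       # warm-up so one-time init is not timed
    torch.cuda.synchronize()

    iters = 20
    start = time.time()
    for _ in range(iters):
        x = batch.to(device)            # copy of data to device
        y = model(x).cpu()              # forward calculation + copy back
    torch.cuda.synchronize()
    print("ms per batch:", (time.time() - start) / iters * 1000)
```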
### Full Train
Training includes I/O to/from the device, zeroing the gradients, the forward and backward passes, and the optimizer update step. Adam is used as the optimizer. A minimal sketch of the measured step follows the notes below.
| Framework      | alexnet | resnet18 | resnet50 | vgg16    | mobilenet |
|----------------|---------|----------|----------|----------|-----------|
| pytorch/cuda   | 107.108 | 129.456  | 388.951  | N/A      | 177.434   |
| pytorch/opencl | 147.814 | 213.319  | 651.216  | N/A      | 382.590   |
| dlprimitives   | 106.033 | 198.092  | 605.541  | 1107.756 | 344.599   |
| keras/tf2-cuda | 90.005  | 183.447  | 501.362  | 550.063  | 322.416   |
| keras/plaidml  | 222.166 | 507.116  | -        | -        | 571.438   |
- vgg16 with batch size 16 failed to run due to lack of memory on pytorch.
- Some plaidml setups were not tested due to lack of performance/memory.
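And a minimal sketch of the full training step being measured, again in PyTorch. The model, batch size, learning rate, and iteration count are illustrative assumptions; only the list of measured operations (device I/O, zeroing gradients, forward, backward, Adam update) comes from the description above.

```python
import time

import torch
import torch.nn.functional as F
import torchvision.models as models

# Minimal sketch of the full-train measurement. Model, batch size,
# learning rate, and iteration count are illustrative assumptions.
device = torch.device("cuda")
model = models.resnet18().to(device).train()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

batch = torch.randn(16, 3, 224, 224)        # host-side inputs
labels = torch.randint(0, 1000, (16,))      # host-side targets

def step():
    x, t = batch.to(device), labels.to(device)  # I/O to device
    opt.zero_grad()                             # zero gradients
    loss = F.cross_entropy(model(x), t)         # forward
    loss.backward()                             # backward
    opt.step()                                  # optimizer update (Adam)

step()                                          # warm-up, not timed
torch.cuda.synchronize()

iters = 20
start = time.time()
for _ in range(iters):
    step()
torch.cuda.synchronize()
print("ms per iteration:", (time.time() - start) / iters * 1000)
```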
Looks very nice!