ROCm vs OpenCL/dlprimitives

I finally managed to upgrade my PC, now running Ubuntu 24.04, so I could properly install ROCm 6.1 and test the out-of-the-box PyTorch 2.4 ROCm build. It was (almost) straightforward.*

GPU: AMD RX 6600 XT 8GB. I still compared against PyTorch 1.13 for OpenCL, since I hadn't yet completed 2.4 support in the pytorch/opencl backend. For ROCm I used the official 2.4 build.

Training

Time in ms per batch.

| Training | batch size | rocm/hip | opencl | Ratio % |
|---|---|---|---|---|
| alexnet | 64 | 57.848 | 74.965 | 77.2 |
| resnet18 | 64 | 146.917 | 238.581 | 61.6 |
| resnet50 | 32 | 266.441 | 358.45 | 74.3 |
| vgg16 | 16 | 206.312 | 342.292 | 60.3 |
| densenet161 | 16 | 296.807 | 490.319 | 60.5 |
| mobilenet_v2 | 32 | 157.476 | 198.891 | 79.2 |
| mobilenet_v3_small | 64 | 92.506 | 123.889 | 74.7 |
| mobilenet_v3_large | 64 | 286.795 | 325.736 | 88.0 |
| resnext50_32x4d | 32 | 336.464 | 491.016 | 68.5 |
| wide_resnet50_2 | 32 | 466.841 | 644.114 | 72.5 |
| mnasnet1_0 | 32 | 159.97 | 167.829 | 95.3 |
| efficientnet_b0 | 32 | 205.69 | 306.328 | 67.1 |
| regnet_y_400mf | 64 | 171.691 | 245.65 | 69.9 |
| convnext_small | 16 | 337.252 | 591.211 | 57.0 |
| Average | | | | 71.9 |
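The per-batch timings above come from averaging over many iterations; a minimal sketch of such a measurement loop is below. This is not the exact benchmark harness used here — the `step` and `sync` callables are illustrative placeholders (e.g. a forward/backward pass and `torch.cuda.synchronize`), needed because GPU work is asynchronous and the clock must only be read after the queue is flushed.

```python
import time
import statistics

def time_per_batch_ms(step, batches, warmup=3, sync=None):
    """Run `step` on each batch and return the median time in ms.

    `step` is a callable taking one batch; `sync` is an optional
    callable (e.g. torch.cuda.synchronize) that flushes pending
    asynchronous GPU work before the clock is read.
    """
    # Warm-up iterations are discarded: they absorb kernel
    # compilation, memory-pool growth, and cache effects.
    for b in batches[:warmup]:
        step(b)
    if sync:
        sync()
    times = []
    for b in batches[warmup:]:
        t0 = time.perf_counter()
        step(b)
        if sync:
            sync()  # make sure the batch actually finished
        times.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(times)
```

The median is used rather than the mean so a single slow outlier batch does not skew the result.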

Inference

For inference the batch size is always 64; time in ms per batch.

| Inference, batch=64 | rocm/hip | opencl | Ratio % |
|---|---|---|---|
| convnext_small | 476.371 | 602.858 | 79.0 |
| alexnet | 24.564 | 25.866 | 95.0 |
| resnet18 | 41.478 | 59.095 | 70.2 |
| resnet50 | 165.507 | 196.455 | 84.2 |
| vgg16 | 205.215 | 309.509 | 66.3 |
| densenet161 | 409.825 | 414.051 | 99.0 |
| inception_v3 | 90.632 | 131.78 | 68.8 |
| mobilenet_v2 | 77.652 | 93.449 | 83.1 |
| mobilenet_v3_small | 22.17 | 25.647 | 86.4 |
| mobilenet_v3_large | 63.12 | 70.016 | 90.2 |
| resnext50_32x4d | 245.001 | 274.578 | 89.2 |
| wide_resnet50_2 | 319.019 | 400.626 | 79.6 |
| mnasnet1_0 | 74.205 | 74.835 | 99.2 |
| efficientnet_b0 | 104.285 | 114.732 | 90.9 |
| efficientnet_b4 | 302.771 | 276.257 | 109.6 |
| regnet_y_400mf | 43.253 | 56.814 | 76.1 |
| Average | | | 85.4 |

Summary

Basically, OpenCL performance with my dlprimitives backend is lower, but it still delivers very good performance — especially considering that it does not require the ROCm infrastructure and thus isn't limited to Linux or to a small set of officially supported devices.

*) I needed to set the environment variable `export HSA_OVERRIDE_GFX_VERSION=10.3.0`, since my RX 6600 XT (gfx1032) is not officially supported, so I had to override it with the gfx1030 architecture.
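For reference, the override plus a quick sanity check might look like this (the Python one-liner is just an illustration and assumes a ROCm build of PyTorch is installed; ROCm exposes the GPU through the CUDA API surface):

```shell
# Tell the ROCm runtime to treat the gfx1032 GPU as gfx1030
# (same RDNA2 ISA family, so the gfx1030 kernels run fine).
export HSA_OVERRIDE_GFX_VERSION=10.3.0

# Check that PyTorch now sees the device.
python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
```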
