OpenCL backend dev - questions/support

I’ll try to see if the CPU version works. It would be an excellent time saver.

First of all: it worked. Instead of implementing the full multi_head_attention I only had to implement the much simpler transform_bias_rescale_qkv, and now all the vit_x_NN networks work properly.
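For anyone following along, here is a rough NumPy sketch of what I understand transform_bias_rescale_qkv to compute (add the fused QKV bias, split into per-head q/k/v, and rescale q by 1/sqrt(head_dim)) — this is my reading of the op's semantics, not the actual kernel:

```python
import numpy as np

def transform_bias_rescale_qkv(qkv, bias, num_heads):
    # qkv: (B, T, 3*E) fused projection output, bias: (3*E,)
    B, T, three_e = qkv.shape
    E = three_e // 3
    head_dim = E // num_heads
    x = qkv + bias                       # broadcast bias add
    q, k, v = np.split(x, 3, axis=-1)    # each (B, T, E)

    def to_heads(t):
        # (B, T, E) -> (B, num_heads, T, head_dim)
        return t.reshape(B, T, num_heads, head_dim).transpose(0, 2, 1, 3)

    # only q is rescaled, so the later q @ k^T matmul needs no extra scaling
    return to_heads(q) / np.sqrt(head_dim), to_heads(k), to_heads(v)
```

The rest of attention (softmax, matmuls) can then be built from ops the backend already has, which is why this one helper was enough.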

It is still not perfect performance-wise, since I do one extra copy, but once I implement a generic dlprim::core::pointwise_operation_broadcast variant with stride support, it should also fix some other performance problems I have with non-contiguous tensors (e.g. in the ConvNeXt nets).
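To illustrate the idea (this is just a toy Python model, not the dlprimitives API): a stride-aware pointwise kernel indexes each input through its own strides, with stride 0 on broadcast dimensions, so neither non-contiguous nor broadcast inputs need a preliminary copy:

```python
import itertools
import numpy as np

def pointwise_broadcast(out_shape, inputs, op):
    """Apply op elementwise over out_shape.

    Each input is a (flat_buffer, strides) pair; a broadcast
    dimension simply gets stride 0, so the same buffer element is
    reused instead of materializing an expanded copy.
    """
    out = np.empty(out_shape)
    for idx in itertools.product(*(range(d) for d in out_shape)):
        vals = [buf[sum(i * s for i, s in zip(idx, strides))]
                for buf, strides in inputs]
        out[idx] = op(*vals)
    return out
```

A GPU kernel would do the same offset arithmetic per work-item instead of this Python loop.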

Most of the dlprimitives code assumes contiguous, channels-first tensors, and in some cases this adds an extra copy - but that is fixable, at least for the basic pointwise broadcast/reduce operations.
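The contiguity assumption itself is easy to state in code; a minimal check (row-major / channels-first convention, mirroring what PyTorch's is_contiguous does) looks like:

```python
def is_contiguous(shape, strides):
    # Contiguous (row-major) means each stride equals the product of
    # all faster-varying dimensions; size-1 dims can have any stride.
    expected = 1
    for dim, stride in zip(reversed(shape), reversed(strides)):
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True
```

A transposed or sliced view fails this check, which is exactly when the current code falls back to the extra copy.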

My WIP example includes an example of re-using the CPU implementation for metadata-only updates …
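By "metadata-only" I mean ops that never touch the data buffer, only the sizes/strides, which is why the CPU implementation can be reused as-is. A toy sketch of transpose as a pure metadata update (illustrative, not the actual PyTorch/dlprimitives code):

```python
def transpose_metadata(shape, strides, d0, d1):
    # Transpose only swaps the size and stride of two dimensions;
    # the underlying storage is untouched, so no kernel is needed.
    shape, strides = list(shape), list(strides)
    shape[d0], shape[d1] = shape[d1], shape[d0]
    strides[d0], strides[d1] = strides[d1], strides[d0]
    return tuple(shape), tuple(strides)
```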

Ohhh, I wish I had noticed it some time ago. It would have been a major time saver, because I spent a lot of head-scratching on some of these operators and couldn’t understand why they needed special ops.

Thank you very much for the support!