OpenCL backend dev - questions/support

I’ll try to see if the CPU version works. It would be an excellent time saver.

First of all: it worked. Instead of implementing the full multi_head_attention I only had to implement the much simpler transform_bias_rescale_qkv, and now all the vit_x_NN networks work properly.
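For anyone following along, here is a rough NumPy sketch of what I understand transform_bias_rescale_qkv to compute (add the fused QKV bias, split into per-head q/k/v, and rescale q by 1/sqrt(head_dim)) — this is my reading of the op's semantics, not the actual kernel:

```python
import numpy as np

def transform_bias_rescale_qkv(qkv, bias, num_heads):
    # qkv: (B, T, 3*E) fused projection output, bias: (3*E,)
    B, T, three_e = qkv.shape
    E = three_e // 3
    head_dim = E // num_heads
    x = qkv + bias                       # broadcast bias add
    q, k, v = np.split(x, 3, axis=-1)    # each (B, T, E)

    def to_heads(t):
        # (B, T, E) -> (B, num_heads, T, head_dim)
        return t.reshape(B, T, num_heads, head_dim).transpose(0, 2, 1, 3)

    # only q is rescaled, so the later q @ k^T matmul needs no extra scaling
    return to_heads(q) / np.sqrt(head_dim), to_heads(k), to_heads(v)
```

The rest of attention (softmax, matmuls) can then be built from ops the backend already has, which is why this one helper was enough.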

It is still not perfect performance-wise, since I do one extra copy, but once I implement a generic dlprim::core::pointwise_operation_broadcast variant with stride support, it should also fix some other performance problems I have with non-contiguous tensors (e.g. in the ConvNeXt nets).
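To illustrate the idea (this is just a toy Python model, not the dlprimitives API): a stride-aware pointwise kernel indexes each input through its own strides, with stride 0 on broadcast dimensions, so neither non-contiguous nor broadcast inputs need a preliminary copy:

```python
import itertools
import numpy as np

def pointwise_broadcast(out_shape, inputs, op):
    """Apply op elementwise over out_shape.

    Each input is a (flat_buffer, strides) pair; a broadcast
    dimension simply gets stride 0, so the same buffer element is
    reused instead of materializing an expanded copy.
    """
    out = np.empty(out_shape)
    for idx in itertools.product(*(range(d) for d in out_shape)):
        vals = [buf[sum(i * s for i, s in zip(idx, strides))]
                for buf, strides in inputs]
        out[idx] = op(*vals)
    return out
```

A GPU kernel would do the same offset arithmetic per work-item instead of this Python loop.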

Most of the dlprimitives code assumes contiguous, channels-first tensors, and in some cases this adds an extra copy - but that is fixable, at least for the basic pointwise broadcast/reduce operations.
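The contiguity assumption itself is easy to state in code; a minimal check (row-major / channels-first convention, mirroring what PyTorch's is_contiguous does) looks like:

```python
def is_contiguous(shape, strides):
    # Contiguous (row-major) means each stride equals the product of
    # all faster-varying dimensions; size-1 dims can have any stride.
    expected = 1
    for dim, stride in zip(reversed(shape), reversed(strides)):
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True
```

A transposed or sliced view fails this check, which is exactly when the current code falls back to the extra copy.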

My WIP example includes an example of re-using the CPU implementation for metadata-only updates …
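By "metadata-only" I mean ops that never touch the data buffer, only the sizes/strides, which is why the CPU implementation can be reused as-is. A toy sketch of transpose as a pure metadata update (illustrative, not the actual PyTorch/dlprimitives code):

```python
def transpose_metadata(shape, strides, d0, d1):
    # Transpose only swaps the size and stride of two dimensions;
    # the underlying storage is untouched, so no kernel is needed.
    shape, strides = list(shape), list(strides)
    shape[d0], shape[d1] = shape[d1], shape[d0]
    strides[d0], strides[d1] = strides[d1], strides[d0]
    return tuple(shape), tuple(strides)
```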

Ohhh, I wish I had noticed it some time ago. It would have been a major time saver, because I spent a lot of head-scratching on some of these operators and couldn’t understand why they needed special ops.

Thank you very much for the support!