I recently implemented a basic set of deep learning operations and an initial training/inference library.
Although it is fairly new, it already outperforms PlaidML and Caffe/OpenCL by 150-200% on the networks I tested (AlexNet, ResNet, VGG, MobileNet), in both training and inference, on both AMD and NVIDIA GPUs. It also reaches roughly 50-70% of the performance of the native stacks (CUDA+cuDNN on NVIDIA, HIP+MIOpen on AMD).
I want to start working on an out-of-tree OpenCL backend for PyTorch.
I implemented both GEMM-based and Winograd-based convolutions, plus several other DL operators (though this is only the beginning).
Unlike ROCm, it also runs on Windows, and it should support RDNA and APUs (though I haven't tested those yet).
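For readers unfamiliar with the GEMM approach: the convolution is lowered to a single matrix multiply by first unrolling input patches into columns (im2col). A minimal NumPy sketch of the idea (my own illustrative helper names, stride 1, no padding; a real kernel would of course fuse and tile this on the GPU):

```python
import numpy as np

def im2col(x, kh, kw):
    # x: (C, H, W) single image; gather every kh x kw patch into one column
    C, H, W = x.shape
    oh, ow = H - kh + 1, W - kw + 1
    cols = np.empty((C * kh * kw, oh * ow))
    row = 0
    for c in range(C):
        for i in range(kh):
            for j in range(kw):
                # shifted view covering this tap's contribution to every output pixel
                cols[row] = x[c, i:i + oh, j:j + ow].reshape(-1)
                row += 1
    return cols

def conv2d_gemm(x, w):
    # w: (K, C, kh, kw); the whole convolution becomes one GEMM:
    # (K, C*kh*kw) @ (C*kh*kw, oh*ow) -> (K, oh*ow)
    K, C, kh, kw = w.shape
    oh, ow = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    return (w.reshape(K, -1) @ im2col(x, kh, kw)).reshape(K, oh, ow)
```

The appeal is that all the heavy lifting lands in one GEMM call, which is the operation GPU BLAS-style kernels are already tuned for; Winograd trades some of those multiplies for additions via a transform, which pays off for small filters like 3x3.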
Is there any initial implementation of an OpenCL backend, or some kind of template backend, that would make it easier to get started on this task?
I really think having OpenCL support is valuable, especially since the only projects that supported it are gone: Caffe is dead, and PlaidML was killed along with multi-backend Keras.