I’m working on an OpenCL backend for PyTorch. I currently validate the standard torchvision models in forward and backpropagation. All the nets I tested work except efficientnet_bX, which gives wrong results in the backward computations.

When I had similar issues with forward propagation, I copied the net and saved “checkpoints” in different places (after suspicious operations) until I found the one that produced results that don’t match the CPU, roughly as in the sketch below.
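A minimal sketch of that forward comparison (the `"ocl:0"` device string and the 1e-3 tolerance are only placeholders for my actual backend name and thresholds):

```python
import torch
import torchvision

# Sketch of the forward "checkpoint" comparison: record the output of every
# sub-module with forward hooks on a CPU run and on a device run, then report
# the first layer whose result diverges. "ocl:0" and the tolerance are
# placeholders, not the real values I use.

def capture_forward(model, x, store):
    handles = []
    def make_hook(name):
        def hook(mod, inp, out):
            if torch.is_tensor(out):
                store[name] = out.detach().cpu()
        return hook
    for name, mod in model.named_modules():
        handles.append(mod.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        model(x)
    for h in handles:
        h.remove()

model = torchvision.models.efficientnet_b0().eval()
x = torch.randn(1, 3, 224, 224)

cpu_acts, dev_acts = {}, {}
capture_forward(model, x, cpu_acts)
capture_forward(model.to("ocl:0"), x.to("ocl:0"), dev_acts)

for name, ref in cpu_acts.items():   # hooks fire in execution order
    diff = (ref - dev_acts[name]).abs().max().item()
    if diff > 1e-3:
        print("first divergence at", name, "max diff =", diff)
        break
```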
I can look at the gradients generated in the parameters of layers like conv or linear, but I can’t do that for layers that don’t have parameters.

How can I do this for backpropagation, since it is all automatic? In Caffe, for example, with its static graph I could run net.backward(from, to) and look into the intermediate layers.

Is there anything that can help me extract each and every value on the CPU and on my device?
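For reference, this is the kind of per-layer gradient dump I have in mind. A full backward hook fires for parameter-less leaf modules (activations, pooling, etc.) as well, so maybe something along these lines is possible (sketch only; `"ocl:0"` again stands in for my device):

```python
import torch
import torchvision

# Capture grad_output of every leaf module with full backward hooks on a CPU
# run and on a device run, then compare them. "ocl:0" and the 1e-3 tolerance
# are placeholders.

def capture_backward(model, x, store):
    handles = []
    def make_hook(name):
        def hook(mod, grad_input, grad_output):
            store[name] = [g.detach().cpu() for g in grad_output if g is not None]
        return hook
    for name, mod in model.named_modules():
        if len(list(mod.children())) == 0:  # leaf modules only
            handles.append(mod.register_full_backward_hook(make_hook(name)))
    model(x).sum().backward()
    for h in handles:
        h.remove()

model = torchvision.models.efficientnet_b0().eval()
x = torch.randn(1, 3, 224, 224)

cpu_grads, dev_grads = {}, {}
capture_backward(model, x, cpu_grads)
model.zero_grad()
capture_backward(model.to("ocl:0"), x.to("ocl:0"), dev_grads)

for name, grads in cpu_grads.items():
    for g_cpu, g_dev in zip(grads, dev_grads.get(name, [])):
        d = (g_cpu - g_dev).abs().max().item()
        if d > 1e-3:
            print("gradient mismatch in", name, "max diff =", d)
```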
Below is an example of the runtest output, where the maximal difference (md) and the maximal output difference (od) are computed; a sketch of what I mean by these two numbers follows the log.
Testing mnist_mlp
Accessing device #1:GeForce RTX 2060 SUPER on NVIDIA CUDA
Ok od=0.00000 md=0.00000
Testing mnist_cnn
Ok od=0.00000 md=0.00000
Testing mnist_bn
Ok od=0.00000 md=0.00000
Testing alexnet
Ok od=0.00002 md=0.00002
Testing resnet18
Ok od=0.00001 md=0.00001
Testing vgg16
Ok od=0.00002 md=0.00022
squeezenet1_0 is blacklisted
Testing densenet161
Ok od=0.00001 md=0.00274
Testing inception_v3
Ok od=0.00002 md=0.00002
googlenet is blacklisted
Testing shufflenet_v2_x1_0
Ok od=0.00002 md=0.00002
Testing mobilenet_v2
Ok od=0.00002 md=0.00051
Testing mobilenet_v3_large
Ok od=0.00003 md=0.00003
Testing mobilenet_v3_small
Ok od=0.00003 md=0.00003
Testing resnext50_32x4d
Ok od=0.00002 md=0.00711
Testing wide_resnet50_2
Ok od=0.00001 md=0.00130
Testing mnasnet1_0
Ok od=0.00004 md=0.00031
Testing efficientnet_b0
FAIL od=0.00001 md=1.65422
Testing efficientnet_b4
FAIL od=0.00001 md=0.27367
Testing regnet_y_400mf
Ok od=0.00001 md=0.00259
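For clarity, od is the maximal output difference of the forward pass and md the maximal difference overall, the parameter gradients included. A minimal sketch of the idea (the `"ocl:0"` device string and the 0.05 threshold are just placeholders):

```python
import torch
import torchvision

# od: max absolute difference of the forward outputs between CPU and device.
# md: max difference over everything compared, here the output plus all
# parameter gradients after backward. Device name and threshold are placeholders.

def run(model, x, device):
    model = model.to(device)
    y = model(x.to(device))
    y.sum().backward()
    grads = [p.grad.detach().cpu() for p in model.parameters() if p.grad is not None]
    return y.detach().cpu(), grads

def compare(model, x):
    y_cpu, g_cpu = run(model, x, "cpu")
    model.zero_grad()
    y_dev, g_dev = run(model, x, "ocl:0")
    od = (y_cpu - y_dev).abs().max().item()
    md = max([od] + [(a - b).abs().max().item() for a, b in zip(g_cpu, g_dev)])
    print(("Ok" if md < 0.05 else "FAIL") + f" od={od:.5f} md={md:.5f}")

compare(torchvision.models.efficientnet_b0().eval(), torch.randn(1, 3, 224, 224))
```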