How does torch.compile work with autograd?

I agree with you on the sigmoid point. A pure machine learning researcher might be excited to discover the gradient formula z * (1 - z), but may not realize that the real bottleneck is memory access :smile:
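For anyone skimming the thread, here is a small pure-Python sketch of the identity being discussed: with z = sigmoid(x), the derivative is z * (1 - z), so the backward pass only needs the saved output z. The helper names (`sigmoid_grad_from_output`, `numerical_grad`) are made up for illustration. This is not how PyTorch or torch.compile implements it, just a sanity check of the formula against a finite difference.

```python
import math

def sigmoid(x: float) -> float:
    # Forward pass: z = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad_from_output(z: float) -> float:
    # Backward formula written in terms of the forward output z only
    # (hypothetical helper name, for illustration).
    return z * (1.0 - z)

def numerical_grad(f, x: float, eps: float = 1e-6) -> float:
    # Central finite difference, used only to check the analytic formula.
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

x = 0.5
z = sigmoid(x)
analytic = sigmoid_grad_from_output(z)
numeric = numerical_grad(sigmoid, x)
print(abs(analytic - numeric) < 1e-8)
```

The formula costs one multiply and one subtract per element, so for an elementwise op like this the time likely goes to reading and writing the tensors rather than to the arithmetic, which is the memory-access point above.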
