I agree with you on the sigmoid point. A pure machine learning researcher might get excited on discovering the gradient formula z * (1 - z), but may not realize that the real bottleneck is memory access.
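
To make the point concrete, here is a minimal sketch in plain Python (the function names are mine, purely for illustration). It shows the gradient computed from the saved forward output z = sigmoid(x). Per element, the backward pass does only a subtract and two multiplies, yet it has to read z and the upstream gradient and write one result, so on typical hardware it is likely limited by memory traffic rather than arithmetic.

```python
import math


def sigmoid(x: float) -> float:
    """Forward pass: z = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad_from_output(z: float) -> float:
    """Derivative of sigmoid written in terms of its own output z."""
    return z * (1.0 - z)


def backward(saved_outputs, upstream_grads):
    """Illustrative elementwise backward pass (hypothetical helper).

    Per element: two reads (z and the upstream gradient), one write,
    and three arithmetic operations (one subtract, two multiplies).
    """
    return [g * z * (1.0 - z) for z, g in zip(saved_outputs, upstream_grads)]


if __name__ == "__main__":
    x = 0.3
    z = sigmoid(x)
    # Compare the closed form against a central finite difference.
    eps = 1e-6
    numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
    print(abs(sigmoid_grad_from_output(z) - numeric) < 1e-8)
```

The formula saves recomputing exp(-x), but it does nothing to reduce how many values have to move through memory, which is the cost the post above is pointing at.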