Summary of Issues Found in Adapting Models to use TorchScript

This Google Doc summarizes, and links to, the discussions in the "Adapting Models to use TorchScript and Getting them to Produce Fusions" post.

I am linking a doc because the top post appears not to be editable once a certain amount of time has passed.


We recently found a new issue, also filed as PyTorch Issue 54040: TorchScript's autodiff does not respect a tensor's requires_grad flag when computing gradients, so gradients that are never used are computed anyway. The summary has been updated to include it. We saw this in particular with the mask applied in multi-head attention in NLP networks.
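For anyone who has not run into this, here is a minimal sketch of the kind of pattern involved. It assumes an additive attention mask; the function name `masked_attention_probs` and the tensor shapes are hypothetical and only for illustration. Only `scores` needs a gradient, while the mask is a constant with `requires_grad=False`. Per the issue, TorchScript's autodiff may still compute a gradient for the mask internally and then throw it away. The sketch only sets up that pattern: it does not measure the extra work, and whether the extra work happens can depend on the PyTorch version and executor settings.

```python
import torch

# Hypothetical scripted helper: adds an additive attention mask to the scores
# and normalizes them. The additive-mask form is an assumption for illustration.
@torch.jit.script
def masked_attention_probs(scores: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    return torch.softmax(scores + mask, dim=-1)

# Scores come from learned projections, so they need gradients.
scores = torch.randn(2, 4, 4, requires_grad=True)

# Constant causal mask: 0 where attention is allowed, -inf where it is blocked.
# It does not require a gradient.
mask = torch.triu(torch.full((4, 4), float("-inf")), diagonal=1).expand(2, 4, 4)

# Run a few iterations so the profiling executor has a chance to specialize the graph.
for _ in range(3):
    probs = masked_attention_probs(scores, mask)
    loss = (probs * torch.arange(4.0)).sum()
    scores.grad = None
    loss.backward()

print(scores.grad is not None)  # True
print(mask.requires_grad, mask.grad)  # False None
```

From the user's side the mask never receives a `.grad`; the issue is about the work done inside the differentiated graph, which would show up in profiling rather than in these printed values.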

