"Preserve PyObject even when it goes dead" (Pull Request #56017, pytorch/pytorch, by ezyang) has landed to master. With this PR, we never deallocate the PyObject representing a Tensor unless the Tensor itself is truly dead. This makes the behavior of __dict__ on Tensor objects more predictable: you no longer "lose" this information when a Tensor becomes temporarily inaccessible from Python but is still reachable from C++.
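To make the __dict__ point concrete, here is a minimal sketch of the kind of situation this affects. It assumes a PyTorch build that includes this PR; the attribute name `note` is made up for illustration. The gradient tensor is kept alive by C++ (autograd), while the Python object returned by `w.grad` has no other Python references once the statement finishes.

```python
import torch

w = torch.ones(3, requires_grad=True)
w.sum().backward()

# `w.grad` hands back a Python wrapper for a tensor that C++ keeps alive.
# After this statement no Python variable refers to that wrapper anymore.
w.grad.note = "tagged"  # `note` is a hypothetical attribute stored in __dict__

# With the PyObject preserved, the attribute should still be there on the
# next access. Before this change, a fresh wrapper without it could be
# created instead, and this lookup could raise AttributeError.
print(w.grad.note)
```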
Although we have done our best to test this change, it is a core change to a very important part of PyTorch, so there is always risk involved. Based on the bugs we fixed while this patch was in development, one aspect of it is most likely to break things: if a C++ tensor outlives the Python tensor, the C++ destructor will acquire the Python GIL in order to deallocate the PyObject. This can cause a deadlock if the destructor for a Tensor runs on a thread that another thread is blocked on while that other thread holds the GIL.

It is generally a bad idea to block while holding onto locks, but in preparing this patch we had to fix a few occurrences of this: https://github.com/pytorch/pytorch/pull/57029 and https://github.com/pytorch/pytorch/pull/56817. The codepaths most likely to get into trouble are related to distributed. However, I have already landed logic to report errors when those destructors block while holding the GIL ("Assert that GIL is not held in blocking destructors", Pull Request #57030, pytorch/pytorch), and I have audited most other sites. We may have missed something though!
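The deadlock shape described above can be sketched in pure Python. This is only an analogy, not PyTorch code: a plain `threading.Lock` stands in for the GIL, `destructor` stands in for a Tensor destructor that needs the GIL, and all names are made up for illustration. Timeouts are used so the bad case reports failure instead of hanging forever.

```python
import threading

gil = threading.Lock()  # stand-in for the Python GIL (assumption for this sketch)


def destructor(result):
    # Stand-in for a C++ Tensor destructor that must take the GIL to free
    # the PyObject. The timeout turns a would-be deadlock into a failure.
    acquired = gil.acquire(timeout=0.2)
    result.append(acquired)
    if acquired:
        gil.release()


def wait_for_worker(release_before_blocking):
    """Return True if the destructor managed to take the lock."""
    result = []
    worker = threading.Thread(target=destructor, args=(result,))
    gil.acquire()  # this thread "holds the GIL"
    worker.start()
    if release_before_blocking:
        gil.release()  # the fix: drop the lock before blocking on the worker
        worker.join()
    else:
        worker.join()  # the bug: block on the worker while still holding the lock
        gil.release()
    return result[0]


print(wait_for_worker(release_before_blocking=False))  # False: destructor starved
print(wait_for_worker(release_before_blocking=True))   # True: destructor ran
```

Without the timeout, the first call would never return, which is the failure mode to look out for.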
Please let me know if you think you’re seeing anything new that looks related to this.