Recently, I’ve been working on resolving some numerical precision alignment issues. Take the division operator as an example: the computation yields different results on CPU and CUDA, or when the same operation is expressed with different syntax, as seen in the attached screenshot.
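For anyone unfamiliar with why this class of mismatch happens at all: it is not specific to PyTorch or to CUDA. A minimal illustration with plain Python floats (IEEE 754 doubles, not the tensors from the screenshot) shows that two mathematically equivalent expressions can round to different results, which is the same effect that different kernels or different syntactic forms of an op can produce:

```python
# Division followed by multiplication "back" is mathematically the identity,
# but each floating-point operation rounds, so the round trip can miss 1.0.
x = (1.0 / 49.0) * 49.0

print(x)          # 0.9999999999999999
print(x == 1.0)   # False
```

Different kernels (CPU vs. CUDA) or different expression rewrites can legally order and round these intermediate operations differently, so bit-exact agreement is generally not guaranteed.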
I’m trying to uncover the underlying reasons, and the first thing that comes to mind is to review the C++ or CUDA source code. However, I’ve had trouble understanding the intricacies of PyTorch’s C++ code and locating the corresponding source. Can anyone help me understand how to learn PyTorch’s C++ source code, in particular how to find the implementation details of C++ operators?
@shuokay The main reason it’s a pain to find the kernel you’re looking for from there is that op.call(self, other) goes through the PyTorch dispatcher, which dynamically dispatches to the right kernel (more on the dispatcher here: Let’s talk about the PyTorch dispatcher : ezyang’s blog)
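To make the dispatch idea concrete, here is a toy Python sketch of the mechanism, not PyTorch’s actual C++ dispatcher: kernels are registered in a table keyed by (operator, dispatch key), and the call site looks the kernel up at runtime, which is why a simple grep for the op name doesn’t land on the kernel. All names here (`register_kernel`, `call`, the key strings) are invented for illustration:

```python
# Toy dispatcher: a table from (op name, dispatch key) to a kernel function.
# PyTorch's real dispatcher is in C++ and far more involved (boxing,
# fallbacks, key ordering), but the lookup-at-call-time idea is the same.
kernels = {}

def register_kernel(op, key, fn):
    kernels[(op, key)] = fn

def call(op, key, *args):
    # Dynamic dispatch: the kernel is only chosen here, at call time,
    # based on the dispatch key (e.g. derived from the tensor's device).
    return kernels[(op, key)](*args)

register_kernel("aten::div", "CPU", lambda a, b: a / b)
register_kernel("aten::div", "CUDA", lambda a, b: a / b)  # stand-in; the real CUDA kernel is separate code

print(call("aten::div", "CPU", 1.0, 4.0))  # 0.25
```

Because the CPU and CUDA entries are separate pieces of code, they are free to compute the "same" op with different instruction sequences, which is exactly where the precision mismatches in the original question can creep in.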
@bdhirsh Thank you very much for your reply, it was really helpful. Based on your explanation, I have “debugged” the PyTorch code step by step, and I think I have a deeper understanding of the implementation and dispatch of PyTorch operators now.
native_functions.yaml + grepping for the op name there should take you to the function. As Brian mentioned, this works for any function implemented in C++/CUDA.
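Concretely, from a PyTorch source checkout the workflow is a single grep on the yaml. The excerpt below is an approximate, hand-written sample of what a `div` entry looks like (the exact fields vary across PyTorch versions), written to a temp file so the command is reproducible here; on a real checkout you would grep the file in the repo instead:

```shell
# Illustrative excerpt only -- field values are approximate, check your
# checkout's aten/src/ATen/native/native_functions.yaml for the real entry.
cat > /tmp/native_functions_excerpt.yaml <<'EOF'
- func: div.Tensor(Tensor self, Tensor other) -> Tensor
  structured_delegate: div.out
  variants: function, method
EOF

# On a real checkout:
#   grep -n "func: div" aten/src/ATen/native/native_functions.yaml
grep -n "func: div" /tmp/native_functions_excerpt.yaml
```

From the matching entry, fields like `structured_delegate` or per-backend `dispatch:` entries point you at the C++/CUDA function names to grep for next.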