Using Nsight Systems to profile GPU workload

Hey folks, thanks for starting this thread. It proved very helpful when trying to profile my application. I am currently following the PyTorch Lightning guide Find bottlenecks in your code (intermediate) — PyTorch Lightning 2.0.0 documentation and use the following command to collect the emitted information:

```shell
nsys profile -w true -t cuda,nvtx,osrt,cudnn,cublas -s none --capture-range-end stop --capture-range=cudaProfilerApi --cudabacktrace=true -x true poetry run python main_graph.py
```

However, I am getting a 300 MB report file for just one training step, and a 1.7 GB file for thirty steps. Can anyone give me a hint whether this is due to a huge misconfiguration on my side, or simply a consequence of using PyTorch Lightning and PyTorch Geometric?
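For reference, here is a slimmer variant I am considering. This is only a sketch — it assumes the osrt/cudnn/cublas event streams and the CUDA backtraces (`--cudabacktrace=true`) account for most of the report size, which I have not verified:

```shell
# Slimmed-down sketch (assumption: osrt/cudnn/cublas traces and CUDA
# backtraces dominate the report size). Traces only CUDA and NVTX
# ranges, keeps the cudaProfilerApi capture window, drops backtraces.
nsys profile -w true -t cuda,nvtx -s none \
    --capture-range=cudaProfilerApi --capture-range-end=stop \
    -x true poetry run python main_graph.py
```

If the report shrinks substantially with this, the extra size came from the dropped trace sources rather than from Lightning or PyG themselves.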
Any input would be greatly appreciated :hugs: