Using Nsight Systems to profile GPU workload

Hey folks, thanks for starting this thread. It proved very helpful when trying to profile my application. I am currently following the PyTorch Lightning guide Find bottlenecks in your code (intermediate) — PyTorch Lightning 2.0.0 documentation and use the following command to collect the emitted information:

```shell
nsys profile -w true -t cuda,nvtx,osrt,cudnn,cublas -s none --capture-range-end stop --capture-range=cudaProfilerApi --cudabacktrace=true -x true poetry run python main_graph.py
```

However, I am getting a 300 MB report file for just one training step, and a 1.7 GB file for thirty steps. Can anyone give me a hint whether this is due to a huge misconfiguration on my side, or simply a consequence of using PyTorch Lightning and PyTorch Geometric?
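For reference, here is a slimmer variant I am considering. This is only a sketch — it assumes the osrt/cudnn/cublas event streams and the CUDA backtraces (`--cudabacktrace=true`) account for most of the report size, which I have not verified:

```shell
# Slimmed-down sketch (assumption: osrt/cudnn/cublas traces and CUDA
# backtraces dominate the report size). Traces only CUDA and NVTX
# ranges, keeps the cudaProfilerApi capture window, drops backtraces.
nsys profile -w true -t cuda,nvtx -s none \
    --capture-range=cudaProfilerApi --capture-range-end=stop \
    -x true poetry run python main_graph.py
```

If the report shrinks substantially with this, the extra size came from the dropped trace sources rather than from Lightning or PyG themselves.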
Any input would be greatly appreciated :hugs: