I am trying to profile the evaluation run of RT-DETR (RT-DETR/rtdetr_pytorch at main · lyuwenyu/RT-DETR on github.com) with Nsight Compute, but no kernels are profiled. This is the command I run and its output:
ncu -o ./rt_deter_profile --target-processes all --replay-mode range torchrun --nproc_per_node=2 tools/train.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml -r rtdetr_r50vd_6x_coco_from_paddle.pth --test-only
==WARNING== Please consult the documentation for current range-based replay mode
limitations and requirements.
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
==PROF== Connected to process 596 (/home/oongjoon/anaconda3/envs/RT-DETR/bin/python3.11)
==PROF== Connected to process 595 (/home/oongjoon/anaconda3/envs/RT-DETR/bin/python3.11)
Initialized distributed mode...
Load PResNet50 state_dict
loading annotations into memory...
Done (t=0.39s)
creating index...
index created!
resume from rtdetr_r50vd_6x_coco_from_paddle.pth
Loading ema.state_dict
Test: [ 0/313] eta: 0:11:21 time: 2.1777 data: 0.7534 max mem: 1634
Test: [ 10/313] eta: 0:02:40 time: 0.5302 data: 0.0928 max mem: 1643
Test: [ 20/313] eta: 0:02:12 time: 0.3661 data: 0.0249 max mem: 1643
Test: [ 30/313] eta: 0:01:59 time: 0.3621 data: 0.0230 max mem: 1643
Test: [ 40/313] eta: 0:01:52 time: 0.3692 data: 0.0232 max mem: 1643
Test: [ 50/313] eta: 0:01:51 time: 0.4246 data: 0.0242 max mem: 1643
Test: [ 60/313] eta: 0:01:44 time: 0.4171 data: 0.0242 max mem: 1643
Test: [ 70/313] eta: 0:01:38 time: 0.3553 data: 0.0229 max mem: 1643
Test: [ 80/313] eta: 0:01:32 time: 0.3541 data: 0.0223 max mem: 1643
Test: [ 90/313] eta: 0:01:28 time: 0.3635 data: 0.0227 max mem: 1643
Test: [100/313] eta: 0:01:23 time: 0.3622 data: 0.0228 max mem: 1643
Test: [110/313] eta: 0:01:19 time: 0.3650 data: 0.0235 max mem: 1643
Test: [120/313] eta: 0:01:14 time: 0.3575 data: 0.0225 max mem: 1643
Test: [130/313] eta: 0:01:10 time: 0.3579 data: 0.0214 max mem: 1643
Test: [140/313] eta: 0:01:07 time: 0.4284 data: 0.0214 max mem: 1643
Test: [150/313] eta: 0:01:03 time: 0.4201 data: 0.0208 max mem: 1643
Test: [160/313] eta: 0:00:59 time: 0.3504 data: 0.0202 max mem: 1643
Test: [170/313] eta: 0:00:54 time: 0.3394 data: 0.0199 max mem: 1643
Test: [180/313] eta: 0:00:50 time: 0.3483 data: 0.0217 max mem: 1643
Test: [190/313] eta: 0:00:46 time: 0.3542 data: 0.0229 max mem: 1643
Test: [200/313] eta: 0:00:42 time: 0.3574 data: 0.0228 max mem: 1643
Test: [210/313] eta: 0:00:39 time: 0.3616 data: 0.0225 max mem: 1643
Test: [220/313] eta: 0:00:35 time: 0.3638 data: 0.0220 max mem: 1643
Test: [230/313] eta: 0:00:31 time: 0.3619 data: 0.0216 max mem: 1643
Test: [240/313] eta: 0:00:27 time: 0.4185 data: 0.0221 max mem: 1643
Test: [250/313] eta: 0:00:23 time: 0.4127 data: 0.0221 max mem: 1643
Test: [260/313] eta: 0:00:20 time: 0.3719 data: 0.0217 max mem: 1643
Test: [270/313] eta: 0:00:16 time: 0.4079 data: 0.0223 max mem: 1643
Test: [280/313] eta: 0:00:12 time: 0.3976 data: 0.0227 max mem: 1643
Test: [290/313] eta: 0:00:08 time: 0.3948 data: 0.0220 max mem: 1643
Test: [300/313] eta: 0:00:04 time: 0.4103 data: 0.0216 max mem: 1643
Test: [310/313] eta: 0:00:01 time: 0.4575 data: 0.0215 max mem: 1643
Test: [312/313] eta: 0:00:00 time: 0.5282 data: 0.0211 max mem: 1643
Test: Total time: 0:02:02 (0.3922 s / it)
Averaged stats:
Accumulating evaluation results...
DONE (t=10.45s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.531
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.712
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.577
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.347
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.577
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.701
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.390
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.655
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.721
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.547
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.765
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.881
==PROF== Disconnected from process 595
==PROF== Disconnected from process 596
==WARNING== No ranges were profiled.
No kernels are profiled: the evaluation itself completes normally, but the run ends with ==WARNING== No ranges were profiled.
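For context, the Nsight Compute documentation describes range replay as profiling only ranges that the application marks explicitly, either with cudaProfilerStart/cudaProfilerStop or with NVTX ranges, and the command above does not mark any. Below is a minimal sketch of how one such range could be marked from PyTorch. It is an assumption about what might help, not a verified fix: `profiler_range` is a hypothetical helper that is not part of RT-DETR, and range replay has further documented limits on what a range may contain, so a small region such as a single forward pass is probably a safer target than the whole evaluation loop.

```python
from contextlib import contextmanager

try:
    import torch
    import torch.cuda.profiler
    _HAS_CUDA = torch.cuda.is_available()
except ImportError:  # allows the sketch to be imported on a machine without PyTorch
    torch = None
    _HAS_CUDA = False


def _noop():
    """Fallback used when CUDA is not available."""


@contextmanager
def profiler_range(start=None, stop=None):
    """Hypothetical helper: mark one profiler range around a block of code.

    By default it calls torch.cuda.profiler.start()/stop(), which wrap
    cudaProfilerStart/cudaProfilerStop. Custom callables can be passed
    in for testing.
    """
    if start is None:
        start = torch.cuda.profiler.start if _HAS_CUDA else _noop
    if stop is None:
        stop = torch.cuda.profiler.stop if _HAS_CUDA else _noop
    start()
    try:
        yield
    finally:
        # Always close the range, even if the wrapped code raises.
        stop()
```

A usage sketch inside the evaluation loop, where `model` and `samples` stand in for whatever the loop actually uses (both names are assumptions):

```python
with torch.no_grad(), profiler_range():
    outputs = model(samples)
```

With torchrun --nproc_per_node=2, each worker process would execute this code and mark its own range.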
This is the driver and GPU information reported by nvidia-smi:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 561.09 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 On | 00000000:01:00.0 Off | N/A |
| 0% 62C P8 18W / 420W | 0MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 3090 On | 00000000:02:00.0 Off | N/A |
| 0% 43C P8 19W / 420W | 0MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
The Nsight Compute version is as follows:
NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2018-2024 NVIDIA Corporation
Version 2024.3.2.0 (build 34861637) (public-release)
PyTorch version: 2.0.1+cu117