NVIDIA Nsight Systems Performance Tool Wrapper

From RidgeRun Developer Wiki

Nsight Systems Profiler

NVIDIA Nsight Systems can profile both GPU and CPU usage. This page shows one way to collect and analyze that data. You need an application that uses CUDA memory in some way, for example a pipeline built with NVIDIA's GStreamer plugins, and the following command:

sudo nsys profile -t osrt,cuda,nvtx -s process-tree --cpuctxsw=process-tree --gpu-metrics-devices all --cuda-memory-usage true --stats true  -o report <APPLICATION>

The command generates a report file (.nsys-rep in recent Nsight Systems releases) and a .sqlite file. The .sqlite file is the one used to extract the CPU and GPU usage.
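Before moving on, it can help to confirm that the SQLite export was actually written. The sketch below relies on the fact that every SQLite 3 database file starts with the ASCII string "SQLite format 3". The helper name is_sqlite_file is hypothetical, and the file name report.sqlite is an assumption derived from the -o report option in the command above.

```shell
# Sketch only: is_sqlite_file is a hypothetical helper, not part of nsys.
# Every SQLite 3 database begins with the ASCII string "SQLite format 3".
is_sqlite_file() {
    [ -f "$1" ] && [ "$(head -c 15 "$1" 2>/dev/null)" = "SQLite format 3" ]
}

# "report.sqlite" is assumed from the "-o report" option used above.
if is_sqlite_file report.sqlite; then
    echo "report.sqlite looks like a valid SQLite export"
else
    echo "report.sqlite is missing or is not a SQLite database" >&2
fi
```

If the check fails, one likely cause is that the profiling run was interrupted before the export step finished.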

Extracting the GPU and CPU usage

The following repository contains a script that extracts these values.

1. Clone the repository:

git clone https://gitlab.ridgerun.com/open/performance_wrapper.git

2. Run nsys_sqlite_perf.sh with the path to the previously generated .sqlite file:

./nsys_sqlite_perf.sh /path/to/out_report.sqlite --top 10

Analyzing the output

The output looks like the following:

The total CPU% usage per process is derived from CPU cycles. In the example below, the process with PID 5965 uses approximately 76.19%.

===== CPU (from COMPOSITE_EVENTS.cpuCycles) =====
Top PIDs by CPU%:
pid   cpu_util_percent
5965  76.19
5969  4.17
5931  2.38
5935  2.38
5939  2.38
5943  2.38
5975  2.38
5981  2.38
5949  1.79
5955  1.79

Next, the output shows the CPU% usage per thread. Taking PID 5965 as a reference, its busiest thread belongs to the gst-launch-1.0 application, so the 76.19% in the per-PID table above is the total CPU% usage of gst-launch-1.0. Note that the thread table also shows the overhead generated by Nsight Systems itself (the [NSys] thread); this percentage can be ignored when calculating the actual usage of the application.

Top threads by CPU%:
pid   tid   cpu_util_percent  thread_name
5965  5965  29.76             gst-launch-1.0
5965  5992  14.88             qtdemux0:sink
5965  5998  13.69             qtdemux0:sink
5965  5966  5.95              [NSys]
5965  5997  5.95              qtdemux0:sink
5969  5969  4.17              gst-plugin-scan
5931  5931  2.38              jq
5935  5935  2.38              jq
5939  5939  2.38              jq
5943  5943  2.38              jq
=================================================
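Discounting the Nsight Systems overhead can be sketched as a small subtraction: take the per-PID total and remove the share of any [NSys] thread that belongs to the same PID. The helper app_cpu_percent is hypothetical and is not part of nsys_sqlite_perf.sh. It assumes that the per-PID figure includes the [NSys] thread and that rows follow the "pid tid cpu_util_percent thread_name" layout shown above.

```shell
# Sketch only: app_cpu_percent is a hypothetical helper, not part of the script.
# Usage: app_cpu_percent PID PID_TOTAL_PERCENT < thread_table
# Subtracts every [NSys] row of the given PID from that PID's total CPU%.
app_cpu_percent() {
    awk -v pid="$1" -v total="$2" '
        $1 == pid && $4 == "[NSys]" { overhead += $3 }
        END { printf "%.2f\n", total - overhead }
    '
}

# Rows copied from the thread table above; 76.19 is the per-PID total.
app_cpu_percent 5965 76.19 <<'EOF'
5965  5965  29.76             gst-launch-1.0
5965  5992  14.88             qtdemux0:sink
5965  5966  5.95              [NSys]
5969  5969  4.17              gst-plugin-scan
EOF
```

With the sample rows this prints 70.24, that is, 76.19 minus the 5.95 attributed to [NSys].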

Finally, GPU% is extracted from the summary section, which shows the total runtime of the application, the GPU active time, and the resulting GPU%.

===== Nsight SQLite Performance Summary =====
Wall time (s):        1.439531
CPU% source:         cycles-based (see tables above)
GPU active (s):       0.045897
GPU% (approx):       3.19
  (GPU% ≈ (kernel + memset time) / wall; overlapping kernels may over-count)
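The GPU% figure can be reproduced by hand from the two times in the summary: GPU active time divided by wall time, multiplied by 100. The following is a minimal sketch of that arithmetic; gpu_percent is a hypothetical helper name, not something provided by the script.

```shell
# Sketch only: gpu_percent is a hypothetical helper, not part of the script.
# Usage: gpu_percent GPU_ACTIVE_SECONDS WALL_SECONDS
# GPU% is approximated as GPU active time / wall time * 100.
gpu_percent() {
    awk -v g="$1" -v w="$2" \
        'BEGIN { if (w <= 0) exit 1; printf "%.2f\n", 100 * g / w }'
}

# Values copied from the summary above.
gpu_percent 0.045897 1.439531
```

This prints 3.19, matching the GPU% (approx) line. As the summary itself notes, overlapping kernels may inflate the GPU active time, so the value is best read as an estimate.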

