NVIDIA Nsight Systems Performance Tool Wrapper
Nsight Systems Profiler
The NVIDIA Nsight Systems tool can profile both GPU and CPU usage. This page shows an example of how to analyze that data. You need an application that uses CUDA memory in some way, for example one built on NVIDIA's GStreamer plugins, and the following command:
sudo nsys profile -t osrt,cuda,nvtx -s process-tree --cpuctxsw=process-tree --gpu-metrics-devices all --cuda-memory-usage true --stats true -o report <APPLICATION>
The command generates a report file (.nsys-rep in recent Nsight Systems releases) and a .sqlite file. The .sqlite file is the one used to extract the CPU and GPU usage.
Extracting the GPU and CPU usage
The following repository contains a script that extracts these values.
1. Clone the repository:
git clone https://gitlab.ridgerun.com/open/performance_wrapper.git
2. Run nsys_sqlite_perf.sh with the path to the .sqlite file generated previously:
./nsys_sqlite_perf.sh /path/to/out_report.sqlite --top 10
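Before running the script, it can help to confirm that the path really points at a SQLite export and not at the report file. The check below is a minimal sketch: the helper name is_sqlite_file is hypothetical and not part of the performance_wrapper repository. It relies only on the fact that every SQLite database file starts with the string "SQLite format 3".

```shell
#!/usr/bin/env bash

# is_sqlite_file PATH (hypothetical helper, not part of the repository):
# succeed only if PATH is a regular file whose first 15 bytes are the
# SQLite magic string "SQLite format 3".
is_sqlite_file() {
    [ -f "$1" ] && [ "$(head -c 15 "$1" 2>/dev/null)" = "SQLite format 3" ]
}

# Placeholder path taken from the step above; adjust it to your own output.
report="/path/to/out_report.sqlite"

if is_sqlite_file "$report"; then
    ./nsys_sqlite_perf.sh "$report" --top 10
else
    echo "not a SQLite export: $report" >&2
fi
```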
Analyzing the output
The output looks like the following:
The total CPU% usage is based on the CPU cycles. In the example below, the process with PID 5965 uses approximately 76.19%.
===== CPU (from COMPOSITE_EVENTS.cpuCycles) =====
Top PIDs by CPU%:
pid   cpu_util_percent
5965  76.19
5969  4.17
5931  2.38
5935  2.38
5939  2.38
5943  2.38
5975  2.38
5981  2.38
5949  1.79
5955  1.79
Next, the output shows the CPU% usage per thread. For PID 5965, the thread with the highest usage is the main thread of the gst-launch-1.0 application, so the 76.19% in the table above is the total CPU% usage of gst-launch-1.0 across all of its threads. Note that this table also includes the overhead added by Nsight Systems (the [NSys] thread); that percentage can be subtracted when calculating the actual usage of the application.
Top threads by CPU%:
pid   tid   cpu_util_percent  thread_name
5965  5965  29.76             gst-launch-1.0
5965  5992  14.88             qtdemux0:sink
5965  5998  13.69             qtdemux0:sink
5965  5966  5.95              [NSys]
5965  5997  5.95              qtdemux0:sink
5969  5969  4.17              gst-plugin-scan
5931  5931  2.38              jq
5935  5935  2.38              jq
5939  5939  2.38              jq
5943  5943  2.38              jq
=================================================
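As a rough illustration of discounting the profiler overhead, the sketch below sums the CPU% of the [NSys] threads for one PID and subtracts it from that PID's total. The function name nsys_overhead is hypothetical and not part of the performance_wrapper repository. The rows are copied from the thread table above and assume its four-column layout (pid, tid, cpu_util_percent, thread_name).

```shell
#!/usr/bin/env bash

# nsys_overhead PID (hypothetical helper): read thread rows of the form
# "pid tid cpu_util_percent thread_name" on stdin and print the CPU%
# attributed to [NSys] threads that belong to PID.
nsys_overhead() {
    awk -v pid="$1" '
        $1 == pid && $4 == "[NSys]" { sum += $3 }
        END { printf "%.2f\n", sum }
    '
}

# Thread rows for PID 5965, copied from the table above.
threads='5965 5965 29.76 gst-launch-1.0
5965 5992 14.88 qtdemux0:sink
5965 5998 13.69 qtdemux0:sink
5965 5966 5.95 [NSys]
5965 5997 5.95 qtdemux0:sink'

overhead=$(printf '%s\n' "$threads" | nsys_overhead 5965)

# Total CPU% of PID 5965 (76.19, from the PID table) minus the overhead.
awk -v total=76.19 -v overhead="$overhead" \
    'BEGIN { printf "%.2f\n", total - overhead }'
```

With these values the overhead is 5.95, so the application itself accounts for roughly 70.24% in this run.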
Finally, GPU% is extracted from the summary section, which shows the total runtime of the application, the GPU active time, and the resulting GPU%.
===== Nsight SQLite Performance Summary =====
Wall time (s): 1.439531
CPU% source: cycles-based (see tables above)
GPU active (s): 0.045897
GPU% (approx): 3.19
(GPU% ≈ (kernel + memset time) / wall; overlapping kernels may over-count)
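The arithmetic behind the summary can be reproduced by hand. The sketch below first recomputes GPU% from the two times reported above, then shows one way to avoid the over-counting mentioned in the note: merging overlapping intervals before summing them. Both function names (gpu_percent, busy_time) and the sample intervals are hypothetical; this is not how nsys_sqlite_perf.sh is known to work, only an illustration of the formula and its caveat.

```shell
#!/usr/bin/env bash

# gpu_percent ACTIVE_SECONDS WALL_SECONDS (hypothetical helper):
# print 100 * active / wall rounded to two decimals.
gpu_percent() {
    awk -v active="$1" -v wall="$2" 'BEGIN {
        if (wall <= 0) exit 1          # avoid dividing by zero
        printf "%.2f\n", 100 * active / wall
    }'
}

# busy_time (hypothetical helper): read "start end" pairs on stdin and
# print the time covered by at least one interval, so that overlapping
# kernels are counted only once.
busy_time() {
    sort -n -k1,1 | awk '
        NR == 1 { cur_start = $1; cur_end = $2; next }
        # The interval overlaps the current one: extend it if needed.
        $1 <= cur_end { if ($2 > cur_end) cur_end = $2; next }
        # Gap found: close the current interval and start a new one.
        { total += cur_end - cur_start; cur_start = $1; cur_end = $2 }
        END { if (NR > 0) total += cur_end - cur_start; print total + 0 }
    '
}

# Values copied from the summary above.
gpu_percent 0.045897 1.439531

# Made-up intervals: the first two overlap, so a plain sum would give 30.
printf '%s\n' '0 10' '5 15' '20 30' | busy_time
```

The first call prints 3.19, matching the summary. The second prints 25 where a plain sum of durations would give 30, which is the kind of over-count the note warns about.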