AM5728 Multimedia Performance Testbench

This section compares the performance of H264 video decode GStreamer pipelines using a hardware-accelerated implementation against a software-only implementation. The hardware-accelerated pipeline uses the ducatih264dec element from gst-plugins-ducati, and the software-only pipeline uses the avdec_h264 element from gst-plugins-libav. The test pipelines differ only in the H264 decode element.
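Because the tracer-based test pipelines differ only in the tracer name and the decode element, they can be wrapped in a small helper. The following is a minimal sketch, not part of the original testbench: the function name run_decode_test and its argument order are assumptions made for illustration, while the pipeline itself is the one used in the tests below.

```shell
# run_decode_test TRACER DECODER FILE
# Hypothetical helper (name and argument order are illustrative):
# runs the H264 decode test pipeline with the given tracer plugin
# and decode element on the given sample file.
run_decode_test() {
    GST_TRACER_PLUGINS="$1" gst-launch-1.0 \
        filesrc location="$3" ! qtdemux name=demux demux.video_0 ! \
        queue ! h264parse ! "$2" ! fakesink sync=true -e
}
```

For example, `run_decode_test cpuusage ducatih264dec FILE` and `run_decode_test cpuusage avdec_h264 FILE` would run the two CPU load pipelines on the same sample file.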
=== <span style="color:#0931C6">CPU load % per core</span><br>  ===
'''''Test pipeline (ducatih264dec):'''''
<pre style="background:#d6e4f1">
GST_TRACER_PLUGINS="cpuusage" gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-H264.mov ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! ducatih264dec ! fakesink sync=true -e
</pre>
'''''Test pipeline (avdec_h264):'''''
<pre style="background:#d6e4f1">
GST_TRACER_PLUGINS="cpuusage" gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-H264.mov ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! avdec_h264 ! fakesink sync=true -e
</pre>
'''''Obtained Results:'''''
[[Image:AM572x-testbench-H264-dec-cpuload.png|center|700px|AM572x-testbench-H264-dec-cpuload.png]]<br>
The chart above shows that hardware acceleration substantially reduces CPU load. On average, the load on CPU_0_accel is 49.2% lower than on CPU_0_unaccel, and the load on CPU_1_accel is 39% lower than on CPU_1_unaccel.
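An average difference like the ones quoted above can be computed from two series of load samples. The sketch below assumes each series has already been reduced to a plain text file with one numeric sample per line. That file format and the function name mean_diff are assumptions for illustration; this page does not describe how the tracer output was post-processed.

```shell
# mean_diff FILE_A FILE_B
# Prints the mean of (B - A) over paired samples, one decimal place.
# Assumes one numeric sample per line in each file (illustrative format);
# unpaired trailing lines in the longer file are ignored.
mean_diff() {
    paste "$1" "$2" | LC_ALL=C awk '
        NF == 2 { sum += $2 - $1; n++ }
        END     { if (n > 0) printf "%.1f\n", sum / n }'
}
```

Called as `mean_diff accel.txt unaccel.txt` (hypothetical file names), it prints how much lower the accelerated series is on average, in the same unit as the samples.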
=== <span style="color:#0931C6">Frame-rate</span><br>  ===
'''''Test pipeline (ducatih264dec):'''''
<pre style="background:#d6e4f1">
GST_TRACER_PLUGINS="framerate" gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-H264.mov ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! ducatih264dec ! fakesink sync=true -e
</pre>
'''''Test pipeline (avdec_h264):'''''
<pre style="background:#d6e4f1">
GST_TRACER_PLUGINS="framerate" gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-H264.mov ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! avdec_h264 ! fakesink sync=true -e
</pre>
'''''Obtained Results:'''''
[[Image:AM572x-testbench-H264-dec-framerate.png|center|700px|AM572x-testbench-H264-dec-framerate.png]]<br>
The chart above shows that in both cases the frame rate reaches the expected 24 fps and then remains stable.
=== <span style="color:#0931C6">Memory consumption</span><br>  ===
'''''Test pipeline (ducatih264dec):'''''
<pre style="background:#d6e4f1">
gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-H264.mov ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! ducatih264dec ! fakesink sync=true -e
</pre>
'''''Test pipeline (avdec_h264):'''''
<pre style="background:#d6e4f1">
gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-H264.mov ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! avdec_h264 ! fakesink sync=true -e
</pre>
'''''Obtained Results:'''''
[[Image:AM572x-testbench-H264-dec-memuse.png|center|700px|AM572x-testbench-H264-dec-memuse.png]]<br>
The chart above shows that hardware acceleration substantially reduces memory consumption: on average, the hardware-accelerated pipeline uses 10 869 KB less memory.
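This page does not state how memory consumption was sampled, and the memory test pipelines run without a tracer. One common approach on Linux, shown here only as a sketch under that assumption, is to read the resident set size (VmRSS, reported in kB) of the gst-launch-1.0 process from /proc at a fixed interval. The function names rss_kb and sample_rss are illustrative.

```shell
# rss_kb PID
# Prints the resident set size of PID in kB, read from /proc (Linux only).
# Prints nothing if the process does not exist or reports no VmRSS.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# sample_rss PID [INTERVAL_SECONDS]
# Prints one RSS sample per interval (default 1 s) until PID exits.
sample_rss() {
    while rss=$(rss_kb "$1") && [ -n "$rss" ]; do
        echo "$rss"
        sleep "${2:-1}"
    done
}
```

A pipeline can be started in the background and sampled with `sample_rss $!`, which yields one value per line for each run.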
=== <span style="color:#0931C6">Memory bandwidth consumption</span><br>  ===
'''''Test pipeline (ducatih264dec):'''''
<pre style="background:#d6e4f1">
gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/Wreck-It_Ralph_H264.mp4 ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! ducatih264dec ! fakesink sync=true -e
</pre>
'''''Test pipeline (avdec_h264):'''''
<pre style="background:#d6e4f1">
gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/Wreck-It_Ralph_H264.mp4 ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! avdec_h264 ! fakesink sync=true -e
</pre>
Note: Both charts report memory bandwidth consumption separately for sequential (seq) and random (al) memory access.
'''''Obtained results (memory read bandwidth):'''''
[[Image:AM572x-testbench-H264-dec-readsbandwidth.png|center|700px|AM572x-testbench-H264-dec-readbandwidth.png]]<br>
The chart above shows that the hardware-accelerated pipeline consumes more memory read bandwidth. On average, the difference is 328.6 MB/s for sequential reads and 44.4 MB/s for random reads.
'''''Obtained results (memory write bandwidth):'''''
[[Image:AM572x-testbench-H264-dec-writebandwidth.png|center|700px|AM572x-testbench-H264-dec-writebandwidth.png]]<br>
The chart above shows that the hardware-accelerated pipeline consumes less memory write bandwidth. On average, the difference is 7.9 MB/s for sequential writes and 45 MB/s for random writes, so the improvement is small.
== <span style="color:#008080">MPEG4 video decode</span><br>  ==
This section compares the performance of MPEG4 video decode GStreamer pipelines using a hardware-accelerated implementation against a software-only implementation. The hardware-accelerated pipeline uses the ducatimpeg4dec element from gst-plugins-ducati, and the software-only pipeline uses the avdec_mpeg4 element from gst-plugins-libav. The test pipelines differ only in the MPEG4 decode element.