AM5728 Multimedia Performance Testbench: Difference between revisions

This section compares the performance of JPEG video decode GStreamer pipelines using a hardware-accelerated implementation and a software-only implementation. The hardware-accelerated implementation uses gst-plugins-ducati (ducatijpegdec element), while the software-only implementation uses gst-plugins-libav (avdec_mjpeg element). The test pipelines differ only in the JPEG decode element: one uses the hardware-accelerated decoder and the other the software decoder.
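Because the two variants of each test differ only in the decoder element, the command lines can be generated from a single template. The following is a minimal sketch, not part of the original testbench: the helper name <code>jpeg_test_cmd</code> and the <code>SAMPLE</code> variable are illustrative, and the helper only prints the command lines so that they can be reviewed before being run on the board.

```shell
#!/bin/sh
# Sample clip used by the CPU load, frame-rate and memory consumption tests.
SAMPLE=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-MJPEG.mov

# jpeg_test_cmd DECODER TRACER  (hypothetical helper)
#   DECODER: decoder element, ducatijpegdec or avdec_mjpeg
#   TRACER:  value for GST_TRACER_PLUGINS, for example cpuusage or framerate
# Prints the full test command line; it does not execute it.
jpeg_test_cmd() {
    printf 'GST_TRACER_PLUGINS="%s" gst-launch-1.0 filesrc location=%s ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! %s ! fakesink sync=true -e\n' \
        "$2" "$SAMPLE" "$1"
}

# Print the accelerated and the software-only variant of the CPU load test.
for dec in ducatijpegdec avdec_mjpeg; do
    jpeg_test_cmd "$dec" cpuusage
done
```

Keeping the decoder as the only parameter makes it harder for the two variants to drift apart by accident when a pipeline is edited.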
=== <span style="color:#0931C6">CPU load % per core</span><br>  ===
'''''Test pipeline (ducatijpegdec):'''''
<pre style="background:#d6e4f1">
GST_TRACER_PLUGINS="cpuusage" gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! ducatijpegdec ! fakesink sync=true -e
</pre>
'''''Test pipeline (avdec_mjpeg):'''''
<pre style="background:#d6e4f1">
GST_TRACER_PLUGINS="cpuusage" gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! avdec_mjpeg ! fakesink sync=true -e
</pre>
'''''Obtained Results:'''''
[[Image:AM572x-testbench-JPEG-dec-cpuload.png|center|700px|AM572x-testbench-JPEG-dec-cpuload.png]]<br>
The chart above shows that hardware acceleration substantially reduces the CPU workload. On average, the load on CPU_1_accel is 42.8% lower than on CPU_1_unaccel. In both cases the CPU_0 core is practically idle, with no appreciable difference between the two runs.
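An average difference like the one quoted above can be recomputed from the raw samples. The following is a minimal sketch under a stated assumption: the per-sample CPU_1 loads of each run have already been exported to two plain-text files with one numeric value per line. That file format and the name <code>mean_diff</code> are hypothetical; this is not the tracer's native output.

```shell
#!/bin/sh
# mean_diff FILE_A FILE_B  (hypothetical helper)
# Prints mean(FILE_B) - mean(FILE_A) with one decimal place.
# Each file is assumed to hold one numeric sample per line; blank lines are ignored.
mean_diff() {
    awk '
        NF == 0   { next }                       # skip blank lines
        FNR == NR { sum_a += $1; n_a++; next }   # lines of the first file
                  { sum_b += $1; n_b++ }         # lines of the second file
        END {
            if (n_a == 0 || n_b == 0) {
                print "mean_diff: no samples" > "/dev/stderr"
                exit 1
            }
            printf "%.1f\n", sum_b / n_b - sum_a / n_a
        }
    ' "$1" "$2"
}
```

For example, <code>mean_diff accel.txt unaccel.txt</code> would print how much higher the software-only load is on average, where both file names are placeholders.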
=== <span style="color:#0931C6">Frame-rate</span><br>  ===
'''''Test pipeline (ducatijpegdec):'''''
<pre style="background:#d6e4f1">
GST_TRACER_PLUGINS="framerate" gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! ducatijpegdec ! fakesink sync=true -e
</pre>
'''''Test pipeline (avdec_mjpeg):'''''
<pre style="background:#d6e4f1">
GST_TRACER_PLUGINS="framerate" gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! avdec_mjpeg ! fakesink sync=true -e
</pre>
'''''Obtained Results:'''''
[[Image:AM572x-testbench-JPEG-dec-framerate.png|center|700px|AM572x-testbench-JPEG-dec-framerate.png]]<br>
The chart above shows that in both cases the frame rate reaches the expected 25 fps and then remains stable.
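The "reaches 25 fps and then remains stable" observation can also be checked numerically. The following is a minimal sketch, assuming the frame-rate samples have been exported to a plain-text file with one value per line (a hypothetical format, not the tracer's native output). The helper name <code>fps_settled</code>, the tolerance and the number of warm-up samples to skip are illustrative choices.

```shell
#!/bin/sh
# fps_settled FILE TARGET TOLERANCE SKIP  (hypothetical helper)
# Exit status 0 if every sample after the first SKIP lines is within
# TOLERANCE of TARGET; 1 otherwise, including when no samples remain.
fps_settled() {
    awk -v target="$2" -v tol="$3" -v skip="$4" '
        NR <= skip || NF == 0 { next }          # ignore warm-up and blank lines
        {
            n++
            d = $1 - target
            if (d < 0) d = -d                   # absolute deviation from target
            if (d > tol) bad = 1
        }
        END { exit ((n == 0 || bad) ? 1 : 0) }
    ' "$1"
}
```

For example, <code>fps_settled fps.txt 25 0.5 2</code> succeeds only if every sample after the first two lies between 24.5 and 25.5 fps; <code>fps.txt</code> is a placeholder name.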
=== <span style="color:#0931C6">Memory consumption</span><br>  ===
'''''Test pipeline (ducatijpegdec):'''''
<pre style="background:#d6e4f1">
gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! ducatijpegdec ! fakesink sync=true -e
</pre>
'''''Test pipeline (avdec_mjpeg):'''''
<pre style="background:#d6e4f1">
gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! avdec_mjpeg ! fakesink sync=true -e
</pre>
'''''Obtained Results:'''''
[[Image:AM572x-testbench-JPEG-dec-memuse.png|center|700px|AM572x-testbench-JPEG-dec-memuse.png]]<br>
The chart above shows that hardware acceleration reduces memory consumption. On average, the hardware-accelerated pipeline uses 1304 KB less memory.
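One possible way to sample the resident memory of a running pipeline on Linux, not necessarily the method used for the chart above, is to read the <code>VmRSS</code> field of <code>/proc/&lt;pid&gt;/status</code>, which the kernel reports in kB. The following is a minimal sketch; the helper name <code>rss_kb_of</code> is illustrative.

```shell
#!/bin/sh
# rss_kb_of STATUS_FILE  (hypothetical helper)
# Prints the VmRSS value, in kB, found in a /proc/<pid>/status style file.
rss_kb_of() {
    awk '/^VmRSS:/ { print $2 }' "$1"
}

# Possible usage while a pipeline is running, where PID is the process ID
# of the gst-launch-1.0 process under test:
#   rss_kb_of "/proc/$PID/status"
```

Sampling this value periodically for both pipelines and averaging the results would give figures comparable in kind to the ones discussed above.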
=== <span style="color:#0931C6">Memory bandwidth consumption</span><br>  ===
'''''Test pipeline (ducatijpegdec):'''''
<pre style="background:#d6e4f1">
gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/Wreck-It_Ralph_MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! ducatijpegdec ! fakesink sync=true -e
</pre>
'''''Test pipeline (avdec_mjpeg):'''''
<pre style="background:#d6e4f1">
gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/Wreck-It_Ralph_MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! avdec_mjpeg ! fakesink sync=true -e
</pre>
Note: In both charts, memory bandwidth consumption is broken down into sequential (seq) and random (al, for "aleatory") memory accesses.
'''''Obtained results (memory read bandwidth):'''''
[[Image:AM572x-testbench-JPEG-dec-readsbandwidth.png|center|700px|AM572x-testbench-JPEG-dec-readbandwidth.png]]<br>
The chart above shows that hardware acceleration consumes less memory read bandwidth. The average difference is 448.9 MB/s for sequential reads and 175.5 MB/s for random (al) reads.
'''''Obtained results (memory write bandwidth):'''''
[[Image:AM572x-testbench-JPEG-dec-writebandwidth.png|center|700px|AM572x-testbench-JPEG-dec-writebandwidth.png]]<br>
The chart above shows that hardware acceleration consumes less memory bandwidth for sequential writes and slightly more for random (al) writes. The average difference is 1046.9 MB/s for sequential writes and 155.1 MB/s for random writes.
== <span style="color:#008080">Resolution scale and color-space conversion</span><br>  ==
This section compares the performance of resolution scaling and color-space conversion GStreamer pipelines using a hardware-accelerated implementation and a software-only implementation. The hardware-accelerated implementation uses gst-plugins-vpe (vpe element), while the software-only implementation uses the videoscale and videoconvert elements. The test pipelines differ only in the scaling and color-space conversion elements: one uses the hardware-accelerated element and the other the software elements.



