AM5728 Multimedia Performance Testbench: Difference between revisions

Revision as of 15:50, 8 June 2016

5,414 bytes added , 8 June 2016

→‎JPEG video decode

Dgarbanzo

@@ Line 542: / Line 542: @@
 This section compares the performance of JPEG video decode GStreamer pipelines using a hardware-accelerated implementation against a software-only one. The hardware-accelerated implementation uses gst-plugins-ducati (ducatijpegdec element); the software-only implementation uses gst-plugins-libav (avdec_mjpeg element). The test pipelines differ only in the JPEG decode element.
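Because the two test pipelines differ only in the decode element, both can be generated from one template. The following is a minimal sketch, not the script used for these measurements: the helper names `build_pipeline` and `run` and the `DRY_RUN` variable are hypothetical, and the sample path is the one used in the pipelines of this section. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sample clip path taken from the test pipelines in this section.
SAMPLE=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-MJPEG.mov

# build_pipeline DECODER (hypothetical helper):
# print the pipeline description; only the decoder element varies.
build_pipeline() {
    printf '%s' "filesrc location=$SAMPLE ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! $1 ! fakesink sync=true"
}

# run CMD... (hypothetical helper):
# print the command instead of executing it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

for decoder in ducatijpegdec avdec_mjpeg; do
    # Unquoted on purpose: the description must split into separate arguments.
    run gst-launch-1.0 $(build_pipeline "$decoder") -e
done
```

Setting `DRY_RUN=0` would execute the pipelines on the target board. A tracer variable such as `GST_TRACER_PLUGINS` can be exported beforehand, as in the test pipelines of the subsections that follow.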
+=== <span style="color:#0931C6">CPU load % per core</span><br>  ===
+'''''Test pipeline (ducatijpegdec):'''''
+<pre style="background:#d6e4f1">
+GST_TRACER_PLUGINS="cpuusage" gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! ducatijpegdec ! fakesink sync=true -e
+</pre>
+'''''Test pipeline (avdec_mjpeg):'''''
+<pre style="background:#d6e4f1">
+GST_TRACER_PLUGINS="cpuusage" gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! avdec_mjpeg ! fakesink sync=true -e
+</pre>
+'''''Obtained Results:'''''
+[[Image:AM572x-testbench-JPEG-dec-cpuload.png|center|700px|AM572x-testbench-JPEG-dec-cpuload.png]]<br>
+The chart above clearly shows that hardware acceleration greatly reduces the CPU workload. On average, the load on CPU_1_accel is 42.8% lower than on CPU_1_unaccel. In both cases the corresponding CPU_0 core is practically idle, with no difference between the two runs.
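An average difference of this kind can be computed from two series of paired samples. The sketch below assumes each run has already been reduced to a plain text file with one numeric sample per line. That file layout and the function name `mean_diff` are assumptions, since the tracer log format is not shown here. For percentage loads the result is a difference in percentage points.

```shell
#!/bin/sh
# mean_diff FILE_A FILE_B (hypothetical helper):
# print the mean of (A - B) over paired samples, one number per line in each file.
mean_diff() {
    paste "$1" "$2" |
        awk 'NF == 2 { sum += $1 - $2; n++ } END { if (n) printf "%.1f\n", sum / n }'
}
```

For example, `mean_diff cpu1_unaccel.txt cpu1_accel.txt` (hypothetical file names) would print how much higher the unaccelerated load is on average. Lines without a partner in the other file are skipped.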
+=== <span style="color:#0931C6">Frame-rate</span><br>  ===
+'''''Test pipeline (ducatijpegdec):'''''
+<pre style="background:#d6e4f1">
+GST_TRACER_PLUGINS="framerate" gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! ducatijpegdec ! fakesink sync=true -e
+</pre>
+'''''Test pipeline (avdec_mjpeg):'''''
+<pre style="background:#d6e4f1">
+GST_TRACER_PLUGINS="framerate" gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! avdec_mjpeg ! fakesink sync=true -e
+</pre>
+'''''Obtained Results:'''''
+[[Image:AM572x-testbench-JPEG-dec-framerate.png|center|700px|AM572x-testbench-JPEG-dec-framerate.png]]<br>
+The chart above shows that in both cases the frame rate reaches the expected value of 25 fps and then remains stable.
+=== <span style="color:#0931C6">Memory consumption</span><br>  ===
+'''''Test pipeline (ducatijpegdec):'''''
+<pre style="background:#d6e4f1">
+gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! ducatijpegdec ! fakesink sync=true -e
+</pre>
+'''''Test pipeline (avdec_mjpeg):'''''
+<pre style="background:#d6e4f1">
+gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! avdec_mjpeg ! fakesink sync=true -e
+</pre>
+'''''Obtained Results:'''''
+[[Image:AM572x-testbench-JPEG-dec-memuse.png|center|700px|AM572x-testbench-JPEG-dec-memuse.png]]<br>
+The chart above shows that hardware acceleration reduces memory consumption: on average, 1304 KB less memory is used when hardware acceleration is enabled.
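The memory test pipelines above do not load a tracer, and the measurement method is not stated here. One possible way to sample a process's memory on Linux is to read the `VmRSS` field from `/proc/<pid>/status`. The sketch below is only an assumption about how such data could be collected, not the method used for the chart; the helper names `rss_kb` and `sample_rss` are hypothetical.

```shell
#!/bin/sh
# rss_kb PID (hypothetical helper):
# print the resident set size of PID in KB, read from /proc (Linux only).
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# sample_rss PID INTERVAL COUNT (hypothetical helper):
# print up to COUNT samples, INTERVAL seconds apart; stop early if PID exits.
sample_rss() {
    i=0
    while [ "$i" -lt "$3" ] && [ -r "/proc/$1/status" ]; do
        rss_kb "$1"
        sleep "$2"
        i=$((i + 1))
    done
}
```

A pipeline could be started in the background with `&` and sampled with `sample_rss $! 1 30 > mem_accel.txt` (hypothetical output file). Note that resident set size is only one of several possible definitions of memory consumption.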
+=== <span style="color:#0931C6">Memory bandwidth consumption</span><br>  ===
+'''''Test pipeline (ducatijpegdec):'''''
+<pre style="background:#d6e4f1">
+gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/Wreck-It_Ralph_MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! ducatijpegdec ! fakesink sync=true -e
+</pre>
+'''''Test pipeline (avdec_mjpeg):'''''
+<pre style="background:#d6e4f1">
+gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/Wreck-It_Ralph_MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! avdec_mjpeg ! fakesink sync=true -e
+</pre>
+Note: In both charts, memory bandwidth consumption is shown separately for sequential (seq) and random (al, for "aleatory") memory accesses.
+'''''Memory bandwidth consumed by memory reads, obtained results:'''''
+[[Image:AM572x-testbench-JPEG-dec-readsbandwidth.png|center|700px|AM572x-testbench-JPEG-dec-readbandwidth.png]]<br>
+The chart above shows that hardware acceleration consumes less memory bandwidth for reads. The average difference is 448.9 MB/s for sequential reads and 175.5 MB/s for random reads.
+'''''Memory bandwidth consumed by memory writes, obtained results:'''''
+[[Image:AM572x-testbench-JPEG-dec-writebandwidth.png|center|700px|AM572x-testbench-JPEG-dec-writebandwidth.png]]<br>
+The chart above shows that hardware acceleration consumes less memory bandwidth for sequential writes and slightly more for random writes. The average difference is 1046.9 MB/s for sequential writes and 155.1 MB/s for random writes.
+== <span style="color:#008080">Resolution scale and color-space conversion</span><br>  ==
+This section compares the performance of resolution scaling and color-space conversion GStreamer pipelines using a hardware-accelerated implementation against a software-only one. The hardware-accelerated implementation uses gst-plugins-vpe (vpe element); the software-only implementation uses the videoscale and videoconvert elements. The test pipelines differ only in the elements that perform resolution scaling and color-space conversion.