This section compares the performance of JPEG video decode GStreamer pipelines using a hardware-accelerated implementation against a software-only implementation. The hardware-accelerated implementation uses gst-plugins-ducati (ducatijpegdec element); the software-only implementation uses gst-plugins-libav (avdec_mjpeg element). The test pipelines differ only in the JPEG decode element: one uses the hardware-accelerated decoder and the other uses the software decoder.
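Because the two test pipelines differ only in the decoder element, they can be generated from a single template. The following is a minimal sketch, not part of the original test bench: the `build_pipeline` helper is a hypothetical name, and the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sample clip used by the CPU load, frame-rate and memory consumption tests.
SAMPLE=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-MJPEG.mov

# build_pipeline (hypothetical helper): print the pipeline description for
# the decoder element given in $1. Only the decoder name changes per run.
build_pipeline() {
    printf '%s' "filesrc location=$SAMPLE ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! $1 ! fakesink sync=true"
}

# Print one gst-launch-1.0 command per decoder under test.
for dec in ducatijpegdec avdec_mjpeg; do
    echo "gst-launch-1.0 $(build_pipeline "$dec") -e"
done
```

Keeping everything except the decoder name in one place makes it less likely that the two runs drift apart when the pipeline is edited.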
=== <span style="color:#0931C6">CPU load % per core</span><br> ===
'''''Test pipeline (ducatijpegdec):'''''
<pre style="background:#d6e4f1">
GST_TRACER_PLUGINS="cpuusage" gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! ducatijpegdec ! fakesink sync=true -e
</pre>
'''''Test pipeline (avdec_mjpeg):'''''
<pre style="background:#d6e4f1">
GST_TRACER_PLUGINS="cpuusage" gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! avdec_mjpeg ! fakesink sync=true -e
</pre>
'''''Obtained Results:'''''
[[Image:AM572x-testbench-JPEG-dec-cpuload.png|center|700px|AM572x-testbench-JPEG-dec-cpuload.png]]<br>
The chart above clearly shows that hardware acceleration greatly reduces the CPU workload. On average, CPU_1_accel carries 42.8% less load than CPU_1_unaccel. In both cases the corresponding CPU_0 core is practically idle, and there is no difference between the two runs.
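An average difference like the one quoted above can be computed from two series of per-sample load values. The following is a minimal sketch, assuming each run's samples are stored in a plain-text file with one value per line. The `mean_diff` helper, the file names and the sample values are all hypothetical and are not the measured data.

```shell
#!/bin/sh
# mean_diff (hypothetical helper): print mean($1) - mean($2), where each
# argument is a file holding one numeric sample per line.
mean_diff() {
    awk 'FNR == NR { sa += $1; na++; next }
         { sb += $1; nb++ }
         END { if (na && nb) printf "%.1f\n", sa / na - sb / nb }' "$1" "$2"
}

# Made-up sample values, for illustration only.
tmp=$(mktemp -d)
printf '%s\n' 60 70 80 > "$tmp/cpu1_unaccel.txt"
printf '%s\n' 20 30 40 > "$tmp/cpu1_accel.txt"

mean_diff "$tmp/cpu1_unaccel.txt" "$tmp/cpu1_accel.txt"   # prints 40.0
rm -r "$tmp"
```

Both means are accumulated in a single awk pass, so neither is rounded before the subtraction.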
=== <span style="color:#0931C6">Frame-rate</span><br> ===
'''''Test pipeline (ducatijpegdec):'''''
<pre style="background:#d6e4f1">
GST_TRACER_PLUGINS="framerate" gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! ducatijpegdec ! fakesink sync=true -e
</pre>
'''''Test pipeline (avdec_mjpeg):'''''
<pre style="background:#d6e4f1">
GST_TRACER_PLUGINS="framerate" gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! avdec_mjpeg ! fakesink sync=true -e
</pre>
'''''Obtained Results:'''''
[[Image:AM572x-testbench-JPEG-dec-framerate.png|center|700px|AM572x-testbench-JPEG-dec-framerate.png]]<br>
The chart above shows that, in both cases, the frame-rate reaches the expected value of 25 fps and then remains stable.
=== <span style="color:#0931C6">Memory consumption</span><br> ===
'''''Test pipeline (ducatijpegdec):'''''
<pre style="background:#d6e4f1">
gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! ducatijpegdec ! fakesink sync=true -e
</pre>
'''''Test pipeline (avdec_mjpeg):'''''
<pre style="background:#d6e4f1">
gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! avdec_mjpeg ! fakesink sync=true -e
</pre>
'''''Obtained Results:'''''
[[Image:AM572x-testbench-JPEG-dec-memuse.png|center|700px|AM572x-testbench-JPEG-dec-memuse.png]]<br>
The chart above shows that hardware acceleration reduces memory consumption. On average, the hardware-accelerated pipeline consumes 1304 KB less memory.
=== <span style="color:#0931C6">Memory bandwidth consumption</span><br> ===
'''''Test pipeline (ducatijpegdec):'''''
<pre style="background:#d6e4f1">
gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/Wreck-It_Ralph_MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! ducatijpegdec ! fakesink sync=true -e
</pre>
'''''Test pipeline (avdec_mjpeg):'''''
<pre style="background:#d6e4f1">
gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/Wreck-It_Ralph_MJPEG.mov ! qtdemux name=demux demux.video_0 ! queue ! jpegparse ! avdec_mjpeg ! fakesink sync=true -e
</pre>
Note: In both charts, memory bandwidth consumption is shown separately for sequential (seq) and random (al, for "aleatory") memory accesses.
'''''Results: memory bandwidth consumed by memory reads:'''''
[[Image:AM572x-testbench-JPEG-dec-readsbandwidth.png|center|700px|AM572x-testbench-JPEG-dec-readbandwidth.png]]<br>
The chart above shows that hardware acceleration reduces the memory bandwidth consumed by reads. The average difference is 448.9 MB/s for sequential reads and 175.5 MB/s for random reads.
'''''Results: memory bandwidth consumed by memory writes:'''''
[[Image:AM572x-testbench-JPEG-dec-writebandwidth.png|center|700px|AM572x-testbench-JPEG-dec-writebandwidth.png]]<br>
The chart above shows that hardware acceleration reduces the memory bandwidth consumed by sequential writes, while random writes consume slightly more bandwidth. The average difference is 1046.9 MB/s for sequential writes (lower with acceleration) and 155.1 MB/s for random writes (higher with acceleration).
== <span style="color:#008080">Resolution scale and color-space conversion</span><br> ==
This section compares the performance of resolution scaling and color-space conversion GStreamer pipelines using a hardware-accelerated implementation against a software-only implementation. The hardware-accelerated implementation uses gst-plugins-vpe (vpe element); the software-only implementation uses the videoscale and videoconvert elements. The test pipelines differ only in the scaling and color-space conversion elements: one uses the hardware-accelerated element and the other uses the software elements.