The chart above clearly shows that hardware acceleration substantially reduces CPU workload. On average, the load on CPU_1_accel is 48.8 % lower than on CPU_1_unaccel. CPU_0 has the same average workload percentage in both cases, so there is no difference there.
=== <span style="color:#0931C6">Frame-rate</span><br> ===
'''''Test pipeline (ducatimpeg4enc):'''''
<pre style="background:#d6e4f1">
GST_TRACER_PLUGINS="framerate" gst-launch-1.0 -e videotestsrc num-buffers=640 is-live=true ! 'video/x-raw,format=(string)NV12,width=720,height=420,framerate=(fraction)30/1' ! ducatimpeg4enc ! fakesink sync=true
</pre>
'''''Test pipeline (avenc_mpeg4):'''''
<pre style="background:#d6e4f1">
GST_TRACER_PLUGINS="framerate" gst-launch-1.0 -e videotestsrc num-buffers=640 is-live=true ! 'video/x-raw,format=(string)I420,width=720,height=420,framerate=(fraction)30/1' ! avenc_mpeg4 ! fakesink sync=true
</pre>
'''''Obtained Results:'''''
[[Image:AM572x-testbench-MPEG4-enc-framerate.png|center|700px|AM572x-testbench-MPEG4-enc-framerate.png]]<br>
The chart above shows that in both cases the frame-rate reaches the expected value of 30 fps and then remains stable.
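As a sanity check on the test length, the pipelines above request 640 buffers at 30 frames per second, so a live source should need roughly 640 / 30 seconds to produce them, ignoring start-up time. The helper below is only an illustrative sketch of that arithmetic; the function name <code>expected_duration</code> is hypothetical and not part of the test bench.

```shell
# expected_duration NUM_BUFFERS FPS
# Prints the approximate number of seconds a live source needs to emit
# NUM_BUFFERS frames at FPS frames per second (start-up time ignored).
# Hypothetical helper, for illustration only.
expected_duration() {
    awk -v n="$1" -v fps="$2" 'BEGIN { printf "%.1f\n", n / fps }'
}

# Values taken from the test pipelines: num-buffers=640, framerate=30/1.
expected_duration 640 30
```

A run that finishes much later than this estimate would suggest that the pipeline cannot sustain the requested frame-rate.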
=== <span style="color:#0931C6">Memory consumption</span><br> ===
'''''Test pipeline (ducatimpeg4enc):'''''
<pre style="background:#d6e4f1">
gst-launch-1.0 -e videotestsrc num-buffers=640 is-live=true ! 'video/x-raw,format=(string)NV12,width=720,height=420,framerate=(fraction)30/1' ! ducatimpeg4enc ! fakesink sync=true
</pre>
'''''Test pipeline (avenc_mpeg4):'''''
<pre style="background:#d6e4f1">
gst-launch-1.0 -e videotestsrc num-buffers=640 is-live=true ! 'video/x-raw,format=(string)I420,width=720,height=420,framerate=(fraction)30/1' ! avenc_mpeg4 ! fakesink sync=true
</pre>
'''''Obtained Results:'''''
[[Image:AM572x-testbench-MPEG4-enc-memuse.png|center|700px|AM572x-testbench-MPEG4-enc-memuse.png]]<br>
The chart above shows that hardware acceleration considerably reduces memory consumption: on average, 4514 KB less memory is used when hardware acceleration is enabled.
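An average difference like the one quoted above is the mean of the unaccelerated samples minus the mean of the accelerated samples. The sketch below shows one way to compute it, assuming the samples were saved as two whitespace-separated columns (accelerated first, unaccelerated second). That file layout, the function name <code>avg_diff</code> and the sample values are assumptions made for illustration; they are not the measured data behind the chart.

```shell
# avg_diff: read "accel unaccel" sample pairs from stdin and print
# mean(unaccel) - mean(accel) with one decimal place.
# The two-column layout is an assumption, not the test bench's format.
avg_diff() {
    awk '{ a += $1; u += $2; n++ }
         END { if (n > 0) printf "%.1f\n", (u - a) / n }'
}

# Made-up sample values in KB, for illustration only.
printf '%s\n' '1000 5000' '1200 5600' '1100 5900' | avg_diff
```

The same helper applies to the CPU load and memory bandwidth figures, since each of them is also reported as an average difference between the two runs.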
=== <span style="color:#0931C6">Memory bandwidth consumption</span><br> ===
'''''Test pipeline (ducatimpeg4enc):'''''
<pre style="background:#d6e4f1">
gst-launch-1.0 -e videotestsrc is-live=true ! 'video/x-raw,format=(string)NV12,width=720,height=420,framerate=(fraction)30/1' ! ducatimpeg4enc ! fakesink sync=true
</pre>
'''''Test pipeline (avenc_mpeg4):'''''
<pre style="background:#d6e4f1">
gst-launch-1.0 -e videotestsrc is-live=true ! 'video/x-raw,format=(string)I420,width=720,height=420,framerate=(fraction)30/1' ! avenc_mpeg4 ! fakesink sync=true
</pre>
Note: In both charts, memory bandwidth consumption is shown separately for sequential (seq) and random (al, for "aleatory") memory access.
'''''Obtained results for memory read bandwidth:'''''
[[Image:AM572x-testbench-MPEG4-enc-readsbandwidth.png|center|700px|AM572x-testbench-MPEG4-enc-readbandwidth.png]]<br>
The chart above shows that hardware acceleration consumes less memory read bandwidth. The average difference is 358.1 MB/s for sequential reads and 446.9 MB/s for random reads.
'''''Obtained results for memory write bandwidth:'''''
[[Image:AM572x-testbench-MPEG4-enc-writebandwidth.png|center|700px|AM572x-testbench-MPEG4-enc-writebandwidth.png]]<br>
The chart above shows that hardware acceleration consumes less memory write bandwidth. The average difference is 1832.7 MB/s for sequential writes and 499.9 MB/s for random writes.
== <span style="color:#008080">H264 video decode</span><br> ==
This section compares the performance of H264 video decode GStreamer pipelines using a hardware-accelerated implementation and a software-only implementation. The hardware-accelerated implementation uses gst-plugins-ducati (ducatih264dec element), while the software-only implementation uses gst-plugins-libav (avdec_h264 element). The test pipelines differ only in the H264 decode element.
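Because the two test pipelines differ only in the decode element, they can be generated from a single template. The following is a minimal sketch rather than part of the test bench: the function name <code>build_decode_pipeline</code> is hypothetical, and it only prints the command line so that it can be reviewed before being run.

```shell
# build_decode_pipeline DECODER
# Prints the H264 decode test command with DECODER as the only varying
# element. Hypothetical helper; it prints the command, it does not run it.
build_decode_pipeline() {
    printf 'gst-launch-1.0 -e filesrc location=%s ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! %s ! fakesink sync=true\n' \
        "/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-H264.mov" "$1"
}

# Print both variants: hardware accelerated first, then software only.
for dec in ducatih264dec avdec_h264; do
    build_decode_pipeline "$dec"
done
```

Generating both commands from one template makes it harder for the two runs to drift apart in anything other than the decoder under test.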
=== <span style="color:#0931C6">CPU load % per core</span><br> ===
'''''Test pipeline (ducatih264dec):'''''
<pre style="background:#d6e4f1">
GST_TRACER_PLUGINS="cpuusage" gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-H264.mov ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! ducatih264dec ! fakesink sync=true -e
</pre>
'''''Test pipeline (avdec_h264):'''''
<pre style="background:#d6e4f1">
GST_TRACER_PLUGINS="cpuusage" gst-launch-1.0 filesrc location=/am5728-gst-tests/video-samples/TearOfSteel-Short-1920x800-H264.mov ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! avdec_h264 ! fakesink sync=true -e
</pre>
'''''Obtained Results:'''''
[[Image:AM572x-testbench-MPEG4-enc-cpuload.png|center|700px|AM572x-testbench-MPEG4-enc-cpuload.png]]<br>
The chart above clearly shows that hardware acceleration substantially reduces CPU workload. On average, the load is 49.2 % lower on CPU_0_accel than on CPU_0_unaccel, and 39 % lower on CPU_1_accel than on CPU_1_unaccel.