Gst-ducati and libdce in DM816x

From RidgeRun Developer Connection

Introduction

This article analyzes the options for running a Ducati-based streaming A/V system on the DM8168, including the role of gst-ducati and libdce in such a system. The key challenge is accommodating the architectural differences between the DM8168 and the OMAP4 family, which is the target of libdce. To understand those differences, the software layers used on each platform need to be examined as described in omappedia, which is the basis for Rob Clark's libdce and gst-ducati modules. The article also gives pointers for building libdce and gst-ducati, lists the repositories that could be used in the process, and outlines a plan to move the design from the standard OMX/gst-omx API for accessing the A/V hardware accelerators to a libdce/gst-ducati based system.

Ducati architecture

In 2010, Texas Instruments created the Ducati architecture, which has been included, with modifications, in several of their SoCs such as the OMAP4430 (PandaBoard), OMAP5, DM8168, DM8148, DM385 and, more recently, the AM5728. However, not all of them include the same units or the same ARM versions. A general diagram of the architecture as described in omappedia is shown in figure 1.

DucatiBlockDiagram2.jpg
Figure 1. TI Ducati architecture

It includes a dual Cortex-M3 subsystem (M4 in the case of OMAP5 and AM5728), the Ducati Imaging Sub System (ISS) and the Ducati Image Video Accelerator, High Definition (IVA-HD) subsystem. The GPP ARM core (normally running Linux) sends instructions to the M3 coprocessor cores, which parse them and control the ISS and IVA-HD hardware accelerators. The ISS performs the image analysis and processing functions such as MIPI capture, display, color space conversion, deinterlacing and scaling (on OMAP it also contains a JPEG encoder), while the IVA-HD (HDVICP2) performs the video processing functions such as encoding and decoding.

Ducati Software

TI does not use the same hardware architecture in all of its SoCs. For instance, the DM8168 doesn't contain the ISS accelerator, so it cannot capture from a MIPI camera or use the Display Subsystem (DSS) included in OMAP4. Instead, the DM8168 contains a different hardware accelerator called VPSS, which handles capture, display and the image processing functions. This difference in architecture reduces the ability to use the same Ducati software on all SoCs, because some modules might be different and might need extra work. However, almost all of them have the IVA-HD accelerator in common, so the codecs, as well as the software used to access them, might be shared between them.

The two Cortex-M3 cores (named sys and app in the diagram above) run an RTOS such as TI's BIOS. The sys M3 also runs the notify driver, which uses syslink to receive messages from the ARM and passes them to the app M3. The app M3 can send messages back to the ARM. The app M3 runs the codecs and therefore controls the IVA-HD accelerator.

TI releases a different set of codecs depending on the SoC, but a developer could try to integrate them. Some of the codecs that might be integrated are:

  • Encoders: H264, MPEG4, H263, MJPEG
  • Decoders: MPEG2, MJPEG, H264

TI wrapped Ducati with OMX, allowing user space applications running on the GPP ARM to access hardware accelerated functions such as capture, display, encoding and decoding. In other words, TI recommended using their OpenMAX library on the ARM side to create instances of encoders, decoders and cameras. Underneath, the library sends messages over syslink to the coprocessors, which also run Codec Engine to create the codec instances, pass arguments, process buffers, etc. This is the approach currently used on the DM8168. OMX also has its own memory heap and therefore its own memory map.

In November 2010 Rob Clark started a project to remove the OMX layer and instead provide an interface to Codec Engine directly from the ARM. This shim layer is called Distributed Codec Engine, or libdce. Libdce sends messages from the ARM to the Ducati subsystem in order to create a codec instance. With Rob Clark's approach, the buggy TI OpenMAX library is not used.

When reading about Ducati you will find documentation about two different implementations: one using OMX (latest change 3 years ago) and another using libdce. The latter is the one of interest in this article.

Distributed Codec Engine

Also called DCE or libdce, Distributed Codec Engine is the library used to access the ISS and IVA-HD hardware accelerators via syslink/RCM without OpenMAX. Remote Command Message (RCM) is a TI software framework that provides a client/server implementation for executing functions on a remote processor (it was also used by OMX).

It has been tested on OMAP4, OMAP5 and AM5728.

The library depends on the following components:

  • M3 Side:
    • M3 compiler or code generation tools - available for DM8168.
    • XDC tools - available for DM8168.
    • BIOS - available for DM8168.
    • Codec Engine - available for DM8168.
    • Framework components - available for DM8168.
    • XDAIS - available for DM8168.
    • HDVICP codecs - available for DM8168.
    • Syslink - available for DM8168.
    • DCE - Not tested on DM8168 yet.
  • ARM Side:
    • Kernel newer than 2.6.38 (some sources say newer than 3.0 because of the rpmsg module) - not available for DM8168.
    • Tiler package - Not tested on DM8168 yet.
    • Syslink 2.0 and not 3.0 - DM8168 uses 2.21 so it should be okay.
    • DCE User Space.

Omappedia offers some instructions for compiling the components above.

In the case of DCE, Rob Clark has a deprecated repository whose last commit was 6 years ago:

https://github.com/robclark/libdce

That repository indicates that the code was moved to a new repository, whose last commit was 4 years ago:

git://gitorious.org/gstreamer-omap/libdce.git
https://gitorious.org/gstreamer-omap/libdce?p=gstreamer-omap:libdce.git;a=tree

More recently (2015) TI launched its AM5728, which is a simplified version of the DM8168 and is more similar to the OMAPs. The AM5728 uses DCE and has a newer repository (last updated 3 years ago):

https://git.ti.com/glsdk/dce

This is the repository that should be used for the DM8168. Note that the GLSDK is the equivalent of the EZSDK for the automotive OMAP, and it uses gst-ducati and libdce.

GStreamer and gst-ducati

Back in 2010 Rob Clark also started creating gst-ducati, which is a set of GStreamer elements that use libdce underneath. Since it was mainly tested on OMAP4, OMAP5 and AM5728, it is not clear how the capture and display tasks would be handled on the DM8168, which has a VPSS instead of the ISS accelerator.

His original repository was for GStreamer 0.10 and the last update was 4 years ago:

old: https://gitorious.org/gstreamer-omap/gst-ducati (4 years old)

For codecs:

  • $HOME/ducati/ivahd_hdvicp20api_01_00_00_19_production
  • $HOME/ducati/ivahd_h264dec_01_00_00_00_production
  • $HOME/ducati/ivahd_mpeg2vdec_01_00_00_00_production
  • $HOME/ducati/ivahd_jpegvdec_01_00_00_00_production

DM816x uses newer and more codecs.

There is a newer version for GStreamer 1.2, provided with the AM5728, whose latest commit was made in December 2016:

https://git.ti.com/glsdk/gst-plugin-ducati
https://git.ti.com/glsdk/gst-plugin-ducati/blobs/master/README

GStreamer elements supported:

  • Decoders:
    • h264dec
    • jpegdec
    • mpeg2dec
    • mpeg4dec
    • vc1dec
  • Encoders:
    • h264enc
Conclusions

  1. The DM816x architecture is not the same as that of OMAP4, OMAP5 and AM5728: the latter use ISS or DSS instead of VPSS.
  2. The DM8168 kernel doesn't have the rpmsg syslink module.
  3. The repositories from omappedia and Rob Clark are too old (6 or 4 years old), but the code available for the GLSDK (AM5728) is newer and uses the same stack.
  4. There are two ways to get Ducati working. The one used by Rob Clark replaces OMX with Distributed Codec Engine (libdce) and mainly targets OMAP4, OMAP5 and AM5728.
  5. libdce and gst-ducati require a newer kernel (3.0+).
  6. gst-ducati doesn't have video sink support for a display. It would need a VPSS driver or a DRM kernel driver that uses the VPSS underneath.
  7. There is no audio support in libdce or gst-ducati.

Main Questions when porting to DM8168

Steps to validate gst-ducati/libdce on DM8168

The OMAP architecture is somewhat different because the DM8168 doesn't have an ISS accelerator to do JPEG encoding or MIPI CSI capture. Instead, the DM8168 has a hardware accelerator called VPSS, which handles deinterlacing, color space conversion, capture from a parallel port and display. This means that libdce and gst-ducati could be compiled and would likely run on the DM8168, but all the VPSS features would be unusable and a good amount of work would be needed to enable them.

Some of the tasks involved in checking whether gst-ducati avoids the problems seen with OMX are:

Initial port to boot kernel and ping M3 firmware

  • Update the kernel to 3.x so that the rpmsg kernel module can be used, and check that it uses syslink notify underneath.
  • Create a custom version of the EZSDK, based on the one that RidgeRun has, that removes OMX and adds libdce.
  • Compile the custom EZSDK to generate the .xem3 firmware images to be run on the M3 cores - Using EZSDK Overlay.
  • Adjust the memory map so that the M3s and the ARM don't conflict when using libdce or gst-ducati.
  • Try loading the new Ducati firmware into the coprocessors through the kernel's firmware loading support.
  • Compile gst-ducati and its dependencies (GStreamer core libraries) and try to run the examples related to the encoders/decoders only.
  • Try audio capture and playback support on the new kernel.
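
The memory map adjustment in the list above amounts to making sure that the regions given to Linux, the M3 firmware and the shared IPC buffers never overlap. The Python sketch below shows one way such a check could be scripted before building the firmware. All region names, base addresses and sizes are hypothetical and are not taken from any DM8168 memory map.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A contiguous physical memory region."""
    name: str
    base: int
    size: int

    @property
    def end(self):
        # First address past the region.
        return self.base + self.size


def find_overlaps(regions):
    """Return (name, name) pairs for every two regions that overlap."""
    ordered = sorted(regions, key=lambda r: r.base)
    overlaps = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            # Regions are sorted by base, so once one starts at or after
            # the end of `first`, no later region can overlap it either.
            if second.base >= first.end:
                break
            overlaps.append((first.name, second.name))
    return overlaps


# Hypothetical layout, for illustration only.
EXAMPLE_MAP = [
    Region("linux", 0x80000000, 0x10000000),
    Region("m3_firmware", 0x90000000, 0x04000000),
    Region("shared_ipc", 0x93F00000, 0x00200000),  # deliberately overlaps
]

if __name__ == "__main__":
    for first, second in find_overlaps(EXAMPLE_MAP):
        print("conflict:", first, "<->", second)
```

A check like this could be run against the values used in the kernel boot arguments and in the firmware configuration, so that a conflict is caught before the firmware is loaded rather than as a crash at run time.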

VPSS Capture, Deinterlace, Scale and Display Support

  • Create kernel drivers for the VPSS that would use rpmsg/syslink underneath
  • Create custom V4L2 capture driver that would parse V4L2 structures (shim layer) and make calls to VPSS kernel driver created before.
  • Create subdevice driver for GS2971 and example C application to verify that system can capture frames.
  • Create custom V4L2 display driver or adjust a DRM driver to make it work with VPSS
  • Add support to gstreamer elements v4l2sink and v4l2src to work with new V4L2 layer and kernel.
  • Create gst-ducati elements to deinterlace, downscale and do color space conversions.

Audio Support

  • Integrate audio codecs into the M3 firmware images that support libdce.
  • Create gst-ducati elements for audio: AAC encoder and AAC decoder.

GStreamer

  • Create GStreamer pipelines to verify that the required files can be recorded and played.
  • Modify the GStreamer application to make it work with GStreamer 1.0 and with the new elements.

Has anyone tried Ducati on DM81xx?

At the time of writing, we could not find anyone who had successfully run gst-ducati and libdce on the DM8168. In fact, nobody appears to have posted a question or article about trying it. The only related questions and information are for OMAP4, OMAP5, Jacinto 6 and AM5728:

https://e2e.ti.com/support/omap/f/849/t/192474
https://e2e.ti.com/support/omap/f/849/p/249986/875729#875729
https://e2e.ti.com/support/embedded/multimedia_software_codecs/f/356/t/420126
https://developer.ridgerun.com/wiki/index.php?title=Gstreamer_pipelines_for_AM572x

We suppose that the main reason is the lack of a newer kernel and the work involved to get the VPSS (capture and display) up and running.

What is supported in gst-ducati?

gst-ducati is intended to support an architecture that normally includes an ISS ASIC, like the OMAPs. According to the repository it supports the following elements:

  • Decoders:
    • h264dec
    • jpegdec
    • mpeg2dec
    • mpeg4dec
    • vc1dec
  • Encoders:
    • h264enc

There is no support for capture or display.

Which kernel does it need?

libdce requires the rpmsg module available in new kernels to communicate with the coprocessors. For this reason a kernel newer than 3.0 is recommended.
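
A quick way to apply this requirement on a target board is to compare the running kernel release against 3.0. The Python sketch below does only that. It is a minimal check that assumes a release string starting with "major.minor", and it does not verify that the rpmsg module is actually built or loaded.

```python
import platform
import re

# Minimum kernel recommended in the text because of the rpmsg module.
MIN_KERNEL = (3, 0)


def parse_kernel_release(release):
    """Extract (major, minor) from a release string such as '2.6.37-xyz'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: " + release)
    return int(match.group(1)), int(match.group(2))


def kernel_is_new_enough(release, minimum=MIN_KERNEL):
    """True if the release is at least the minimum (major, minor)."""
    return parse_kernel_release(release) >= minimum


def running_kernel_is_new_enough():
    """Check the kernel of the machine this script runs on."""
    return kernel_is_new_enough(platform.release())
```

For example, kernel_is_new_enough("2.6.37") evaluates to False, while kernel_is_new_enough("3.2.0") evaluates to True.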

Can libdce be used with kernel 2.6.37?

Anyone interested in testing it on kernel 2.6.37 would need to port libdce to that kernel, which doesn't sound like a good idea because libdce was created to be used with rpmsg.

Which IPC SW does it use?

According to the links found, it uses kernel support for rpmsg, syslink and RCM. Syslink and RCM are used on the DM8168 by default.

Does it use OMX?

There are two implementations. The one suggested in omappedia uses OMX, as described in Ducati for dummies, but the gst-ducati created by Rob Clark is based on libdce and doesn't use OMX.

Can OMX and gst-ducati be used at the same time?

Each framework would need its own memory map. Furthermore, each framework was created for a different kernel version: OMX for 2.6.37 and libdce/gst-ducati for 3.0 and newer. Even if you are able to port one or the other, it is not clear whether resource conflicts would arise with both OMX and gst-ducati trying to use the IVA-HD. In addition, the firmware running on the M3s would need to be modified to support both frameworks.

What gst version does it use?

The latest gst-ducati available in the GLSDK uses GStreamer 1.2, but RidgeRun has been able to use gst-ducati on the AM5728 with GStreamer 1.8.

What is the difference between omap4 and DM816x?

The DM8168 has three IVA-HD ASICs while the OMAP has only one. Furthermore, there is no ISS (image subsystem) or DSS (display subsystem) in the DM8168. Instead there is a VPSS ASIC.

Articles related

DOMX Repository
Rob Clark Post about gst-ducati
Ducati for dummies
OMAP4 TRM
Source tree used by omappedia
Distributed Codec Engine
GLSDK
RCM
Previous work on kernel 3.2 for DM816x