GStreamer and in-band metadata

From RidgeRun Developer Connection


As video data moves through a GStreamer pipeline, it can be convenient to attach information to a specific video frame, such as the GPS location, so that receivers that know how to extract metadata can read it while it stays associated with the correct video data. Conversely, a receiver that doesn't understand in-band metadata is not affected by its presence.


MISP Motion Imagery Standards Profile

The key statement in the specification is: "Within the media container, all metadata must be in SMPTE KLV (Key-Length-Value) format." In MISP, the metadata is tagged with a timestamp so that it can be associated with the right video frame. This is important because GStreamer handles the association slightly differently. To understand the difference, you need to see how MISP combines video frames and metadata so you can compare it to GStreamer. The following diagram is a modified version from the MISBTRM0909 MISP spec.


KLV Key Length Value Metadata

For this discussion, we care about time stamping and transporting KLV data, not what it means. Stated another way, KLV data is any binary data (plus a length indication) that we need to move from one end to the other while keeping the data associated with the correct video frame. It is up to the user of the video encoding stream and the user of the video decoding stream to understand the meaning and encoding of the KLV data.

To give a concrete KLV encoding example, here is a terse description of SMPTE 336M-2007, Data Encoding Protocol Using Key-Length-Value, which is used by the MISB standard.

Key: Fixed length (1, 2, 4, or 16 bytes), with the size known to both sender and receiver, encoding the key. There are very specific rules on how keys are encoded and how both the sender and receiver know the meaning of the encoded key.

Length: Fixed or variable length (1, 2, 4 bytes, or BER) indication of the number of bytes of data used to encode the value.

Value: Variable-length value whose meaning is agreed to by both the sender and the receiver.
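The BER option for the length field can be illustrated with a short Python sketch. It assumes the usual BER definite-length rules (a single byte for lengths below 128, otherwise a count byte with the high bit set followed by the length in big-endian order). The function names are hypothetical and are not part of GStreamer or any KLV library.

```python
from typing import Tuple


def encode_ber_length(n: int) -> bytes:
    """Encode a value length as a BER definite length."""
    if n < 0:
        raise ValueError("length must be non-negative")
    if n < 128:
        # Short form: one byte with the high bit clear.
        return bytes([n])
    # Long form: 0x80 | number of length bytes, then the length itself.
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def decode_ber_length(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (length, bytes consumed) for the BER length at buf[offset]."""
    first = buf[offset]
    if first < 0x80:
        return first, 1
    count = first & 0x7F
    if count == 0:
        raise ValueError("indefinite length form is not handled in this sketch")
    if offset + 1 + count > len(buf):
        raise ValueError("truncated BER length")
    value = int.from_bytes(buf[offset + 1:offset + 1 + count], "big")
    return value, 1 + count
```

With this scheme a 2-byte value needs only one length byte, while a 300-byte value needs three.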

As an example (from Wikipedia KLV entry),

Key    Length    Value
42     2         0 3

This could be passed in as a 4-byte binary blob: 0x2A 0x02 0x00 0x03. The transport of the KLV doesn't need to know the actual encoding, only that the blob is 4 bytes long and contains the KLV data.

As another example (not MISB compliant), you could have the length be 8 and the data be 0x46 0x4F 0x4F 0x3D 0x42 0x41 0x52 0x00, which works out to be the NUL-terminated ASCII string FOO=BAR. The transport doesn't care about the encoding, as long as the sender and the receiver are in agreement.
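The two examples above can be reproduced with a minimal Python sketch. It assumes the simplest layout only, a 1-byte key and a 1-byte length, which is not MISB compliant. The function names and the key value 1 used for the FOO=BAR string are hypothetical.

```python
from typing import Iterator, Tuple


def pack_klv(key: int, value: bytes) -> bytes:
    """Build one KLV triplet with a 1-byte key and a 1-byte length."""
    if not 0 <= key <= 0xFF:
        raise ValueError("key must fit in one byte")
    if len(value) > 0xFF:
        raise ValueError("value too long for a 1-byte length")
    return bytes([key, len(value)]) + value


def unpack_klv(blob: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (key, value) for each triplet in a blob of back-to-back KLVs."""
    offset = 0
    while offset < len(blob):
        if offset + 2 > len(blob):
            raise ValueError("truncated KLV header")
        key, length = blob[offset], blob[offset + 1]
        value = blob[offset + 2:offset + 2 + length]
        if len(value) != length:
            raise ValueError("truncated KLV value")
        yield key, value
        offset += 2 + length


# The Wikipedia example: key 42, length 2, value 0 3.
wiki_example = pack_klv(42, bytes([0, 3]))

# The FOO=BAR example: 8 value bytes including the terminating NUL.
# The key (1) is an arbitrary placeholder chosen for this sketch.
foo_example = pack_klv(1, b"FOO=BAR\x00")
```

Because the transport only moves the bytes, a receiver can concatenate or split blobs freely as long as it applies the same key and length sizes as the sender.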

Time stamps

In addition to being able to provide out-of-band information from the sender to the receiver, the information includes a timestamp that allows the data to maintain a time relationship with the video frames that also include a timestamp. Both the metadata and the video frame timestamps are generated by the same source clock.

Since both the flow of the metadata and the flow of the video frames can be viewed as data streaming through a pipe, the time relationship between the two is most accurate when both the metadata and the video frames are assigned a timestamp as soon as the data is generated. Any delay or variability in associating the timestamp with either the video frames or the metadata adds error to the time relationship.
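As an illustration of why shared timestamps matter, the Python sketch below matches a metadata timestamp to the video frame with the closest timestamp. The nearest-timestamp rule is an assumption made for this example, not something mandated by MISP or GStreamer, and the function name is hypothetical. Timestamps are plain integers in whatever unit the common source clock uses.

```python
import bisect
from typing import List


def nearest_frame(frame_timestamps: List[int], meta_timestamp: int) -> int:
    """Return the index of the frame closest in time to a metadata item.

    frame_timestamps must be sorted in ascending order. On a tie the
    earlier frame is chosen.
    """
    if not frame_timestamps:
        raise ValueError("no frames to match against")
    i = bisect.bisect_left(frame_timestamps, meta_timestamp)
    if i == 0:
        return 0
    if i == len(frame_timestamps):
        return len(frame_timestamps) - 1
    before = frame_timestamps[i - 1]
    after = frame_timestamps[i]
    # Pick whichever neighbour is strictly closer; prefer the earlier one on a tie.
    return i if after - meta_timestamp < meta_timestamp - before else i - 1


# Frames stamped roughly every 33 ms, metadata stamped at 40 ms.
frames_ms = [0, 33, 66, 100]
matched = nearest_frame(frames_ms, 40)
```

If the metadata had been stamped late, for example at 55 ms instead of 40 ms, the same lookup would select the frame at 66 ms, which is the kind of error the paragraph above describes.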

MPEG-2 Transport Stream

The MPEG-2 Transport Stream protocol adds a TS header to video data, audio data, and metadata. The video data, audio data, and metadata are termed elementary streams. The TS header followed by data is called a packet. The receiving side uses the PID (Packet ID) field of the TS header to demultiplex the elementary streams. There are many other fields in a TS header besides the PID field.

For this discussion the important point is that the Transport Stream protocol definition already supports the notion of including timestamped metadata in a transport stream.
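The role of the PID field can be sketched in Python. The sketch relies on the standard transport stream packet layout (188-byte packets, sync byte 0x47, 13-bit PID spread over the second and third header bytes). The helper make_packet and the PID values in the example are hypothetical and exist only to build test data; real streams also carry adaptation fields, PES headers, and program tables that this sketch ignores.

```python
from typing import Dict, List

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def packet_pid(packet: bytes) -> int:
    """Extract the 13-bit PID from one transport stream packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid transport stream packet")
    return ((packet[1] & 0x1F) << 8) | packet[2]


def split_by_pid(stream: bytes) -> Dict[int, List[bytes]]:
    """Group whole packets by PID, which is the core of demultiplexing."""
    streams: Dict[int, List[bytes]] = {}
    for offset in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = stream[offset:offset + TS_PACKET_SIZE]
        streams.setdefault(packet_pid(packet), []).append(packet)
    return streams


def make_packet(pid: int, payload: bytes) -> bytes:
    """Hypothetical helper: build a payload-only packet for the demo."""
    if not 0 <= pid <= 0x1FFF or len(payload) > TS_PACKET_SIZE - 4:
        raise ValueError("pid or payload out of range")
    # 0x10 = payload only, continuity counter 0.
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + payload.ljust(TS_PACKET_SIZE - 4, b"\xff")


# Example PIDs chosen arbitrarily: 0x100 for video, 0x101 for metadata.
demo_stream = (make_packet(0x100, b"video")
               + make_packet(0x101, b"\x2a\x02\x00\x03")
               + make_packet(0x100, b"video"))
demo_groups = split_by_pid(demo_stream)
```

A receiver that does not know about the metadata PID simply never looks at those packets, which is why adding metadata does not disturb it.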

Metadata and GStreamer

GStreamer models streaming audio/video/data as moving through a pipeline from source to sink. Adding support for metadata involves adding a new metadata source element and a new sink pad to the transport stream multiplexer element.


A simplified textual representation of the pipeline would be:

gst-launch v4l2src ! dmaienc_h264 ! mux. \
           alsasrc ! dmaienc_aac ! mux.  \
           metasrc ! queue !             \
                             mpegtsmux name=mux mux. ! rtpmp2tpay ! udpsink port=5004 host=$HOST

A decorated pipeline for DM36x would be:

gst-launch v4l2src queue-size=6 always-copy=FALSE input-src=composite chain-ipipe=true ! capsfilter caps=video/x-raw-yuv,format=\(fourcc\)NV12,width=640,height=480 ! dmaiaccel ! \
dmaienc_h264 name=video_encoder targetbitrate=1000000 idrinterval=90 intraframeinterval=30 ratecontrol=2 encodingpreset=2 ! queue !  mux. alsasrc buffer-time=800000 latency-time=30000 ! \
dmaiperf ! capsfilter caps=audio/x-raw-int,channels=1,width=16,depth=16,rate=16000 ! dmaienc_aac name=aac_encode outputBufferSize=131072 maxbitrate=64000 bitrate=32000 ! queue ! mux. \
metasrc ! queue ! mpegtsmux name=mux mux. ! rtpmp2tpay ! udpsink port=5004 host=$HOST

GStreamer tags

GStreamer supports an event called a tag. When an element receives a tag it doesn't understand, it simply passes it downstream. Tags are either independent of the stream encoding (like the title of the song for an audio stream) or information that affects how the stream is processed (like the stream bitrate).

Integrating patches for metadata support

To enable GStreamer to support metadata, some patches have to be applied. The following table summarizes the patches and the open source package each one applies to. Note that these patches target specific 0.10 versions of GStreamer and have not been tested on any other version.

Patch Name                                  Package to patch
collectpads-waiting-state-backport.patch    gstreamer-0.10.36
metadata-support-mpegts.patch               gst-plugins-bad-0.10.23

After applying the patches and rebuilding and installing the packages, you can verify that the changes were successfully applied by running the following commands:


#gst-inspect mpegtsmux

Factory Details:
Long name: MPEG Transport Stream Muxer
Class: Codec/Muxer
Description: Multiplexes media streams into an MPEG Transport Stream
Author(s): Fluendo <>
Rank: primary (256) 


Pad Templates:
SINK template: 'metadata_%d'
Availability: On request
Has request_new_pad() function: 0x2b17ed8c
standard: { klv } 




#gst-inspect mpegtsdemux

Factory Details:
Long name: The Fluendo MPEG Transport stream demuxer
Class: Codec/Demuxer 
Description: Demultiplexes MPEG2 Transport Streams 
Author(s): Wim Taymans <>
Rank: primary (256) 


SRC template: 'metadata_%04x'
Availability: Sometimes
standard: { klv } 
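If you want to automate this check, one possible approach is to scan the gst-inspect text for the metadata pad template. The Python sketch below is only a convenience under that assumption: the function name is hypothetical, and it matches the output format shown above rather than any guaranteed gst-inspect format.

```python
import re


def has_metadata_template(inspect_output: str, direction: str) -> bool:
    """Check gst-inspect text for a 'metadata_' pad template that lists klv.

    direction is "SINK" for mpegtsmux or "SRC" for mpegtsdemux.
    """
    pattern = re.compile(
        r"^\s*%s template: 'metadata_" % re.escape(direction), re.MULTILINE
    )
    match = pattern.search(inspect_output)
    if match is None:
        return False
    # The klv standard must be listed somewhere after the template line.
    return "klv" in inspect_output[match.end():]


# Abbreviated sample text copied from the listing above.
sample_mux_output = """Pad Templates:
SINK template: 'metadata_%d'
Availability: On request
standard: { klv }
"""
mux_ok = has_metadata_template(sample_mux_output, "SINK")
```

On the target, the text could be captured with subprocess.run(["gst-inspect", "mpegtsmux"], capture_output=True, text=True).stdout and passed to the same function.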


Integrating the metasource-plugin

The patches applied above only enable the mpegtsmux element to accept metadata and stream it together with the other kinds of data, such as video or audio. To inject metadata into the pipeline directly, the metasource plugin is also needed; its element can inject any kind of metadata into the pipeline (e.g., the system time, strings, etc.).

So, to run the demo correctly and inject metadata into the pipeline, the metasource plugin has to be installed in the software running on the target device.

Running the Demo

The demo includes a README file with the main instructions for running it. If you need help, run the script with the -h or --help option for more information (e.g., ./bin/gst-metadata --help). You can also contact RidgeRun for further information.

Pipeline examples

The metadata plugins can be tested by running pipelines on a single board, using two terminals: one as the server and the other as the client. The following pipelines stream video together with metadata captured from the keyboard. The terminal acting as the server has to be the one opened with the termnet command so that keyboard input is captured correctly; the other terminal can be opened over ssh or telnet.

Server:

PORT=5000
HOST=

gst-launch mpegtsmux name=mux videotestsrc ! vpuenc codec=6 ! h264parse ! queue ! mux. filesrc location=/dev/ttymxc0 blocksize=1 ! application/x-metadata,standard=klv ! \
mux.metadata_%d mux. ! udpsink host=$HOST port=$PORT


Client:

PORT=5000

gst-launch -v udpsrc port=$PORT ! mpegtsdemux name=demux demux. ! queue ! vpudec framedrop=false ! mfw_isink sync=false async=false enable-last-buffer=false qos=false \
demux. ! queue ! application/x-metadata,standard=klv ! fakesink enable-last-buffer=false qos=false async=false dump=true

Known issues and limitations

  • To run the demo, start the client first and then start the server. This is required because of an issue in the iMX6 plugins.