Format detection tools for multimedia files

From RidgeRun Developer Wiki

Introduction

One of the challenges with embedded devices is properly handling file formats that are not supported by the hardware. In an ideal world, you could just give the file a go and let the returned error code tell you that the format isn't supported. Unfortunately, all too often the hardware-accelerated decoder will choke on the unsupported format and leave you hanging.

RidgeRun has come up with one way of handling this case: checking the header before the video stream or file is passed on for processing.

References

Format detection

We define format detection as the process of searching or parsing a file for a structure that describes its contents (in general terms, multimedia headers that describe the container and the different streams present in the file). Such a structure may not exist at all: the file may be empty or corrupted, it may be in a format that is not expected, or it may be random data sent to the device by a tester with evil intent. A robust tool must handle all of these cases. We will consider three possible technologies for format detection:

  • GStreamer hotplugging
  • MediaInfo
  • FFmpeg

GStreamer approach: Decodebin & typefind

Decodebin

In GStreamer, decodebin is the actual autoplugger backend of playbin. In short, decodebin accepts input from a source linked to its sink pad, tries to detect the media types contained in the stream, and sets up a decoder for each of them. Decoders are selected automatically. For each decoded stream, decodebin will emit the “pad-added” signal to let the client know about the newly found decoded stream. For unknown streams (which might be the whole stream), it will emit the “unknown-type” signal. The application is then responsible for reporting the error to the user. Several versions of decodebin are available (e.g., decodebin, decodebin2); they differ in how elements are dynamically linked together and address different scenarios. However, the single most important element for autoplugging inside the bin is typefind, which is the one actually used to determine the format of the stream.

Typefind

When loading a media stream, the type of the stream is not known. This means that before we can choose a pipeline to decode the stream, we first need to detect the stream type. GStreamer uses the concept of typefinding for this. Typefinding is a normal part of a pipeline; it will read data for as long as the type of a stream is unknown. During this period, it will provide data to all plugins that implement a typefinder. When one of the typefinders recognizes the stream, the typefind element will emit a signal and act as a passthrough element from that point on. If no type was found, it will emit an error and further media processing will stop.

Once the typefind element has found a type, the application can use this to plug together a pipeline to decode the media stream.

Plugins in GStreamer can implement typefinder functionality. A plugin that does so registers a media type, optionally a set of file extensions commonly used for that media type, and a typefind function. When this typefind function is called, the plugin checks whether the data in the media stream matches a specific pattern that marks the media type it handles. If it does, the plugin notifies the typefind element, reporting which media type was recognized and how certain it is that the stream is indeed of that type. Once this has been done for all plugins implementing typefind functionality, the typefind element tells the application what kind of media stream was recognized.
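
The idea of several typefind functions, each reporting a media type and a certainty, can be illustrated with a minimal sketch in plain Python. This is not GStreamer code: the function names, the 0 to 100 probability scale and the specific scores are assumptions made for illustration. Only the byte patterns (0x47 sync bytes every 188 bytes for MPEG-TS, "OggS" for Ogg, the EBML header for Matroska) come from the respective formats.

```python
from typing import Callable, List, Optional, Tuple

# Each (hypothetical) typefinder inspects the first bytes of a stream and
# returns (media_type, probability 0-100), or None when nothing matches.
TypeFinder = Callable[[bytes], Optional[Tuple[str, int]]]


def find_mpegts(data: bytes) -> Optional[Tuple[str, int]]:
    # MPEG-TS packets are 188 bytes long and each starts with sync byte 0x47.
    packets = len(data) // 188
    if packets < 2:
        return None
    if all(data[i * 188] == 0x47 for i in range(packets)):
        # More consecutive sync bytes give more confidence (capped at 100).
        return ("video/mpegts", min(100, 50 + 10 * packets))
    return None


def find_ogg(data: bytes) -> Optional[Tuple[str, int]]:
    # Ogg pages start with the capture pattern "OggS".
    return ("application/ogg", 100) if data.startswith(b"OggS") else None


def find_matroska(data: bytes) -> Optional[Tuple[str, int]]:
    # Matroska/WebM files start with the EBML header ID 1A 45 DF A3.
    if data.startswith(b"\x1a\x45\xdf\xa3"):
        return ("video/x-matroska", 80)
    return None


TYPEFINDERS: List[TypeFinder] = [find_mpegts, find_ogg, find_matroska]


def typefind(data: bytes) -> Optional[Tuple[str, int]]:
    """Run every typefinder and return the most confident match, if any."""
    results = [r for r in (finder(data) for finder in TYPEFINDERS) if r is not None]
    return max(results, key=lambda r: r[1]) if results else None


# Three well-formed (but empty) transport stream packets, then random data.
print(typefind((b"\x47" + b"\x00" * 187) * 3))
print(typefind(b"random garbage"))
```

Returning None for empty or random input mirrors the "no type found" case described above, which the application must treat as an error.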

  • Typefind supports a large number of formats; if a format is not found, a signal will be posted.
  • You can use the decodebins that ship with GStreamer or create a custom one that better suits your needs. With bins and typefinds, many elements can be reused when decoding streams; you can even skip extracting raw data from a specific stream to decrease the memory footprint if needed.
  • Typefind can only detect the type of a stream once it has a buffer to process. When decoding a stream inside a container, for example, you need one typefind to detect the type of container and, depending on the container, a typefind for each sub-stream (audio, video, subtitles, etc.) in order to determine what to plug next.
  • Not knowing the format from the beginning is an issue in an embedded system, since demuxing must be performed just to check whether a hardware-accelerated codec is available.
  • If you use the typefind approach in an application, you will not know from the beginning which pipeline has to be created to decode a file, or whether the file is supported at all. This is a limitation if a player has to tell the user beforehand that a specific file is not supported. For example, suppose you support AAC and H264 in a transport stream container, but not MP2 audio: if the first buffers contain only video, you will create the whole pipeline to decode video, and only when the first audio buffers arrive will you realize that the audio is not supported. Yikes!
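
The limitation in the last bullet goes away once the full list of streams is known before any pipeline is built: the check becomes a simple lookup. A minimal sketch follows, assuming a hypothetical capability table (`SUPPORTED`) and a hypothetical stream description with `codec_type` and `codec_name` keys. Neither is part of GStreamer.

```python
from typing import Dict, List

# Hypothetical capability table: codecs the device can decode, per stream type.
SUPPORTED = {"video": {"h264"}, "audio": {"aac"}}


def unsupported_streams(streams: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the streams whose codec is not in the SUPPORTED table."""
    return [
        s for s in streams
        if s["codec_name"] not in SUPPORTED.get(s["codec_type"], set())
    ]


# Stream list for the transport stream example above: H264 video, MP2 audio.
streams = [
    {"codec_type": "video", "codec_name": "h264"},
    {"codec_type": "audio", "codec_name": "mp2"},
]

rejected = unsupported_streams(streams)
if rejected:
    names = ", ".join(s["codec_name"] for s in rejected)
    print(f"File not supported, unsupported codec(s): {names}")
else:
    print("All streams supported, safe to build the pipeline")
```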

MediaInfo

MediaInfo is a convenient unified display of the most relevant technical and tag data for video and audio files.

The MediaInfo data display includes:

  • Container: format, profile, commercial name of the format, duration, overall bit rate, writing application and library, title, author, director, album, track number, date...
  • Video: format, codec id, aspect, frame rate, bit rate, color space, chroma subsampling, bit depth, scan type, scan order...
  • Audio: format, codec id, sample rate, channels, bit depth, language, bit rate...
  • Text: format, codec id, language of subtitle...
  • Chapters: count of chapters, list of chapters...

Example output:

# mediainfo TDF2011Stg19_1080i_Episode.m2t
General
ID                                       : 0 (0x0)
Complete name                            : TDF2011Stg19_1080i_Episode.m2t
Format                                   : MPEG-TS
File size                                : 18.4 GiB
Duration                                 : 4h 0mn
Overall bit rate mode                    : Constant
Overall bit rate                         : 11.0 Mbps

Video
ID                                       : 4113 (0x1011)
Menu ID                                  : 1 (0x1)
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : High@L4.0
Format settings, CABAC                   : Yes
Format settings, ReFrames                : 2 frames
Codec ID                                 : 27
Duration                                 : 4h 0mn
Bit rate mode                            : Constant
Bit rate                                 : 10 000 Kbps
Width                                    : 1 920 pixels
Height                                   : 1 080 pixels
Display aspect ratio                     : 16:9
Frame rate                               : 29.970 fps
Standard                                 : NTSC
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : MBAFF
Scan order                               : Top Field First
Bits/(Pixel*Frame)                       : 0.161
Stream size                              : 17.1 GiB (93%)
Color primaries                          : BT.709
Transfer characteristics                 : BT.709
Matrix coefficients                      : BT.709
Color range                              : Limited

Audio
ID                                       : 4352 (0x1100)
Menu ID                                  : 1 (0x1)
Format                                   : MPEG Audio
Format version                           : Version 1
Format profile                           : Layer 2
Codec ID                                 : 3
Duration                                 : 4h 0mn
Bit rate mode                            : Constant
Bit rate                                 : 256 Kbps
Channel(s)                               : 2 channels
Sampling rate                            : 48.0 KHz
Compression mode                         : Lossy
Delay relative to video                  : -1ms
Stream size                              : 440 MiB (2%)

  • MediaInfo took around 8 seconds to parse the file above (tested on DM8168 hardware). Parsing time seems to grow with the size of the file, since it gathers more information about the streams.
  • The output format is a bit tricky to parse since it doesn't follow a block structure; it is made for an end user rather than as an intermediary tool for development.
  • Both libmediainfo and mediainfo are already ported to the RidgeRun SDK.
  • It gives information about the streams and several statistics about each stream (depending on the options used).
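
Although the text output has no explicit block delimiters, it can still be parsed with some care. The sketch below is one possible approach, assuming the layout shown in the example output: a section title on its own line, followed by `key : value` lines. The function name `parse_mediainfo` is hypothetical and the sample is an excerpt of the output above.

```python
from typing import Dict, Optional


def parse_mediainfo(text: str) -> Dict[str, Dict[str, str]]:
    """Group "key : value" lines under the section title that precedes them."""
    sections: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first " : " only, so values such as "16:9" stay intact.
        key, sep, value = line.partition(" : ")
        if sep:
            if current is not None:
                sections[current][key.strip()] = value.strip()
        else:
            # A line without " : " is a section title ("General", "Video", ...).
            current = line
            sections[current] = {}
    return sections


sample = """General
Format                                   : MPEG-TS
Duration                                 : 4h 0mn

Video
Format                                   : AVC
Display aspect ratio                     : 16:9

Audio
Format                                   : MPEG Audio
Format profile                           : Layer 2
"""

info = parse_mediainfo(sample)
for title, fields in info.items():
    print(title, fields.get("Format"))
```

This relies on section titles never containing " : ", which holds for the output shown here but is an assumption about MediaInfo's text format in general.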

FFmpeg: ffprobe

FFmpeg is a multimedia framework, able to decode, encode, transcode, mux, demux, stream, filter and play pretty much anything that humans and machines have created. It supports a very wide range of formats and is designed to be highly portable.

FFmpeg tools are included in the RidgeRun SDK. One of the most useful tools in terms of format detection is ffprobe.

ffprobe parser

ffprobe gathers information from multimedia streams and prints it in a human and machine readable format. For example, it can be used to check the format of the container used by a multimedia stream and the format and type of each media stream within the container. If a filename is specified as input, ffprobe will try to open and probe the file content. If the file cannot be opened or recognized as a multimedia file, an appropriate exit code is returned.

ffprobe may be employed both as a standalone application or in combination with a textual filter, which may perform more sophisticated processing, e.g. statistical processing or plotting. Options are used to specify which information to display, and for setting how ffprobe will display the information. ffprobe output is designed to be easily parsable by a textual filter. The output consists of one or more sections in a form defined by the selected writer, which is specified by the print_format option. [1]
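
As an illustration of using ffprobe in combination with another program, the following sketch runs it from Python with the JSON writer and treats a failing exit code as "not a recognizable multimedia file". The function name `probe` and the 30 second timeout are assumptions. The ffprobe options used are `-v quiet`, `-print_format json`, `-show_format` and `-show_streams`.

```python
import json
import subprocess
from typing import Optional


def probe(path: str) -> Optional[dict]:
    """Return ffprobe's JSON report for path, or None if probing fails."""
    cmd = [
        "ffprobe", "-v", "quiet",
        "-print_format", "json",
        "-show_format", "-show_streams",
        path,
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        # ffprobe is not installed (or not executable), or it hung on the input.
        return None
    if result.returncode != 0:
        # ffprobe could not open the file or recognize it as multimedia.
        return None
    try:
        return json.loads(result.stdout)
    except json.JSONDecodeError:
        return None


report = probe("TDF2011Stg19_1080i_Episode.m2t")
if report is None:
    print("Not a recognizable multimedia file")
else:
    for stream in report.get("streams", []):
        print(stream.get("codec_type"), stream.get("codec_name"))
```

Returning None for every failure keeps the caller simple. An application that needs to distinguish a missing ffprobe binary from a corrupted file would have to report these cases separately.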

Usage:

ffprobe [options] [input_file]

An example usage of ffprobe:

# ffprobe -v quiet -show_format -show_streams TDF2011Stg19_1080i_Episode.m2t

[STREAM]
index=0
codec_name=mp2
codec_long_name=MP2 (MPEG audio layer 2)
codec_type=audio
codec_time_base=1/48000
codec_tag_string=[3][0][0][0]
codec_tag=0x0003
sample_fmt=s16
sample_rate=48000
channels=2
bits_per_sample=0
id=0x1100
r_frame_rate=0/0
avg_frame_rate=0/0
time_base=1/90000
start_time=1.033367
duration=14419.800000
bit_rate=256000
nb_frames=N/A
nb_read_frames=N/A
nb_read_packets=N/A
[/STREAM]
[STREAM]
index=1
codec_name=h264
codec_long_name=H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
codec_type=video
codec_time_base=1001/60000
codec_tag_string=[27][0][0][0]
codec_tag=0x001b
width=1920
height=1080
has_b_frames=1
sample_aspect_ratio=1:1
display_aspect_ratio=16:9
pix_fmt=yuv420p
level=40
timecode=N/A
is_avc=0
nal_length_size=0
id=0x1011
r_frame_rate=30000/1001
avg_frame_rate=30000/1001
time_base=1/90000
start_time=1.034367
duration=N/A
bit_rate=N/A
nb_frames=N/A
nb_read_frames=N/A
nb_read_packets=N/A
[/STREAM]
[FORMAT]
filename=TDF2011Stg19_1080i_Episode.m2t
nb_streams=2
format_name=mpegts
format_long_name=MPEG-2 transport stream format
start_time=1.033367
duration=14419.800000
size=19772007188
bit_rate=10969365
[/FORMAT]

  • ffprobe takes around 1.5 seconds to parse a file, independently of its size.
  • It offers several output formats, including JSON, which is useful when parsing the contents.
  • As stated above, it is portable between architectures.
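
The default output shown above is also straightforward to parse, since every section is delimited by `[NAME]` and `[/NAME]` lines and contains `key=value` pairs. A minimal sketch, with a hypothetical function name and a sample taken from the output above. It assumes flat sections like the ones shown and does not handle nested sections.

```python
from typing import Dict, List, Optional, Tuple


def parse_ffprobe_sections(text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Parse [NAME] ... [/NAME] blocks of key=value lines into (name, dict) pairs."""
    sections: List[Tuple[str, Dict[str, str]]] = []
    name: Optional[str] = None
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("[/") and line.endswith("]"):
            # Closing tag: store the section collected so far.
            if name is not None:
                sections.append((name, fields))
            name, fields = None, {}
        elif line.startswith("[") and line.endswith("]") and "=" not in line:
            # Opening tag such as [STREAM] or [FORMAT].
            name, fields = line[1:-1], {}
        elif name is not None and "=" in line:
            # Split on the first "=" only; values may contain brackets.
            key, _, value = line.partition("=")
            fields[key] = value
    return sections


sample = """[STREAM]
index=0
codec_name=mp2
codec_type=audio
codec_tag_string=[3][0][0][0]
[/STREAM]
[STREAM]
index=1
codec_name=h264
codec_type=video
[/STREAM]
[FORMAT]
format_name=mpegts
nb_streams=2
[/FORMAT]
"""

sections = parse_ffprobe_sections(sample)
codecs = [fields["codec_name"] for name, fields in sections if name == "STREAM"]
print(codecs)
```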

Comparison chart

Usage Decodebin & TypeFind MediaInfo FFprobe
Detect file container and stream formats; check whether a file is unsupported. X X X
Detect file container and stream formats beforehand, without having to plug demuxing elements. X X
Processing time independent of the file size. X X
Can easily create a log for parsing. X X
The log format can be changed to JSON. X
Provide approximate statistics about streams. X

Further Reading

There are more tools available for format detection that might be even lighter than the ones used so far (but have more dependencies).