RidgeRun Metadata

Introduction (SRT)

Secure Reliable Transport (SRT) is a modern low-latency streaming protocol designed for real-time contribution and remote production. It is built on top of UDP but adds mechanisms that UDP alone does not provide, including:

ARQ (Automatic Repeat reQuest): retransmissions on demand (NACK-based).
Jitter buffering and resequencing: smooths variable delay and restores order of packets.
Congestion control: dynamically adapts to available bandwidth.
Encryption: optional AES-128/192/256 for content protection.

SRT is widely used in live production where glass-to-glass delay matters. Its design keeps the efficiency of UDP but solves its key weaknesses (loss, jitter, reordering), giving users a tunable latency target in the tens to hundreds of milliseconds range.

Key Characteristics

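UDP-based + ARQ: Uses selective retransmissions (NACK-driven) within a latency window. Unlike TCP, a packet that cannot be recovered before its deadline is dropped instead of stalling the stream, so there is no unbounded head-of-line blocking.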
Configurable latency: Sender and receiver maintain latency buffers (e.g., 60–200 ms) to absorb jitter and allow late packets to be recovered.
Jitter handling: SRT resequences out-of-order packets and fills gaps using retransmitted packets if they arrive before the deadline.
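Congestion control: SRT inherits its transport core from UDT. In live mode it paces packets according to the input rate plus a configurable bandwidth overhead, and it reports link statistics (RTT, loss, estimated bandwidth) that applications can use to adapt the encoder bitrate.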
Encryption: AES-128/192/256 with keys derived from a shared passphrase; `pbkeylen` selects the key length (16, 24, or 32 bytes).
Connection modes: caller, listener, and rendezvous modes allow flexible NAT/firewall traversal.
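
The link between the latency window and ARQ can be estimated with simple arithmetic. The sketch below assumes that one recovery attempt (the NACK travelling to the sender plus the retransmitted packet travelling back) costs about one round-trip time. `srt_retx_attempts` is a hypothetical helper name used only for illustration; it is not part of any SRT tool.

```shell
#!/bin/sh
# Rough estimate: how many retransmission attempts fit in the latency window.
# Assumption: one attempt (NACK + retransmitted packet) takes about one RTT.
srt_retx_attempts() {
  latency_ms=$1
  rtt_ms=$2
  if [ "$rtt_ms" -le 0 ]; then
    echo "rtt_ms must be positive" >&2
    return 1
  fi
  # Integer division: partial round trips do not count as an attempt.
  echo $(( latency_ms / rtt_ms ))
}

srt_retx_attempts 120 30   # prints 4: four attempts on a 30 ms RTT link
srt_retx_attempts 50 100   # prints 0: latency below the RTT leaves no time to recover
```

A result of 0 suggests that the configured latency is too small for ARQ to help on that link.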

Benefits and Limitations

Benefits

Low end-to-end latency: Configurable in the 50–200 ms range for stable links. Lower values can be achieved on clean networks.
Resilience to loss and jitter: ARQ and jitter buffers allow smooth playback even with bursts of 5–10% packet loss, provided the latency window leaves time for retransmissions.
Integrated security: AES encryption without an external VPN or tunnel.
Firewall/NAT friendly: Rendezvous mode lets both peers initiate the connection at the same time, so it can traverse firewalls and many NATs that only permit outbound UDP, using a single UDP flow.
Broad ecosystem support: Open-source reference implementation (libsrt, maintained by Haivision and the SRT Alliance) with support in GStreamer, FFmpeg, OBS, and VLC.
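
As an illustration of the integrated security, the sketch below builds an SRT URI that carries the encryption settings. The limits it checks (a passphrase of 10 to 79 characters, a `pbkeylen` of 16, 24, or 32) come from libsrt. The helper name `srt_encrypted_uri` and the sample passphrase are hypothetical, and the sketch assumes the passphrase needs no URL-encoding.

```shell
#!/bin/sh
# Build an SRT URI with encryption parameters.
# libsrt constraints: passphrase is 10 to 79 characters,
# pbkeylen is 16, 24, or 32 bytes (AES-128/192/256).
# Assumption: the passphrase contains no characters that need URL-encoding.
srt_encrypted_uri() {
  host=$1; port=$2; mode=$3; passphrase=$4; pbkeylen=${5:-16}
  len=${#passphrase}
  if [ "$len" -lt 10 ] || [ "$len" -gt 79 ]; then
    echo "passphrase must be 10-79 characters" >&2
    return 1
  fi
  case "$pbkeylen" in
    16|24|32) ;;
    *) echo "pbkeylen must be 16, 24, or 32" >&2; return 1 ;;
  esac
  printf 'srt://%s:%s?mode=%s&passphrase=%s&pbkeylen=%s\n' \
    "$host" "$port" "$mode" "$passphrase" "$pbkeylen"
}

srt_encrypted_uri 127.0.0.1 9000 caller "correct-horse-battery" 32
```

The printed string should be usable as the uri of srtsink or srtsrc, assuming the installed GStreamer SRT plugin accepts these query keys; the elements also expose passphrase and pbkeylen properties as an alternative. Both ends must use the same passphrase.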

Limitations

Latency vs recovery trade-off: Higher loss requires larger latency buffers (increasing end-to-end delay).
Not suitable for file transfer: SRT is optimized for real-time media, not bulk file delivery.
Endpoint requirement: Both sender and receiver must support SRT (not playable in a standard web browser).
Tuning required: Incorrect latency configuration can cause drops if too small, or excess delay if too large.

RidgeRun compatible products

RidgeRun products that support metadata streaming over SRT include:

SEI: Metadata is embedded as SEI NAL units in H.264/H.265 streams and transported over SRT (typically inside MPEG-TS). Works seamlessly with seiinject and seiextract.
OBU: AV1 metadata OBUs are preserved when encapsulated in MPEG-TS or Matroska and carried over SRT. Suitable for next-generation AV1 workflows.
MPEG-TS: Native metadata streams (private PES, descriptors, or KLV) can be multiplexed together with audio/video in MPEG-TS and transported via SRT. Some players can extract standardized metadata directly.

Usage Implications (SRT + Metadata)

Method	Description	Implications
SRT + SEI (H.264/H.265)	SEI NAL units embedded in the video bitstream; carried over SRT (typically inside MPEG-TS).	Benefits: frame-accurate (metadata aligned with frames/IDR); works with existing tools (`seiinject` / `seiextract`); standards-friendly (supported in H.264/H.265 + TS workflows). Limitations: requires custom parsing (receiver must explicitly parse SEI metadata); may be stripped (re-encoding or middleboxes may remove SEI unless explicitly preserved).
SRT + OBU (AV1)	AV1 metadata OBUs preserved over SRT when encapsulated (e.g., MPEG-TS or Matroska) and not transcoded.	Benefits: in-band (metadata embedded directly in the AV1 stream); future-ready (suitable for next-gen AV1 workflows). Limitations: limited support (few players currently support AV1 + OBU); resource-intensive (AV1 encode/decode is CPU-heavy).
SRT + MPEG-TS (native metadata)	Private PES, descriptors, or KLV encapsulated in MPEG-TS and transported over SRT.	Benefits: interoperable (standard approach in broadcast workflows); unified stream (audio, video, and metadata multiplexed together); player extraction (some players can extract standardized metadata such as KLV without custom software). Limitations: higher overhead (TS encapsulation adds payload size); requires demultiplexing (metadata must be separated with `tsdemux`); loose association (metadata is not inherently tied to video frames; synchronization relies on timestamps).

Example pipelines using SRT

1) SEI over SRT (H.264)

This example injects metadata into H.264 as SEI NAL units and streams it over SRT (encapsulated in MPEG-TS). On the receiver side, the pipeline demultiplexes the TS and extracts the SEI to recover the metadata.

Sender (inject)

gst-launch-1.0 -e \
  videotestsrc is-live=true ! x264enc tune=zerolatency key-int-max=60 ! \
  seiinject metadata="HELLO WORLD" ! mpegtsmux ! \
  srtsink uri="srt://127.0.0.1:9000?mode=listener&latency=120" -v

Receiver (extract)

GST_DEBUG=*sei*:MEMDUMP gst-launch-1.0 -e \
  srtsrc uri="srt://127.0.0.1:9000?mode=caller&latency=120" ! \
  tsdemux ! seiextract ! fakesink silent=false -v

Output (expected)

0:00:08.957161209 48157 0x580c62870cc0 MEMDUMP           seiextract gstseiextract.c:299:gst_sei_extract_extract_h264_data:<seiextract0> ---------------------------------------------------------------------------
0:00:08.957167675 48157 0x580c62870cc0 MEMDUMP           seiextract gstseiextract.c:299:gst_sei_extract_extract_h264_data:<seiextract0> The extracted data is: 
0:00:08.957174464 48157 0x580c62870cc0 MEMDUMP           seiextract gstseiextract.c:299:gst_sei_extract_extract_h264_data:<seiextract0> 00000000: 48 45 4c 4c 4f 20 57 4f 52 4c 44 00              HELLO WORLD.    
0:00:08.957179354 48157 0x580c62870cc0 MEMDUMP           seiextract gstseiextract.c:299:gst_sei_extract_extract_h264_data:<seiextract0> ---------------------------------------------------------------------------
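
To confirm that the dumped bytes match the injected string, the hex column of the MEMDUMP line can be converted back to text. The following is a minimal sketch; `hex_to_text` is a hypothetical helper name, and the bytes are copied by hand from the log line above.

```shell
#!/bin/sh
# Convert the space-separated hex bytes from a MEMDUMP line back to text.
# NUL bytes (00) are skipped so the result is a plain string.
hex_to_text() {
  for byte in $1; do
    [ "$byte" = "00" ] && continue
    # The inner printf turns the hex value into a 3-digit octal number,
    # which the outer printf then interprets as an octal escape.
    printf "\\$(printf '%03o' "0x$byte")"
  done
  echo
}

hex_to_text "48 45 4c 4c 4f 20 57 4f 52 4c 44 00"   # prints HELLO WORLD
```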

2) AV1 + OBU metadata over SRT

This example injects metadata as AV1 OBUs and streams it over SRT (encapsulated in MPEG-TS). On the receiver side, the pipeline demultiplexes the TS, parses AV1, and extracts the OBU metadata.

Sender (inject OBU)

gst-launch-1.0 -e \
  videotestsrc is-live=true num-buffers=15 ! \
  "video/x-raw,width=320,height=240" ! \
  av1enc cpu-used=8 ! \
  obuinject metadata="Hello OBU World" ! \
  av1parse ! mpegtsmux alignment=7 name=mux ! \
  srtsink uri="srt://127.0.0.1:9000?mode=listener&latency=120&transtype=live" -v

Receiver (extract OBU)

GST_DEBUG=*obu*:MEMDUMP gst-launch-1.0 -e -v \
  srtsrc uri="srt://127.0.0.1:9000?mode=caller&latency=120" ! \
  tsdemux ! av1parse ! obuextract ! fakesink silent=false

Output (expected)

0:00:03.412583214  98765 0x55c3b3a2e1e0 MEMDUMP           obuextract gstobuextract.c:286:gst_obu_extract_prepare_output_buffer:<obuextract0> -----------------------------
0:00:03.412592133  98765 0x55c3b3a2e1e0 MEMDUMP           obuextract gstobuextract.c:286:gst_obu_extract_prepare_output_buffer:<obuextract0> The extracted data is
0:00:03.412600942  98765 0x55c3b3a2e1e0 MEMDUMP           obuextract gstobuextract.c:286:gst_obu_extract_prepare_output_buffer:<obuextract0> 00000000: 48 65 6c 6c 6f 20 4f 42 55 20 57 6f 72 6c 64     Hello OBU World
0:00:03.412604499  98765 0x55c3b3a2e1e0 MEMDUMP           obuextract gstobuextract.c:286:gst_obu_extract_prepare_output_buffer:<obuextract0> -----------------------------

Examples: Sending Metadata over SRT

The examples below show how to send and receive data using mpegtsmux over SRT, first with video only and then with RidgeRun’s metasrc / metasink elements. Two terminals are required: one for the sender (Pipe 1) and one for the receiver (Pipe 2).

1) Video stream without metadata

Pipe 1 (Sender)

gst-launch-1.0 -v videotestsrc is-live=true ! \
  x264enc key-int-max=30 tune=zerolatency ! \
  mpegtsmux ! \
  srtsink uri="srt://127.0.0.1:8888?mode=listener"

Pipe 2 (Receiver)

GST_DEBUG=WARNING gst-launch-1.0 -v \
  srtsrc uri="srt://127.0.0.1:8888?mode=caller" ! \
  tsdemux ! \
  h264parse ! decodebin ! queue ! videoconvert ! xvimagesink

This setup transmits only video. You should see a test pattern displayed in a video window on the receiver side.

---

2) Metadata example

This example demonstrates sending periodic metadata buffers using RidgeRun’s metasrc element, multiplexed in MPEG-TS and delivered over SRT. The sender pipeline shown here carries only the metadata stream, without video. On the receiver side, metasink extracts and prints the metadata.

Pipe 1 (Sender with metadata)

GST_DEBUG=WARNING gst-launch-1.0 \
  metasrc metadata="HELLO SRT WORLD" period=1 ! \
  mpegtsmux ! \
  srtsink uri="srt://127.0.0.1:8888?mode=listener" -v

Pipe 2 (Receiver with metasink)

GST_DEBUG=WARNING gst-launch-1.0 -v \
  srtsrc uri="srt://127.0.0.1:8888?mode=caller" ! \
  tsdemux ! \
  metasink

Expected Output (excerpt)

On the receiver side (Pipe 2) you should see log lines like:

0:00:02.134567891  12345 0x55a6f3f10 DEBUG metasink Received metadata buffer
0:00:02.134572134  12345 0x55a6f3f10 INFO  metasink payload = "HELLO SRT WORLD"

UDP vs SRT (Comparison)

This section compares raw UDP and SRT in the context of live video streaming with metadata. The following table highlights differences in how each protocol handles packet loss, jitter, latency, security, and NAT traversal, helping determine which transport is more suitable for professional workflows.

Aspect	UDP (raw)	SRT (built on UDP)
Loss handling	None → packets lost are unrecoverable (visual artifacts, glitches).	ARQ (NACK) + jitter buffer → packet recovery within latency window.
Latency under loss	Spikes, freezes, macroblocking when loss occurs.	Stable latency (e.g., 120 ms) as long as retransmissions arrive before the buffer deadline.
Reordering/Jitter	Not handled; late or reordered packets are dropped.	Handled by resequencing and jitter buffer.
Security	None.	AES-128/192/256 with passphrase-based key exchange.
Firewall/NAT traversal	Manual port opening, often blocked by NAT/firewalls.	Caller/Listener/Rendezvous modes allow connections through NAT/firewall.

Why use SRT instead of UDP?

SRT is particularly effective when you need:

Predictable low latency in live streaming or remote production.
Robustness to network impairments such as jitter and packet loss.
Secure contribution without adding extra overhead from VPNs.
Cross-NAT connectivity without complex network setups.

While raw UDP can be sufficient on pristine networks, it fails under real-world conditions (loss, jitter). SRT provides continuity, stability, and professional-grade resilience at the cost of a small, controlled latency buffer.

SRT vs UDP – Low-Latency and Loss Resilience Demo

This demo shows how SRT maintains smooth playback with low latency under packet loss and jitter, while UDP degrades noticeably. You will run four pipelines (SRT sender/receiver and UDP sender/receiver) and then inject impairments using tc netem.

1) Launch the SRT pair

Receiver (listener, ~50 ms target latency)

gst-launch-1.0 -e \
  srtsrc uri="srt://:9000?mode=listener&latency=50" \
  ! queue max-size-time=2000000000 \
  ! tsdemux name=demux \
  ! h264parse \
  ! avdec_h264 \
  ! fpsdisplaysink video-sink=autovideosink sync=true text-overlay=false

Sender (caller)

gst-launch-1.0 -e \
  videotestsrc is-live=true pattern=18 \
  ! video/x-raw,framerate=30/1 \
  ! x264enc tune=zerolatency speed-preset=ultrafast key-int-max=30 bitrate=4000 \
  ! h264parse config-interval=-1 \
  ! mpegtsmux alignment=7 name=mux \
  ! srtsink uri="srt://127.0.0.1:9000?mode=caller&latency=50&transtype=live"

2) Launch the UDP pair (baseline)

Receiver

gst-launch-1.0 -e \
  udpsrc port=5000 caps="application/x-rtp,media=video,encoding-name=H264,clock-rate=90000,payload=96" \
  ! rtph264depay \
  ! h264parse \
  ! avdec_h264 \
  ! fpsdisplaysink video-sink=autovideosink sync=true text-overlay=false

Sender

gst-launch-1.0 -e \
  videotestsrc is-live=true pattern=18 \
  ! video/x-raw,framerate=30/1 \
  ! x264enc tune=zerolatency speed-preset=ultrafast key-int-max=30 bitrate=4000 \
  ! h264parse config-interval=1 \
  ! rtph264pay pt=96 \
  ! udpsink host=127.0.0.1 port=5000

3) Impair the network (simulate loss and jitter)

Run this while the four pipelines are playing (using the loopback device in this local demo):

# Add 5% loss + 50 ms delay ±10 ms jitter (normal distribution)
sudo tc qdisc add dev lo root netem loss 5% delay 50ms 10ms distribution normal

To restore normal conditions:

sudo tc qdisc del dev lo root
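
Forgetting to remove the qdisc leaves the loopback device impaired for every other local application. One way to reduce that risk is to wrap the two commands so the rule is always removed after a fixed time. The sketch below is illustrative only: `netem_args` and `impair_for` are hypothetical helper names, and the script assumes sudo and the tc tool from iproute2 are available.

```shell
#!/bin/sh
# Compose the netem arguments used in this demo from numeric inputs.
netem_args() {
  loss_pct=$1; delay_ms=$2; jitter_ms=$3
  printf 'loss %s%% delay %sms %sms distribution normal\n' \
    "$loss_pct" "$delay_ms" "$jitter_ms"
}

# Apply the impairment for a fixed time, then remove it again.
# Intended to run from a script: the trap exits after cleaning up on Ctrl+C.
impair_for() {
  dev=$1; seconds=$2; shift 2
  # Word splitting of the netem_args output is intentional here.
  sudo tc qdisc add dev "$dev" root netem $(netem_args "$@") || return 1
  trap 'sudo tc qdisc del dev "$dev" root; exit 1' INT TERM
  sleep "$seconds"
  sudo tc qdisc del dev "$dev" root
  trap - INT TERM
}

# Print the arguments only; nothing is changed on the system.
netem_args 5 50 10
# To apply the demo impairment for 30 seconds, run manually:
#   impair_for lo 30 5 50 10
```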

Expected behavior

When the four pipelines are running you will see two video windows (one for SRT and one for UDP), each showing the white ball moving in a circle (pattern 18):

Left window = SRT (srtsrc) — playback remains much more stable, with continuous motion and steady FPS despite loss/jitter (recovered within the 50 ms latency window).
Right window = UDP (udpsrc) — you will observe frame drops, freezes, or macroblocking under the same impairments because RTP/UDP has no loss recovery.

As shown in the following animation, the left window (SRT) remains stable while the right window (UDP) freezes and drops frames under 5% loss and jitter.

[Animation: SRT (left) stable playback vs UDP (right) unstable playback]

Notes & tips

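You can tune latency by changing the latency URI query parameter (e.g., 80–200 ms) to trade recovery depth for added buffering. Each retransmission takes roughly one round trip, so keep the value comfortably above the link RTT.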
fpsdisplaysink shows instantaneous FPS; expect SRT to keep FPS closer to 30 under loss, while UDP dips noticeably.
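If you test across different hosts, replace 127.0.0.1 with the receiver’s IP and apply tc netem on the sender’s outgoing NIC instead of lo. netem acts on egress traffic by default; shaping ingress traffic requires redirecting it through an IFB device.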
Keep tune=zerolatency and a small GOP (key-int-max=30) to minimize encoder buffering.
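
When tuning latency, keep in mind that SRT negotiates the value during the handshake. As far as the libsrt latency options are documented, the connection uses the larger of the values requested by the two peers, so lowering latency on only one side has no effect. The sketch below only illustrates that rule; `effective_latency` is a hypothetical helper name.

```shell
#!/bin/sh
# Effective SRT latency after the handshake: the larger of both peers' requests.
effective_latency() {
  sender_ms=$1
  receiver_ms=$2
  if [ "$sender_ms" -gt "$receiver_ms" ]; then
    echo "$sender_ms"
  else
    echo "$receiver_ms"
  fi
}

effective_latency 50 120   # prints 120: the receiver's larger request wins
effective_latency 200 80   # prints 200: the sender's larger request wins
```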
