Use Case: Video Analysis with Bounding Boxes (SEI)
In computer vision applications such as object detection and video analytics, models typically output bounding boxes, class labels, and confidence scores for each frame. To transport this metadata without modifying the video frames and without a parallel data stream, a widely adopted method is to embed it in SEI (Supplemental Enhancement Information) messages within the encoded bitstream.
Applicable RidgeRun Products SEI
The GstSEIMetadata plugin provides a robust solution for inserting and extracting metadata in H.264/H.265 video streams, enabling the synchronization of additional information—such as bounding boxes—directly with each video frame. Through its seiinject element, bounding boxes can be embedded as metadata using properties, binary data, or GstMeta structures. The seiextract element then retrieves this metadata from the bitstream and reattaches it as GstMeta, making it available for downstream pipeline elements or external applications. Additionally, seimetatimestamp inserts precise timestamps as metadata, helping maintain temporal correlation between detected objects and their corresponding frames. This capability is especially valuable in computer vision and real-time analytics systems, where ensuring that bounding boxes remain tightly synchronized with the video—even through encoding and transmission—is critical.
Code Example
This Python script creates a real-time GStreamer pipeline that generates a test video using videotestsrc, encodes it with x264enc, injects metadata using the seiinject element, and streams it over UDP in RTP H.264 format. Every 500 milliseconds, it builds and updates a JSON payload simulating object detections (a “person” and a “car”) with bounding boxes, provided both in pixel coordinates and normalized format. These detections oscillate slightly to simulate motion.
The JSON is serialized into a string and assigned at runtime to the metadata property of seiinject, which embeds it directly into the video stream as SEI messages.
The example also includes a receiver pipeline that uses udpsrc to capture the RTP H.264 stream and seiextract to extract the embedded SEI metadata, so the transmitted detection values can be verified in real time.
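The payload described above carries each box twice: once in pixels and once normalized to the [0, 1] range. The following is a minimal sketch of why the normalized copy is useful, assuming a consumer that draws on a frame of a different size than the sender's 640x360. The helper name denormalize_bbox and the 1280x720 output size are hypothetical and are not part of the RidgeRun example.

```python
import json


def denormalize_bbox(bbox_norm, out_w, out_h):
    """Scale a normalized box (values in [0, 1]) to integer pixels
    for a frame of out_w x out_h. Hypothetical helper for illustration."""
    return {
        "x": round(bbox_norm["x"] * out_w),
        "y": round(bbox_norm["y"] * out_h),
        "w": round(bbox_norm["w"] * out_w),
        "h": round(bbox_norm["h"] * out_h),
    }


# One detection laid out like the payload described above (640x360 source frame).
packet = json.dumps(
    {
        "image_size": {"width": 640, "height": 360},
        "detections": [
            {
                "id": 1,
                "label": "person",
                "score": 0.92,
                "bbox": {"x": 75, "y": 65, "w": 120, "h": 200},
                "bbox_norm": {
                    "x": 75 / 640,
                    "y": 65 / 360,
                    "w": 120 / 640,
                    "h": 200 / 360,
                },
            }
        ],
    },
    separators=(",", ":"),
)

# Consumer side: parse the JSON string and rescale to the display size.
payload = json.loads(packet)
for det in payload["detections"]:
    print(det["label"], denormalize_bbox(det["bbox_norm"], 1280, 720))
```

Rounding to the nearest pixel keeps the box stable when the normalized values carry small floating-point error; a consumer that needs sub-pixel precision could keep the floats instead.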
Sender: Python script that generates synthetic H.264 video with videotestsrc and inserts JSON metadata (bounding boxes) every 500 ms using the seiinject element. The metadata simulates two objects (person and car) with slight movement and is sent via RTP/UDP to 127.0.0.1:5000.
#!/usr/bin/env python3
import sys
import signal
import json
import time

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst, GLib

WIDTH = 640
HEIGHT = 360
FRAMERATE = 30

# Smooth box movement: oscillates ±5 pixels in X and Y
DELTA = 5
INTERVAL_MS = 500
OFFSETS = list(range(-DELTA, DELTA + 1))  # [-5..+5]

PIPELINE_DESCRIPTION = (
    f"videotestsrc is-live=true ! "
    f"video/x-raw,width={WIDTH},height={HEIGHT},framerate={FRAMERATE}/1 ! "
    "x264enc tune=zerolatency key-int-max=30 insert-vui=true ! "
    "seiinject name=inject ! "
    "rtph264pay pt=96 config-interval=1 ! "
    "udpsink host=127.0.0.1 port=5000"
)


def clamp(v, lo, hi):
    return max(lo, min(hi, v))


def norm01(v):
    return max(0.0, min(1.0, v))


def norm_bbox(x, y, w, h, img_w, img_h):
    return {
        "x": norm01(x / img_w),
        "y": norm01(y / img_h),
        "w": norm01(w / img_w),
        "h": norm01(h / img_h),
    }


def build_bboxes_payload(frame_idx: int, offset_idx: int):
    dx = OFFSETS[offset_idx]
    dy = OFFSETS[-(offset_idx + 1)]  # counter step so they don’t move identically

    # Box 1 (person)
    x1, y1, w1, h1 = 80 + dx, 60 + dy, 120, 200
    x1 = clamp(x1, 0, WIDTH - 1)
    y1 = clamp(y1, 0, HEIGHT - 1)
    w1 = clamp(w1, 1, WIDTH - x1)
    h1 = clamp(h1, 1, HEIGHT - y1)

    # Box 2 (car)
    x2, y2, w2, h2 = 360 - dx, 140 - dy, 180, 120
    x2 = clamp(x2, 0, WIDTH - 1)
    y2 = clamp(y2, 0, HEIGHT - 1)
    w2 = clamp(w2, 1, WIDTH - x2)
    h2 = clamp(h2, 1, HEIGHT - y2)

    ts_ms = int(time.time() * 1000)
    payload = {
        "type": "detections",
        "format": "bbox-json",
        "timestamp_ms": ts_ms,
        "frame": frame_idx,
        "image_size": {"width": WIDTH, "height": HEIGHT},
        "detections": [
            {
                "id": 1,
                "label": "person",
                "score": 0.92,
                "bbox": {"x": x1, "y": y1, "w": w1, "h": h1},
                "bbox_norm": norm_bbox(x1, y1, w1, h1, WIDTH, HEIGHT),
            },
            {
                "id": 2,
                "label": "car",
                "score": 0.88,
                "bbox": {"x": x2, "y": y2, "w": w2, "h": h2},
                "bbox_norm": norm_bbox(x2, y2, w2, h2, WIDTH, HEIGHT),
            },
        ],
    }
    return payload


class GstBBoxesDemo:
    def __init__(self):
        self.pipeline = None
        self.seiinject = None
        self.loop = None
        self.tick_id = None
        self.bus = None
        self.offset_idx = 0
        self.frame_idx = 0

    def create_pipeline(self):
        self.loop = GLib.MainLoop()
        try:
            self.pipeline = Gst.parse_launch(PIPELINE_DESCRIPTION)
        except GLib.Error as e:
            print(f"Unable to build pipeline: {e.message}", file=sys.stderr)
            return False

        # Get pipeline elements
        self.seiinject = self.pipeline.get_by_name("inject")
        if not self.seiinject:
            print("Could not get seiinject element", file=sys.stderr)
            return False

        # Configure the bus to handle messages (EOS, ERROR, etc.)
        self.bus = self.pipeline.get_bus()
        self.bus.add_signal_watch()
        self.bus.connect("message", self.on_bus_message)

        # Ctrl+C
        def _sigint_handler(*_args):
            print("Interrupted, exiting…")
            self.quit()
        signal.signal(signal.SIGINT, _sigint_handler)

        # Schedule update every 500 ms
        self.tick_id = GLib.timeout_add(INTERVAL_MS, self._tick_update_metadata)
        return True

    def _tick_update_metadata(self):
        try:
            # Build JSON with bounding boxes
            payload = build_bboxes_payload(self.frame_idx, self.offset_idx)
            pkt = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)

            # Set property
            self.seiinject.set_property("metadata", pkt)

            # Friendly log
            dets = payload["detections"]
            summary = "; ".join(
                f"{d['label']}#{d['id']} "
                f"[{d['bbox']['x']},{d['bbox']['y']},{d['bbox']['w']},{d['bbox']['h']}]"
                for d in dets
            )
            print(f"[tick] frame={self.frame_idx} detections={len(dets)} {summary} JSON={pkt[:64]!r}... (len={len(pkt)})")

            # Advance indices
            self.offset_idx = (self.offset_idx + 1) % len(OFFSETS)
            self.frame_idx += int((INTERVAL_MS / 1000.0) * FRAMERATE)
        except Exception as e:
            print(f"Error updating metadata: {e}", file=sys.stderr)
        return True  # keep the timeout active

    def on_bus_message(self, bus, message):
        t = message.type
        if t == Gst.MessageType.EOS:
            print("End of stream")
            self.quit()
        elif t == Gst.MessageType.ERROR:
            err, debug = message.parse_error()
            print(f"Error: {err.message}", file=sys.stderr)
            if debug:
                print(f"Debug info: {debug}", file=sys.stderr)
            self.quit()

    def start(self):
        self.pipeline.set_state(Gst.State.PLAYING)
        print("Running… (Bounding Boxes JSON via SEI, update every 500 ms)")
        self.loop.run()

    def quit(self):
        if self.tick_id is not None:
            GLib.source_remove(self.tick_id)
        if self.bus:
            try:
                self.bus.remove_signal_watch()
            except Exception:
                pass
        if self.loop is not None and self.loop.is_running():
            self.loop.quit()
        self.stop()
        self.release()

    def stop(self):
        if self.pipeline is not None:
            self.pipeline.set_state(Gst.State.NULL)

    def release(self):
        self.seiinject = None
        self.pipeline = None
        self.loop = None
        self.tick_id = None


def main():
    Gst.init(None)
    demo = GstBBoxesDemo()
    if not demo.create_pipeline():
        sys.exit(1)
    print("Playing pipeline")
    demo.start()
    print("Returned, stopping playback")


if __name__ == "__main__":
    main()
Receiver: pipeline that receives the RTP/H.264 stream, depayloads, parses, and extracts SEI metadata using seiextract. The extracted JSON is printed in hex format using GST_DEBUG=*seiextract*:MEMDUMP.
GST_DEBUG=*seiextract*:MEMDUMP gst-launch-1.0 udpsrc port=5000 ! "application/x-rtp,media=video,clock-rate=90000,encoding-name=H264" ! rtph264depay ! h264parse ! seiextract ! fakesink
Output: JSON metadata with timestamp, frame number, image size, and two bounding boxes is printed every ~500 ms.
The extracted data is:
<seiextract0> 00000000: 7b 22 74 79 70 65 22 3a 22 64 65 74 65 63 74 69  {"type":"detecti
<seiextract0> 00000010: 6f 6e 73 22 2c 22 66 6f 72 6d 61 74 22 3a 22 62  ons","format":"b
<seiextract0> 00000020: 62 6f 78 2d 6a 73 6f 6e 22 2c 22 74 69 6d 65 73  box-json","times
<seiextract0> 00000030: 74 61 6d 70 5f 6d 73 22 3a 31 37 35 37 30 32 36  tamp_ms":1757026
<seiextract0> 00000040: 34 35 35 33 35 35 2c 22 66 72 61 6d 65 22 3a 30  455355,"frame":0
<seiextract0> 00000050: 2c 22 69 6d 61 67 65 5f 73 69 7a 65 22 3a 7b 22  ,"image_size":{"
<seiextract0> 00000060: 77 69 64 74 68 22 3a 36 34 30 2c 22 68 65 69 67  width":640,"heig
<seiextract0> 00000070: 68 74 22 3a 33 36 30 7d 2c 22 64 65 74 65 63 74  ht":360},"detect
<seiextract0> 00000080: 69 6f 6e 73 22 3a 5b 7b 22 69 64 22 3a 31 2c 22  ions":[{"id":1,"
<seiextract0> 00000090: 6c 61 62 65 6c 22 3a 22 70 65 72 73 6f 6e 22 2c  label":"person",
<seiextract0> 000000a0: 22 73 63 6f 72 65 22 3a 30 2e 39 32 2c 22 62 62  "score":0.92,"bb
<seiextract0> 000000b0: 6f 78 22 3a 7b 22 78 22 3a 37 35 2c 22 79 22 3a  ox":{"x":75,"y":
<seiextract0> 000000c0: 36 35 2c 22 77 22 3a 31 32 30 2c 22 68 22 3a 32  65,"w":120,"h":2
<seiextract0> 000000d0: 30 30 7d 2c 22 62 62 6f 78 5f 6e 6f 72 6d 22 3a  00},"bbox_norm":
<seiextract0> 000000e0: 7b 22 78 22 3a 30 2e 31 31 37 31 38 37 35 2c 22  {"x":0.1171875,"
<seiextract0> 000000f0: 79 22 3a 30 2e 31 38 30 35 35 35 35 35 35 35 35  y":0.18055555555
<seiextract0> 00000100: 35 35 35 35 35 35 2c 22 77 22 3a 30 2e 31 38 37  555555,"w":0.187
<seiextract0> 00000110: 35 2c 22 68 22 3a 30 2e 35 35 35 35 35 35 35 35  5,"h":0.55555555
<seiextract0> 00000120: 35 35 35 35 35 35 35 36 7d 7d 2c 7b 22 69 64 22  55555556}},{"id"
<seiextract0> 00000130: 3a 32 2c 22 6c 61 62 65 6c 22 3a 22 63 61 72 22  :2,"label":"car"
<seiextract0> 00000140: 2c 22 73 63 6f 72 65 22 3a 30 2e 38 38 2c 22 62  ,"score":0.88,"b
<seiextract0> 00000150: 62 6f 78 22 3a 7b 22 78 22 3a 33 36 35 2c 22 79  box":{"x":365,"y
<seiextract0> 00000160: 22 3a 31 33 35 2c 22 77 22 3a 31 38 30 2c 22 68  ":135,"w":180,"h
<seiextract0> 00000170: 22 3a 31 32 30 7d 2c 22 62 62 6f 78 5f 6e 6f 72  ":120},"bbox_nor
<seiextract0> 00000180: 6d 22 3a 7b 22 78 22 3a 30 2e 35 37 30 33 31 32  m":{"x":0.570312
<seiextract0> 00000190: 35 2c 22 79 22 3a 30 2e 33 37 35 2c 22 77 22 3a  5,"y":0.375,"w":
<seiextract0> 000001a0: 30 2e 32 38 31 32 35 2c 22 68 22 3a 30 2e 33 33  0.28125,"h":0.33
<seiextract0> 000001b0: 33 33 33 33 33 33 33 33 33 33 33 33 33 33 7d 7d  33333333333333}}
<seiextract0> 000001c0: 5d 7d                                            ]}
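A hex dump like the one above is convenient for a quick check but awkward to compare against the sender's log. The following is a minimal sketch, assuming log lines of the form "offset: hex bytes ASCII" as shown, that rebuilds the payload bytes and parses them as JSON. The helper parse_memdump and the short sample lines are hypothetical, used only for illustration. The sketch relies on the ASCII column not starting with something that looks like a hex byte followed by a space, so treat it as a debugging aid rather than a robust parser.

```python
import json
import re

# An 8-digit offset, a colon, then up to 16 space-separated hex bytes.
# Whatever follows (the ASCII column) is ignored.
DUMP_LINE = re.compile(r"([0-9a-fA-F]{8}):\s+((?:[0-9a-fA-F]{2}\s+){1,16})")


def parse_memdump(lines):
    """Rebuild raw payload bytes from MEMDUMP-style log lines (hypothetical helper)."""
    data = bytearray()
    for line in lines:
        match = DUMP_LINE.search(line)
        if match:
            data += bytes.fromhex(match.group(2))
    return bytes(data)


# Short illustrative sample in the same layout as the dump above.
sample = [
    '<seiextract0> 00000000: 7b 22 66 72 61 6d 65 22 3a 30 7d {"frame":0}',
]

raw = parse_memdump(sample)
metadata = json.loads(raw.decode("utf-8"))
print(metadata)
```

Lines that do not match the pattern are skipped, so the full debug log can be passed in without filtering it first.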