RidgeRun GStreamer Analytics Example: System Failure Debug

From RidgeRun Developer Wiki


Example: Simulating Unexpected Failures with Error Injection

In real-world deployments, pipelines may terminate abruptly due to process crashes, hardware instability, or unhandled exceptions. These events are often unpredictable, but with RidgeRun GStreamer Analytics, both metrics and logs provide visibility into what happened before the failure. To simulate this scenario, we can use a simple pipeline with an identity element configured to raise an error after a fixed number of buffers.


Pipeline with Error Injection

The following pipeline runs normally for a short time and then terminates when the identity element triggers an error:

RR_PROC_NAME=system-failure-test \
GST_DEBUG="DEBUG" \
GST_REMOTE_DEBUG="DEBUG" \
GST_TRACERS="rrlogtracer;rrpipelineframerate;rrpipelinebitrate;rrpipelineruntime;rrprocmemusage;rrproccpuusage;rrprociousage" \
gst-launch-1.0 \
  videotestsrc is-live=true \
  ! "video/x-raw,framerate=30/1" \
  ! identity error-after=5400 \
  ! fakesink
Table 1. Environment variable definitions
Variable Purpose
RR_PROC_NAME Labels the process so it can be easily identified in Grafana dashboards.
GST_DEBUG Enables detailed circular logs locally, capturing the precise moment of failure.
GST_REMOTE_DEBUG Defines which logs are sent remotely to Grafana Drilldown for analysis.
GST_TRACERS Enables reporting of FPS, bitrate, runtime, and per-process resource usage, making these metrics visible in Grafana. Also enables the RidgeRun custom tracer (rrlogtracer), which captures and structures GStreamer logs.
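As a quick sanity check on the injected failure, the identity element's error-after=5400 setting combined with the pipeline's 30 fps caps means the error fires after roughly three minutes of normal operation:

```shell
# error-after=5400 buffers at 30 fps => failure after ~3 minutes of runtime
echo $(( 5400 / 30 ))   # seconds of runtime before identity raises its error
```

Adjusting error-after lets you control how much metric and log history accumulates before the simulated crash.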

Remote Logging and On-Device Circular Logs

As the pipeline runs, metrics show a process that is active, consuming CPU and memory, and a pipeline runtime that increases steadily.

RidgeRun GStreamer Analytics process and pipeline metrics

Once the error is triggered, the pipeline exits abruptly.

  • Remote logs: The filtered log stream is sent to Grafana Drilldown, where the error message from the identity element appears, allowing engineers to correlate it with the timeline of metrics.
    RidgeRun GStreamer Analytics filtered log analysis of system failure in Grafana Drilldown
  • On-device circular logs: At the same time, full logs are written locally to disk at /var/log/ridgerun by default, ensuring the failure is captured even if the server or network connection is unavailable at the moment of the crash.
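After a crash, the on-device circular logs can be inspected directly on the target. The snippet below is a minimal sketch: the RR_LOG_DIR override is an assumption for illustration, with the documented default of /var/log/ridgerun used otherwise.

```shell
# Inspect the on-device circular logs after a crash.
# RR_LOG_DIR is a hypothetical override; /var/log/ridgerun is the default path.
LOG_DIR=${RR_LOG_DIR:-/var/log/ridgerun}
if [ -d "$LOG_DIR" ]; then
  # List the rotating log files, newest first, then tail the most recent one
  ls -lt "$LOG_DIR"
  tail -n 50 "$LOG_DIR/$(ls -t "$LOG_DIR" | head -n 1)"
else
  echo "no circular logs found in $LOG_DIR"
fi
```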

Circular Logging as a Reliable Fallback

The circular logging mechanism ensures that valuable debugging information is preserved even in the face of unexpected failures. Logs are continuously written into four rotating files, each maintaining a portion of the runtime history. If the pipeline or system crashes abruptly, these files safeguard the most recent log entries, allowing engineers to reconstruct the sequence of events leading up to the failure. Even if one file becomes corrupted during the crash, the others remain intact and provide valid records.

This design guarantees that engineers have a record of the failure window even if the system loses remote connectivity or crashes unexpectedly.
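The rotation idea can be sketched in a few lines of shell. This is illustrative only, not RidgeRun's implementation: real circular logging rotates on file size rather than line count, and the file names, the MAX_LINES threshold, and the temporary directory here are assumptions.

```shell
# Sketch of four-file circular logging: writes cycle through log.0..log.3,
# truncating the next slot on rotation so the most recent history survives.
DIR=$(mktemp -d)
MAX_LINES=100   # rotation threshold (assumed; a real cap would be size-based)
slot=0
for i in $(seq 1 250); do
  echo "entry $i" >> "$DIR/log.$slot"
  if [ "$(wc -l < "$DIR/log.$slot")" -ge "$MAX_LINES" ]; then
    slot=$(( (slot + 1) % 4 ))   # advance to the next of the 4 rotating files
    : > "$DIR/log.$slot"         # truncate the oldest slot before reuse
  fi
done
# log.0 and log.1 each hold 100 entries; log.2 holds the newest 50
wc -l "$DIR"/log.*
```

Because each slot is truncated only when rotation reaches it again, a crash at any point leaves the previous files untouched, which is why a corrupted current file still leaves valid history behind.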

Logs stored in the device when circular logging is enabled

By combining centralized metrics and logs in Grafana with redundant local logs on the device, this hybrid approach guarantees that critical debugging information remains available under all circumstances.


Closing Remarks

The system failure example demonstrates the resilience of RidgeRun GStreamer Analytics in the face of unexpected crashes. While remote logs provide real-time visibility of the error in Grafana Drilldown, the multi-file circular logging scheme on the device guarantees that the last runtime messages are preserved, even if one log file becomes corrupted or connectivity is unavailable. This ensures engineers always retain access to the final context before a crash, making post-mortem analysis possible and reliable.