GstCUDA - cudamux
This page describes in detail the cudamux element of the GstCUDA plugin.
Description
Cudamux is a GStreamer video filter element with multiple input pads and a single output pad that lets video frames be processed on the GPU by a custom CUDA library algorithm. Users develop their own CUDA processing library and pass it to cudamux, which executes it on the GPU: the element hands the upstream frames from each input pad to the GPU and pushes the modified frames downstream to the next element in the GStreamer pipeline.
This element executes the CUDA algorithm from a custom CUDA library (an XXX.so file) that is loaded dynamically at run time and specified through an element property. Because the CUDA algorithm is separate from the GStreamer element, the developer can modify the algorithm, recompile the custom CUDA library, and run the GStreamer pipeline again to test the changes. This cycle can be repeated as many times as needed to debug a custom CUDA algorithm. It makes cudamux well suited to quick prototyping, because the element adapts to many project requirements without being rebuilt.
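The run-time loading described above can be illustrated with the standard POSIX dlopen/dlsym calls. This is only a sketch of the general mechanism, not cudamux's actual loader: this page does not document which symbols cudamux looks up in the library, so the helper name load_symbol and the unary_fn type are hypothetical, and the sketch would be exercised against the system math library rather than a CUDA library.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical function signature, used only for this demonstration. */
typedef double (*unary_fn)(double);

/* Open the shared library at `path` and resolve `symbol`.
 * On success, returns the symbol address and stores the library handle in
 * `*handle_out` (release it later with dlclose). On failure, returns NULL. */
static void *load_symbol(const char *path, const char *symbol, void **handle_out)
{
    assert(path != NULL && symbol != NULL && handle_out != NULL);

    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen failed: %s\n", err != NULL ? err : "unknown error");
        return NULL;
    }

    dlerror(); /* clear any stale error state before calling dlsym */
    void *sym = dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL || sym == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err != NULL ? err : "symbol is NULL");
        dlclose(handle);
        return NULL;
    }

    *handle_out = handle;
    return sym;
}
```

Because the library is resolved each time the process starts, recompiling the .so and rerunning the pipeline picks up the new code without rebuilding the host program. On older glibc versions the sketch needs to be linked with -ldl.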
One key feature of this element is its ability to load the CUDA algorithm that processes the incoming frames on the GPU from an externally compiled custom CUDA library. This keeps the GStreamer element separate from the CUDA algorithm, so the developer does not have to worry about the GStreamer-CUDA interface or complex memory handling; cudamux takes care of both. The developer can instead focus on developing the custom CUDA algorithm and test any change made during debugging by recompiling the CUDA library and running the GStreamer pipeline again, with no need to modify, recompile, and reinstall the GstCUDA plugin. This shortens the prototyping stage considerably, which in turn reduces a project's time to market.
Another crucial feature of cudamux is its multiple input/single output pad topology, which makes the element flexible and adaptable to many project requirements. The element has one "Always" source pad and multiple "On request" sink pads. The user is responsible for requesting as many sink pads as the custom CUDA algorithm has inputs. Because the element is intended for quick prototyping, it does not detect a mismatch between the number of requested sink pads and the number of inputs the custom CUDA algorithm requires. Cudamux builds an array of inputs from the requested sink pads and passes it to the custom CUDA algorithm according to the template expected by the custom CUDA library. It is therefore very important that the user matches the number of requested sink pads to the number of inputs defined in the custom CUDA library, to avoid an error.
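Since cudamux does not check the pad count itself, one option is for the algorithm to validate the number of inputs it receives. The sketch below is a plain C stand-in for a CUDA algorithm, written only to illustrate that idea: the actual template expected by the custom CUDA library is not shown on this page, so the entry point blend_inputs, its parameters, and the EXPECTED_INPUTS constant are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the number of inputs this algorithm was written for. */
#define EXPECTED_INPUTS 2

/* Hypothetical entry point: averages `num_inputs` equally sized planes into
 * `out`. Returns 0 on success, or -1 when the arguments are invalid or the
 * number of inputs does not match what the algorithm expects. */
static int blend_inputs(const uint8_t *const *inputs, size_t num_inputs,
                        uint8_t *out, size_t size)
{
    /* Reject a pad-count mismatch instead of reading past the array. */
    if (inputs == NULL || out == NULL || num_inputs != EXPECTED_INPUTS)
        return -1;

    for (size_t i = 0; i < size; ++i) {
        unsigned sum = 0;
        for (size_t k = 0; k < num_inputs; ++k) {
            assert(inputs[k] != NULL); /* every input plane must be valid */
            sum += inputs[k][i];
        }
        out[i] = (uint8_t)(sum / num_inputs);
    }
    return 0;
}
```

With a check like this, requesting one sink pad for a two-input algorithm produces an explicit error code rather than undefined behavior inside the algorithm.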
With its multiple input/single output (MISO) topology, cudamux is a strong option for quick prototyping projects that need to interface GStreamer with a CUDA algorithm that requires several inputs and one output, for example image stitching, stereoscopic vision (3D vision), high-dynamic-range imaging (HDRI), and picture-in-picture overlays.
Cudamux can be viewed as a generic multiple input/single output video filter element that executes any custom CUDA algorithm provided by the user. The user can therefore develop several CUDA algorithms in parallel and test them with the same cudamux element, simply by changing the element property that specifies which CUDA library is loaded during pipeline execution.
Key features
- Multiple inputs/single output pads filter element topology.
- Dynamic loading of an externally compiled CUDA library that contains the CUDA algorithm executed on the GPU to process the incoming frames.
- Independence between the GStreamer element and CUDA algorithm.
- Generic GStreamer element that can execute custom CUDA algorithms.
- Adaptability to many project requirements.
- Ideal for quick prototyping and reducing time to market of project development.
- High performance, due to a zero-memory-copy interface between CUDA and GStreamer.
- Direct handling of NVMM memory type buffers.
Documentation
Element inspect
$ gst-inspect-1.0 cudamux
Factory Details:
  Rank                     none (0)
  Long-name                cudamux
  Klass                    Muxer
  Description              Allows frames to be processed by the GPU using a custom CUDA library algorithm. Multiple input single output topology filter element.
  Author                   Diego Chaverri <diego.chaverri@ridgerun.com>
                           Daniel Garbanzo <daniel.garbanzo@ridgerun.com>
                           Enrique Ramirez <enrique.ramirez@ridgerun.com>
                           Michael Gruner <michael.gruner@ridgerun.com>

Plugin Details:
  Name                     cuda
  Description              Allows frames to be processed by the GPU using a custom CUDA library algorithm
  Filename                 /usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstcuda.so
  Version                  0.3.1.1
  License                  Proprietary
  Source module            gst-cuda
  Source release date      2018-01-10 17:43 (UTC)
  Binary package           GStreamer CUDA Plug-in
  Origin URL               Unknown package origin

GObject
 +----GInitiallyUnowned
       +----GstObject
             +----GstElement
                   +----GstAggregator
                         +----GstCudaBaseMiso
                               +----GstCudaMux

Pad Templates:
  SINK template: 'sink_%u'
    Availability: On request
      Has request_new_pad() function: gst_aggregator_request_new_pad
    Capabilities:
      video/x-raw(memory:NVMM)
                 format: I420
                  width: [ 1, 2147483647 ]
                 height: [ 1, 2147483647 ]
              framerate: [ 0/1, 2147483647/1 ]

  SRC template: 'src'
    Availability: Always
    Capabilities:
      video/x-raw
                 format: I420
                  width: [ 1, 2147483647 ]
                 height: [ 1, 2147483647 ]
              framerate: [ 0/1, 2147483647/1 ]
      video/x-raw(memory:NVMM)
                 format: I420
                  width: [ 1, 2147483647 ]
                 height: [ 1, 2147483647 ]
              framerate: [ 0/1, 2147483647/1 ]

Element Flags:
  no flags set

Element Implementation:
  Has change_state() function: gst_aggregator_change_state

Element has no clocking capabilities.
Element has no URI handling capabilities.

Pads:
  SRC: 'src'
    Pad Template: 'src'

Element Properties:
  name                : The name of the object
                        flags: readable, writable
                        String. Default: "cudamux0"
  parent              : The parent of the object
                        flags: readable, writable
                        Object of type "GstObject"
  latency             : Additional latency in live mode to allow upstream to take longer to produce buffers for the current position (in nanoseconds)
                        flags: readable, writable
                        Integer64. Range: 0 - 9223372036854775807 Default: 0
  start-time-selection: Decides which start time is output
                        flags: readable, writable
                        Enum "GstAggregatorStartTimeSelection" Default: 0, "zero"
                           (0): zero             - Start at 0 running time (default)
                           (1): first            - Start at first observed input running time
                           (2): set              - Set start time with start-time property
  start-time          : Start time to use if start-time-selection=set
                        flags: readable, writable
                        Unsigned Integer64. Range: 0 - 18446744073709551615 Default: 18446744073709551615
  location            : Location of the CUDA algorithm library to load
                        flags: readable, writable
                        String. Default: null
  in-place            : Use in-place transform mode configuration
                        flags: readable, writable
                        Boolean. Default: false