GstCUDA - cudamux

From RidgeRun Developer Wiki







This page describes in detail the cudamux element of the GstCUDA plugin.


Note
This element is under development. It will be ready for the next GstCUDA version release.


Description

Cudamux is a GStreamer video filter element with multiple input pads and a single output pad that allows video frames to be processed on the GPU by a custom CUDA library algorithm. Users develop their own CUDA processing library and pass it to cudamux, which executes the library on the GPU: it hands the upstream frames arriving on each input pad to the GPU and pushes the modified frames downstream to the next element in the GStreamer pipeline.


This element executes the CUDA algorithm from a custom CUDA library (XXX.so file) that is loaded dynamically at run time and specified through the element's location property. Because the CUDA algorithm is separated from the GStreamer element, the developer can modify the CUDA algorithm, recompile the custom CUDA library, and run the GStreamer pipeline again to test the changes. This process can be repeated as many times as needed to debug a custom CUDA algorithm. This makes cudamux well suited to quick prototyping, because it offers flexibility and adaptability to many project requirements.


One key feature of this element is its ability to load the CUDA algorithm that processes the incoming frames on the GPU from an externally compiled custom CUDA library. This keeps the GStreamer element separate from the CUDA algorithm, so the developer doesn't have to worry about the GStreamer-CUDA interface or complex memory handling, because cudamux takes care of that. Instead, the developer can focus on developing the custom CUDA algorithm and test any change made while debugging by recompiling the CUDA library and executing the GStreamer pipeline again, without having to modify, recompile, and reinstall the GstCUDA plugin. This feature is crucial in reducing time to market because it considerably accelerates the prototyping stage.
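
The recompile-and-rerun cycle described above can be sketched as a small shell script. This is only an illustration under stated assumptions: the file names custom_algorithm.cu and libcustom_algorithm.so, the helper functions run and rebuild_and_run, and the videotestsrc/nvvidconv/fakesink test branch are hypothetical and not taken from GstCUDA. The library must also follow the interface that cudamux expects, which this page does not detail. By default the script only prints the commands; set DRY_RUN=0 to execute them on a target where nvcc and GstCUDA are installed.

```shell
#!/bin/sh
# Sketch of the edit/recompile/rerun cycle. All file names are hypothetical.
SRC=${SRC:-custom_algorithm.cu}
LIB=${LIB:-$PWD/libcustom_algorithm.so}

# Print each command; execute it only when DRY_RUN=0.
run() {
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

rebuild_and_run() {
    # Rebuild only the custom library; the GstCUDA plugin itself is untouched.
    run nvcc --shared -Xcompiler -fPIC -o "$LIB" "$SRC" || return 1
    # Rerun the pipeline; cudamux loads the library named by its location property.
    # The single-input test branch below is an assumption for illustration.
    run gst-launch-1.0 cudamux name=mux location="$LIB" ! fakesink \
        videotestsrc ! nvvidconv ! 'video/x-raw(memory:NVMM),format=I420' ! mux.sink_0
}

rebuild_and_run
```

Only the library is rebuilt between iterations, so the loop stays short even when the algorithm changes often.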


Another crucial feature of cudamux is its multiple input/single output pad topology, which makes the element flexible and adaptable to many project requirements. The element has one "Always" source pad and multiple "On request" sink pads. The user is responsible for requesting as many sink pads as the custom CUDA algorithm requires inputs. Because this element is intended for quick prototyping, it does not detect a mismatch between the number of requested sink pads and the number of inputs required by the custom CUDA algorithm. The cudamux element generates an array of inputs based on the number of requested sink pads and passes it to the custom CUDA algorithm, according to the expected template of the custom CUDA library. For this reason, it is very important that the user match the number of requested sink pads to the number of inputs defined in the custom CUDA library to avoid an error.
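
As a minimal sketch of keeping the pad count in step with the algorithm, the helper below builds a gst-launch-1.0 pipeline description that requests exactly one sink pad (sink_0, sink_1, ...) per input. The function name build_cudamux_pipeline, the library path, and the videotestsrc/nvvidconv/fakesink elements around cudamux are assumptions chosen for illustration; only the cudamux element, its location property, the sink_%u pad template, and the NVMM I420 caps come from this page.

```shell
#!/bin/sh
# Build a gst-launch-1.0 description with one requested sink pad per input.
# $1: path to the custom CUDA library (hypothetical)
# $2: number of inputs the custom CUDA algorithm expects
build_cudamux_pipeline() {
    lib="$1"
    inputs="$2"
    desc="cudamux name=mux location=$lib ! fakesink"
    i=0
    while [ "$i" -lt "$inputs" ]; do
        # Each branch links to mux.sink_<i>, which requests a new sink pad.
        desc="$desc videotestsrc ! nvvidconv ! video/x-raw(memory:NVMM),format=I420 ! mux.sink_$i"
        i=$((i + 1))
    done
    printf '%s\n' "$desc"
}

# Example: print the description for a two-input algorithm.
build_cudamux_pipeline /path/to/libcustom.so 2

# To run it on a target with GstCUDA installed:
#   gst-launch-1.0 $(build_cudamux_pipeline /path/to/libcustom.so 2)
```

Deriving the branches from a single input count avoids the silent mismatch that cudamux itself does not check for.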


With its multiple inputs/single output (MISO) topology, cudamux is well suited to quick prototyping projects that need to interface GStreamer with a CUDA algorithm requiring several inputs and one output, for example: image stitching, stereoscopic vision (3D vision), high-dynamic-range imaging (HDRI), picture-in-picture overlays, etc.


Cudamux can be viewed as a generic multiple inputs/single output video filter element that executes any custom CUDA algorithm provided by the user. This allows the user to develop different CUDA algorithms at the same time and test them with the same cudamux element, just by changing the element property that specifies which CUDA library to load during pipeline execution.
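
To illustrate that only the location property changes between algorithms, the sketch below prints one gst-launch-1.0 command per candidate library while the rest of the pipeline stays fixed. The library names libstitch.so and libhdr.so, the function commands_for_libraries, and the two-input test branches are hypothetical.

```shell
#!/bin/sh
# Shared upstream branches for a two-input algorithm (assumed for illustration).
BRANCHES="videotestsrc ! nvvidconv ! 'video/x-raw(memory:NVMM),format=I420' ! mux.sink_0 videotestsrc ! nvvidconv ! 'video/x-raw(memory:NVMM),format=I420' ! mux.sink_1"

# Print one command per library; only the location property differs.
commands_for_libraries() {
    for lib in "$@"; do
        printf 'gst-launch-1.0 cudamux name=mux location=%s ! fakesink %s\n' \
            "$lib" "$BRANCHES"
    done
}

commands_for_libraries ./libstitch.so ./libhdr.so
```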


Key features

  • Multiple inputs/single output pads filter element topology.
  • Dynamic loading of an externally compiled CUDA library that contains the CUDA algorithm to be executed on the GPU to process the incoming frames.
  • Independence between the GStreamer element and CUDA algorithm.
  • Generic GStreamer element that could execute custom CUDA algorithms.
  • Adaptability to many project requirements.
  • Ideal for quick prototyping and reducing time to market of project development.
  • High performance, due to a zero-memory-copy interface between CUDA and GStreamer.
  • Direct handling of NVMM memory type buffers.


Documentation

Cudamux documentation.


Element inspect

$ gst-inspect-1.0 cudamux
Factory Details:
  Rank                     none (0)
  Long-name                cudamux
  Klass                    Muxer
  Description              Allows frames to be processed by the GPU using a custom CUDA library algorithm.
			   Multiple input single output topology filter element.
  Author                   Diego Chaverri <diego.chaverri@ridgerun.com> 
			   Daniel Garbanzo <daniel.garbanzo@ridgerun.com> 
			   Enrique Ramirez <enrique.ramirez@ridgerun.com> 
			   Michael Gruner <michael.gruner@ridgerun.com>

Plugin Details:
  Name                     cuda
  Description              Allows frames to be processed by the GPU using a custom CUDA library algorithm
  Filename                 /usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstcuda.so
  Version                  0.3.1.1
  License                  Proprietary
  Source module            gst-cuda
  Source release date      2018-01-10 17:43 (UTC)
  Binary package           GStreamer CUDA Plug-in
  Origin URL               Unknown package origin

GObject
 +----GInitiallyUnowned
       +----GstObject
             +----GstElement
                   +----GstAggregator
                         +----GstCudaBaseMiso
                               +----GstCudaMux

Pad Templates:
  SINK template: 'sink_%u'
    Availability: On request
      Has request_new_pad() function: gst_aggregator_request_new_pad
    Capabilities:
      video/x-raw(memory:NVMM)
                 format: I420
                  width: [ 1, 2147483647 ]
                 height: [ 1, 2147483647 ]
              framerate: [ 0/1, 2147483647/1 ]

  SRC template: 'src'
    Availability: Always
    Capabilities:
      video/x-raw
                 format: I420
                  width: [ 1, 2147483647 ]
                 height: [ 1, 2147483647 ]
              framerate: [ 0/1, 2147483647/1 ]
      video/x-raw(memory:NVMM)
                 format: I420
                  width: [ 1, 2147483647 ]
                 height: [ 1, 2147483647 ]
              framerate: [ 0/1, 2147483647/1 ]


Element Flags:
  no flags set

Element Implementation:
  Has change_state() function: gst_aggregator_change_state

Element has no clocking capabilities.
Element has no URI handling capabilities.

Pads:
  SRC: 'src'
    Pad Template: 'src'

Element Properties:
  name                : The name of the object
                        flags: readable, writable
                        String. Default: "cudamux0"
  parent              : The parent of the object
                        flags: readable, writable
                        Object of type "GstObject"
  latency             : Additional latency in live mode to allow upstream to take longer to produce buffers for the current position (in nanoseconds)
                        flags: readable, writable
                        Integer64. Range: 0 - 9223372036854775807 Default: 0 
  start-time-selection: Decides which start time is output
                        flags: readable, writable
                        Enum "GstAggregatorStartTimeSelection" Default: 0, "zero"
                           (0): zero             - Start at 0 running time (default)
                           (1): first            - Start at first observed input running time
                           (2): set              - Set start time with start-time property
  start-time          : Start time to use if start-time-selection=set
                        flags: readable, writable
                        Unsigned Integer64. Range: 0 - 18446744073709551615 Default: 18446744073709551615 
  location            : Location of the CUDA algorithm library to load
                        flags: readable, writable
                        String. Default: null
  in-place            : Use in-place transform mode configuration
                        flags: readable, writable
                        Boolean. Default: false

