GstCUDA - GstCUDA Algorithm Developer’s Guide

This page is a guide to developing the custom CUDA library that contains the algorithm executed by each of the quick-prototyping elements provided by the GstCUDA plugin.

CUDA Algorithm Wrapper Templates

Each quick-prototyping element of the GstCUDA plugin has its own wrapper template for the CUDA algorithm library used with that element. For example, a CUDA algorithm library intended for the cudafilter element must follow the cudafilter wrapper template.

Each template shows what its element exposes to the CUDA algorithm, so the developer can process the provided data as needed. The templates free the developer from extracting the input data and pushing the output data back to the element. The process() and process_ip() functions, which exist to perform the CUDA processing, take input and output parameters that each element knows how to handle. This lets the developer focus on the CUDA processing algorithm rather than on the integration between the CUDA algorithm library and the GstCUDA element: the custom algorithm reads from the input parameters and writes its result to the output parameters, and the element handles the rest automatically.

cudafilter

The CUDA algorithm library wrapper template for the cudafilter element is located at "gst-libs/sys/cuda/filteralgorithm.hpp". It requires four functions to be implemented:

open()
close()
process (const GstCudaData &input_buffer, GstCudaData &output_buffer)
process_ip (GstCudaData &io_buffer)

The content of filteralgorithm.hpp is shown below. It includes a detailed description of each function that the developer of the CUDA algorithm library must implement.

/*
 * Copyright (C) 2017 RidgeRun, LLC (http://www.ridgerun.com)
 * All Rights Reserved.
 *
 * The contents of this software are proprietary and confidential to RidgeRun,
 * LLC.  No part of this program may be photocopied, reproduced or translated
 * into another programming language without prior written consent of
 * RidgeRun, LLC.  The user is free to modify the source code after obtaining
 * a software license from RidgeRun.  All source code changes must be provided
 * back to RidgeRun without any encumbrance.
 */

#ifndef __GST_CUDA_ALGORITHM_FILTER_HPP__
#define __GST_CUDA_ALGORITHM_FILTER_HPP__

#include <gmodule.h>
#include <sys/cuda/gstcuda.h>


namespace Gst
{
  namespace Cuda
  {
    namespace Algorithm
    {
      class Filter
      {
      public:

	/**
	 * open:
	 *
	 * Hook to allow the cuda algorithm initialization routine.
	 *
	 * Returns: true if initialization is successful, false otherwise.
	 */
	virtual bool open () = 0;

	/**
	 * close:
	 *
	 * Hook to allow the cuda algorithm finalization routine.
	 *
	 * Returns: true if finalization is successful, false otherwise.
	 */
	virtual bool close () = 0;

	/**
	 * process:
	 * @input_buffer: (in) (transfer none): The input buffer that contains
	 * the data to be processed on the GPU.
	 * @output_buffer: (out) (transfer none): The output buffer that contains
	 * the data processed by the GPU.
	 *
	 * Hook to allow the cuda algorithm process routine. This function
	 * is responsible for the CUDA algorithm processing execution.
	 * This function will be called when in_place is not configured for
	 * processing the incoming buffers.
	 * In this function the input and output buffers are different,
	 * so the results of the CUDA algorithm processing will only be
	 * reflected on the output_buffer, which means that the
	 * input_buffer remains unmodified.
	 *
	 * Returns: true if process is successful, false otherwise.
	 */
	virtual bool process (const GstCudaData &input_buffer, GstCudaData &output_buffer) = 0;

	/**
	 * process_ip:
	 *
	 * @io_buffer: (in) (transfer none): The input/output buffer that contains
	 * the data to be processed by the GPU.
	 *
	 * Hook to allow the cuda algorithm process_ip (process in place)
	 * routine. This function is responsible for the CUDA algorithm
	 * processing execution.
	 * This function will be called when in_place is configured for
	 * processing the incoming buffers.
	 * In this function the input and output buffer is the same. That
	 * means that the results of the CUDA algorithm processing will directly
	 * modify the input buffer, so its original incoming data will be modified
	 * according to the CUDA algorithm.
	 *
	 * Returns: true if process_ip is successful, false otherwise.
	 */
	virtual bool process_ip (GstCudaData &io_buffer) = 0;

	virtual ~Filter() {};
      };
    };
  };
};

extern "C" {

     /**
      * factory_make:
      *
      * Return a newly allocated algorithm to be used by cudafilter
      *
      * Returns: A newly allocated algorithm to be used by cudafilter
      */
  G_MODULE_EXPORT Gst::Cuda::Algorithm::Filter * factory_make (void);
}

#endif //__GST_CUDA_ALGORITHM_FILTER_HPP__

You can find examples of CUDA algorithm libraries that use this template under the following paths:

$GSTCUDA_DEVDIR/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.cu
$GSTCUDA_DEVDIR/tests/examples/cudafilter_algorithms/memcpy/memcpy.cu
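
As a rough illustration of how the four required functions and factory_make fit together, the following is a minimal host-only sketch. It rests on several assumptions: the GstCudaData struct below is a hypothetical stand-in (the real type comes from sys/cuda/gstcuda.h and its fields are not described on this page), the Filter class is redeclared only so the sketch compiles on its own, and InvertFilter and invert_bytes are made-up names. A real library would include filteralgorithm.hpp and launch a CUDA kernel where the plain loop is.

```cpp
#include <cassert>
#include <cstddef>

// HYPOTHETICAL stand-in so this sketch compiles on its own. The real
// GstCudaData comes from <sys/cuda/gstcuda.h>; its fields are not shown on
// this page and are NOT assumed to match these.
struct GstCudaData
{
  unsigned char *data;  // assumed: start of the buffer contents
  std::size_t size;     // assumed: buffer length in bytes
};

// Interface signatures copied from filteralgorithm.hpp. A real library
// would include that header instead of redeclaring the class.
namespace Gst { namespace Cuda { namespace Algorithm {
class Filter
{
public:
  virtual bool open () = 0;
  virtual bool close () = 0;
  virtual bool process (const GstCudaData &input_buffer, GstCudaData &output_buffer) = 0;
  virtual bool process_ip (GstCudaData &io_buffer) = 0;
  virtual ~Filter () {}
};
} } }

// Host-side placeholder for the per-byte work. In a real library this is
// where a CUDA kernel launch would go.
static void invert_bytes (const unsigned char *src, unsigned char *dst, std::size_t n)
{
  assert (src != nullptr && dst != nullptr);
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = static_cast<unsigned char> (255 - src[i]);
}

// Hypothetical example algorithm: inverts every byte of the buffer.
class InvertFilter : public Gst::Cuda::Algorithm::Filter
{
public:
  bool open () override { return true; }   // nothing to allocate in this sketch
  bool close () override { return true; }  // nothing to release in this sketch

  // Out-of-place: reads input_buffer and writes only output_buffer.
  bool process (const GstCudaData &input_buffer, GstCudaData &output_buffer) override
  {
    if (!input_buffer.data || !output_buffer.data
        || output_buffer.size < input_buffer.size)
      return false;
    invert_bytes (input_buffer.data, output_buffer.data, input_buffer.size);
    return true;
  }

  // In-place: the same buffer is both source and destination.
  bool process_ip (GstCudaData &io_buffer) override
  {
    if (!io_buffer.data)
      return false;
    invert_bytes (io_buffer.data, io_buffer.data, io_buffer.size);
    return true;
  }
};

// Factory declared by the header (there with G_MODULE_EXPORT, omitted here
// because gmodule.h is not included in this standalone sketch).
extern "C" Gst::Cuda::Algorithm::Filter *factory_make (void)
{
  return new InvertFilter ();
}
```

In this sketch, process() leaves its input untouched while process_ip() overwrites it, which mirrors the contract described in the header comments.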

cudamux

The CUDA algorithm library wrapper template for the cudamux element is located at "gst-libs/sys/cuda/muxalgorithm.hpp". It requires four functions to be implemented:

open()
close()
process (std::vector<GstCudaData> input_buffers, GstCudaData &output_buffer)
process_ip (std::vector<GstCudaData> input_buffers, GstCudaData &output_buffer)

The content of muxalgorithm.hpp is shown below. It includes a detailed description of each function that the developer of the CUDA algorithm library must implement.

/*
 * Copyright (C) 2017 RidgeRun, LLC (http://www.ridgerun.com)
 * All Rights Reserved.
 *
 * The contents of this software are proprietary and confidential to RidgeRun,
 * LLC.  No part of this program may be photocopied, reproduced or translated
 * into another programming language without prior written consent of
 * RidgeRun, LLC.  The user is free to modify the source code after obtaining
 * a software license from RidgeRun.  All source code changes must be provided
 * back to RidgeRun without any encumbrance.
 */

#ifndef __GST_CUDA_ALGORITHM_MUX_HPP__
#define __GST_CUDA_ALGORITHM_MUX_HPP__

#include <gmodule.h>
#include <sys/cuda/gstcuda.h>
#include <vector>

namespace Gst
{
  namespace Cuda
  {
    namespace Algorithm
    {
      class Mux
      {
      public:

	/**
	 * open:
	 *
	 * Hook to allow the cuda algorithm initialization routine.
	 *
	 * Returns: true if initialization is successful, false otherwise.
	 */
	virtual bool open () = 0;

	/**
	 * close:
	 *
	 * Hook to allow the cuda algorithm finalization routine.
	 *
	 * Returns: true if finalization is successful, false otherwise.
	 */
	virtual bool close () = 0;

	/**
	 * process:
	 * @input_buffers: (in) (transfer none): The input buffers vector that 
	 * contains the data to be processed on the GPU.
	 * @output_buffer: (out) (transfer none): The output buffer that contains
	 * the data processed by the GPU.
	 *
	 * Hook to allow the cuda algorithm process routine. This function
	 * is responsible for the CUDA algorithm processing execution.
	 * This function will be called when in_place is not configured for
	 * processing the incoming buffers.
	 * In this function the input and output buffers are different,
	 * so the results of the CUDA algorithm processing will only be
	 * reflected on the output_buffer, which means that the
	 * input_buffers remain unmodified.
	 *
	 * Returns: true if process is successful, false otherwise.
	 */
	virtual bool process (std::vector<GstCudaData> input_buffers, GstCudaData &output_buffer) = 0;

	/**
	 * process_ip:
	 *
	 * @input_buffers: (in) (transfer none): The input buffers vector that
	 * contains the data to be processed on the GPU.
	 * @output_buffer: (out) (transfer none): The output buffer that contains
	 * the data processed by the GPU.
	 * This buffer is a reference to the first buffer of the input vector.
	 *
	 * Hook to allow the cuda algorithm process_ip (process in place)
	 * routine. This function is responsible for the CUDA algorithm
	 * processing execution.
	 * This function will be called when in_place is configured for
	 * processing the incoming buffers.
	 * In this function the first input buffer of the vector and the output
	 * buffer are the same. That means that the results of the CUDA algorithm
	 * processing will directly modify that input buffer, so its original
	 * incoming data will be modified according to the CUDA algorithm.
	 *
	 * Returns: true if process_ip is successful, false otherwise.
	 */
	virtual bool process_ip (std::vector<GstCudaData> input_buffers, GstCudaData &output_buffer) = 0;

	virtual ~Mux() {};
      };
    };
  };
};

extern "C" {

     /**
      * factory_make:
      *
      * Return a newly allocated algorithm to be used by cudamux
      *
      * Returns: A newly allocated algorithm to be used by cudamux
      */
  G_MODULE_EXPORT Gst::Cuda::Algorithm::Mux * factory_make (void);
}

#endif //__GST_CUDA_ALGORITHM_MUX_HPP__

You can find examples of CUDA algorithm libraries that use this template under the following path:

$GSTCUDA_DEVDIR/tests/examples/cudamux_algorithms/mixer/mixer.cu
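
To show how the mux variant differs from the filter one, here is a minimal host-only sketch of a Mux implementation that averages its inputs byte by byte. The same caveats apply: GstCudaData below is a hypothetical stand-in for the real type in sys/cuda/gstcuda.h, the Mux class is redeclared only to keep the sketch standalone, AverageMux is a made-up name, and the CPU loop stands in for what would be a CUDA kernel in a real library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// HYPOTHETICAL stand-in so this sketch compiles on its own. The real
// GstCudaData comes from <sys/cuda/gstcuda.h> and may look different.
struct GstCudaData
{
  unsigned char *data;  // assumed: start of the buffer contents
  std::size_t size;     // assumed: buffer length in bytes
};

// Interface signatures copied from muxalgorithm.hpp. A real library would
// include that header instead of redeclaring the class.
namespace Gst { namespace Cuda { namespace Algorithm {
class Mux
{
public:
  virtual bool open () = 0;
  virtual bool close () = 0;
  virtual bool process (std::vector<GstCudaData> input_buffers, GstCudaData &output_buffer) = 0;
  virtual bool process_ip (std::vector<GstCudaData> input_buffers, GstCudaData &output_buffer) = 0;
  virtual ~Mux () {}
};
} } }

// Hypothetical example algorithm: output byte i is the average of byte i
// across all input buffers.
class AverageMux : public Gst::Cuda::Algorithm::Mux
{
public:
  bool open () override { return true; }
  bool close () override { return true; }

  bool process (std::vector<GstCudaData> input_buffers, GstCudaData &output_buffer) override
  {
    if (input_buffers.empty () || !output_buffer.data)
      return false;
    for (const GstCudaData &in : input_buffers)
      if (!in.data || in.size < output_buffer.size)
        return false;

    for (std::size_t i = 0; i < output_buffer.size; ++i) {
      // Read every input at index i before writing, so the result stays
      // correct even when the output aliases the first input.
      unsigned int sum = 0;
      for (const GstCudaData &in : input_buffers)
        sum += in.data[i];
      output_buffer.data[i] = static_cast<unsigned char> (sum / input_buffers.size ());
    }
    return true;
  }

  // In-place: per the header comments, output_buffer refers to the first
  // input buffer, so the same element-wise routine can be reused.
  bool process_ip (std::vector<GstCudaData> input_buffers, GstCudaData &output_buffer) override
  {
    return process (input_buffers, output_buffer);
  }
};

// Factory declared by the header (there with G_MODULE_EXPORT, omitted here
// because gmodule.h is not included in this standalone sketch).
extern "C" Gst::Cuda::Algorithm::Mux *factory_make (void)
{
  assert (sizeof (GstCudaData) > 0);  // placeholder sanity check only
  return new AverageMux ();
}
```

Reusing process() from process_ip() works here only because the averaging is element-wise; an algorithm that reads neighbouring elements would need a separate in-place strategy.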

cudademux

Under Construction

cudamimo

Under Construction

Compilation Guide

GstCUDA provides a helper makefile that builds a shared library to be loaded dynamically by the cudafilter element. A source code example is located in:

 gst-cuda/tools/example
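
To make "loaded dynamically" more concrete, the sketch below shows one way a host program could open such a shared library and look up its factory_make symbol. This is only an illustration under stated assumptions: it uses the POSIX dlopen/dlsym calls, whereas the G_MODULE_EXPORT marker in the headers suggests GstCUDA itself relies on GModule, and the element's actual loading code is not shown on this page. The names LoadedAlgorithm, FactoryFunc and load_algorithm are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <dlfcn.h>

// Generic signature for the factory. In the real headers it returns a
// Gst::Cuda::Algorithm::Filter * or Gst::Cuda::Algorithm::Mux *.
typedef void *(*FactoryFunc) (void);

// Hypothetical result type for this sketch.
struct LoadedAlgorithm
{
  void *handle;         // library handle, or nullptr on failure
  FactoryFunc factory;  // address of factory_make, or nullptr on failure
};

// Opens the shared library at `path` and resolves its factory_make symbol.
// On any failure both fields of the result are nullptr.
LoadedAlgorithm load_algorithm (const char *path)
{
  assert (path != nullptr);
  LoadedAlgorithm result = { nullptr, nullptr };

  result.handle = dlopen (path, RTLD_NOW | RTLD_LOCAL);
  if (result.handle == nullptr) {
    std::fprintf (stderr, "cannot load %s: %s\n", path, dlerror ());
    return result;
  }

  // extern "C" keeps the symbol name unmangled, so a plain string lookup
  // is enough to find it.
  void *symbol = dlsym (result.handle, "factory_make");
  if (symbol == nullptr) {
    std::fprintf (stderr, "%s has no factory_make symbol\n", path);
    dlclose (result.handle);
    result.handle = nullptr;
    return result;
  }

  result.factory = reinterpret_cast<FactoryFunc> (symbol);
  return result;
}
```

A caller would invoke result.factory() to obtain the algorithm object and cast it to the expected interface type. On older glibc versions this sketch must be linked with -ldl.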

Process

Build gst-cuda
Add a new makefile to your project that includes the helper makefile. E.g.:

# Including GstCuda helper makefile
include /home/mgruner/RidgeRun/gst-cuda/tools/algorithm.mk

Add the sources to your project
Build

make

Debug

You can inspect the build process by invoking make with the V=1 variable. For example:

$ make V=1

======================================
GstCuda helper makefile configuration
-------------------------------------
GST_CUDA_LIB = libgstcuda-1.0.so
GST_CUDA_LIBDIR = /usr/lib/aarch64-linux-gnu
GST_CUDA_INCDIR = /usr/include/gstreamer-1.0

CUDA_NVCC = /usr/local/cuda/bin/nvcc
CUDA_FLAGS = -m64 -O3 -arch=sm_30 -Xcompiler -Wall -Xcompiler -Wextra --ptxas-options=-v -Xcompiler -fPIC -DPIC -D_FORCE_INLINES
CUDA_EXTRA_CFLAGS =

GST_INCDIR = -I/gst-libs -I/usr/include/gstreamer-1.0 -I/usr/lib/aarch64-linux-gnu/gstreamer-1.0/include -I/usr/include/glib-2.0 -I/usr/lib/aarch64-linux-gnu/glib-2.0/include -DGST_USE_UNSTABLE_API -DG_THREADS_MANDATORY -DG_DISABLE_DEPRECATED

CXXFLAGS = -std=c++11 -I/usr/include/gstreamer-1.0 -I/gst-libs -I/usr/include/gstreamer-1.0 -I/usr/lib/aarch64-linux-gnu/gstreamer-1.0/include -I/usr/include/glib-2.0 -I/usr/lib/aarch64-linux-gnu/glib-2.0/include -DGST_USE_UNSTABLE_API -DG_THREADS_MANDATORY -DG_DISABLE_DEPRECATED
EXTRA_CXXFLAGS =
LDFLAGS = --shared -L/usr/lib/aarch64-linux-gnu -lgstcuda-1.0 -Wno-deprecated-gpu-targets
EXTRA_LDFLAGS =

ALGORITHM = example.so
SUFFIX = cu
SOURCES = example.cu
OBJECTS = example.o
======================================

Compiling example.cu
/usr/local/cuda/bin/nvcc -o example.o -c example.cu -m64 -O3 -arch=sm_30 -Xcompiler -Wall -Xcompiler -Wextra --ptxas-options=-v -Xcompiler "-fPIC -DPIC" -D_FORCE_INLINES  -std=c++11 -I/usr/include/gstreamer-1.0 -I/gst-libs -I/usr/include/gstreamer-1.0 -I/usr/lib/aarch64-linux-gnu/gstreamer-1.0/include -I/usr/include/glib-2.0 -I/usr/lib/aarch64-linux-gnu/glib-2.0/include -DGST_USE_UNSTABLE_API -DG_THREADS_MANDATORY -DG_DISABLE_DEPRECATED
Linking example.so from example.o
/usr/local/cuda/bin/nvcc -o example.so example.o --shared -L/usr/lib/aarch64-linux-gnu -lgstcuda-1.0 -Wno-deprecated-gpu-targets

Targets

The helper makefile provides two targets:

make
make clean

Customization

Even though the helper makefile should typically be enough for most project requirements, there are some variables that may be adjusted to control the flow of the build.

Define these variables after the makefile inclusion.

GST_CUDA_LIB: Use this variable if an alternative GstCuda library name is used. The default is libgstcuda-1.0.so

GST_CUDA_LIBDIR: Use this variable if the GstCuda library is to be read from a location other than the one installed by the project. The default is typically /usr/lib/aarch64-linux-gnu

GST_CUDA_INCDIR: Use this variable if you are reading headers from a location other than the one used by the project. The default is typically /usr/include/gstreamer-1.0/sys/cuda/

CUDA_NVCC: Use this variable to select an NVCC other than the one found at configure time. The default is typically /usr/local/cuda-8.0/bin/nvcc

CUDA_FLAGS: Use this variable if you would like to completely override the nvcc configuration found at build time. The default is typically -m64 -O3 -arch=sm_30 -Xcompiler -Wall -Xcompiler -Wextra --ptxas-options=-v -Xcompiler -fPIC -DPIC -D_FORCE_INLINES

CUDA_EXTRA_CFLAGS: Use this variable if you would like to append nvcc compiler flags to the ones provided by CUDA_FLAGS.

GST_INCDIR: Use this variable if you would like to use a different set of GStreamer headers from the ones you used during the GstCuda build. The default is typically -I/gst-libs -I/usr/include/gstreamer-1.0 -I/usr/lib/aarch64-linux-gnu/gstreamer-1.0/include -I/usr/include/glib-2.0 -I/usr/lib/aarch64-linux-gnu/glib-2.0/include

CXXFLAGS: Use this variable if you would like to completely override the compiler configuration. The default is typically -std=c++11 -I$(GST_CUDA_INCDIR) $(GST_INCDIR) -D_GLIB_TEST_OVERFLOW_FALLBACK

EXTRA_CXXFLAGS: Use this variable if you would like to append compiler flags to the ones provided by CXXFLAGS.

LDFLAGS: Use this variable if you would like to completely override the linker configuration. The default is typically --shared -L$(GST_CUDA_LIBDIR) -lgstcuda-1.0 -Wno-deprecated-gpu-targets

EXTRA_LDFLAGS: Use this variable if you would like to append linker flags to the ones provided by LDFLAGS

ALGORITHM: Use this variable if you would like to use a custom name for the algorithm. By default, the makefile will use the directory name as the name for the shared object

SUFFIX: Use this variable if you are using a file suffix other than "cu". The makefile will compile all the *.$(SUFFIX) in the directory. The default is cu

SOURCES: Uncomment this variable if you would like to pass in a custom list of source files. By default, the makefile will look for all the source files in the current directory.
