GstCUDA - GstCUDA Algorithm Developer’s Guide

From RidgeRun Developer Wiki







This page is a guide to developing the custom CUDA library that contains the algorithm executed by each of the quick-prototyping elements provided by the GstCUDA plugin.

CUDA Algorithm Wrapper Templates

Each quick-prototyping element of the GstCUDA plugin has its own wrapper template for the CUDA algorithm library used by that element. For example, to create a CUDA algorithm library for the cudafilter element, the developer must follow the CUDA algorithm library wrapper template of the cudafilter element.

Each template shows what its element exposes to the CUDA algorithm, so the developer can process the provided data as needed without worrying about how to extract the input data or how to push the output data back to the element. The process() and process_ip() functions, which are the hooks where the CUDA processing takes place, receive input and output parameters that each element knows how to handle. This lets the developer focus on the CUDA processing algorithm instead of the integration between the CUDA algorithm library and the GstCUDA element: the algorithm reads from the input parameters, writes its results to the output parameters, and the element handles the rest automatically.
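The division of labor described above can be sketched as follows. This is a host-only illustration under stated assumptions: `StubCudaData` is a hypothetical stand-in for `GstCudaData` (whose real fields are declared in sys/cuda/gstcuda.h and are not described on this page), and `dispatch()` and `TraceFilter` are hypothetical names showing how an element might select the hook according to its in-place setting. It is not the element's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for GstCudaData; only a byte pointer and a size
// are modeled here.
struct StubCudaData {
  unsigned char *data;
  std::size_t size;
};

// Same hooks as the cudafilter wrapper template, expressed with the stub type.
class Filter {
public:
  virtual bool open() = 0;
  virtual bool close() = 0;
  virtual bool process(const StubCudaData &input_buffer, StubCudaData &output_buffer) = 0;
  virtual bool process_ip(StubCudaData &io_buffer) = 0;
  virtual ~Filter() {}
};

// Hypothetical element-side dispatch: the element, not the algorithm,
// chooses which hook runs according to its in-place configuration.
bool dispatch(Filter &algorithm, bool in_place, StubCudaData &input, StubCudaData &output) {
  assert(input.data != nullptr);
  if (in_place)
    return algorithm.process_ip(input);     // results overwrite the input
  return algorithm.process(input, output);  // input stays unmodified
}

// Trivial algorithm that only records which hook the element called.
class TraceFilter : public Filter {
public:
  std::string last_hook;
  bool open() override { last_hook = "open"; return true; }
  bool close() override { last_hook = "close"; return true; }
  bool process(const StubCudaData &, StubCudaData &) override { last_hook = "process"; return true; }
  bool process_ip(StubCudaData &) override { last_hook = "process_ip"; return true; }
};
```

The algorithm never decides between the two modes itself; it only has to implement both hooks correctly.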

cudafilter

The CUDA algorithm library wrapper template of the cudafilter element is located at "gst-libs/sys/cuda/filteralgorithm.hpp". It requires four functions to be implemented:

  • open()
  • close()
  • process (const GstCudaData &input_buffer, GstCudaData &output_buffer)
  • process_ip (GstCudaData &io_buffer)

The content of filteralgorithm.hpp is shown below. It includes a detailed description of each function that the developer of the CUDA algorithm library must implement.

/*
 * Copyright (C) 2017 RidgeRun, LLC (http://www.ridgerun.com)
 * All Rights Reserved.
 *
 * The contents of this software are proprietary and confidential to RidgeRun,
 * LLC.  No part of this program may be photocopied, reproduced or translated
 * into another programming language without prior written consent of
 * RidgeRun, LLC.  The user is free to modify the source code after obtaining
 * a software license from RidgeRun.  All source code changes must be provided
 * back to RidgeRun without any encumbrance.
 */

#ifndef __GST_CUDA_ALGORITHM_FILTER_HPP__
#define __GST_CUDA_ALGORITHM_FILTER_HPP__

#include <gmodule.h>
#include <sys/cuda/gstcuda.h>


namespace Gst
{
  namespace Cuda
  {
    namespace Algorithm
    {
      class Filter
      {
      public:

	/**
	 * open:
	 *
	 * Hook to allow the cuda algorithm initialization routine.
	 *
	 * Returns: true if initialization is successful, false otherwise.
	 */
	virtual bool open () = 0;

	/**
	 * close:
	 *
	 * Hook to allow the cuda algorithm finalization routine.
	 *
	 * Returns: true if finalization is successful, false otherwise.
	 */
	virtual bool close () = 0;

	/**
	 * process:
	 * @input_buffer: (in) (transfer none): The input buffer that contains
	 * the data to be processed on the GPU.
	 * @output_buffer: (out) (transfer none): The output buffer that contains
	 * the data processed by the GPU.
	 *
	 * Hook for the CUDA algorithm process routine. This function is
	 * responsible for executing the CUDA algorithm processing.
	 * It is called when in_place is not configured for processing
	 * the incoming buffers.
	 * The input and output buffers are different, so the results of
	 * the CUDA algorithm processing are reflected only on
	 * output_buffer, and input_buffer remains unmodified.
	 *
	 * Returns: true if process is successful, false otherwise.
	 */
	virtual bool process (const GstCudaData &input_buffer, GstCudaData &output_buffer) = 0;

	/**
	 * process_ip:
	 *
	 * @io_buffer: (in) (transfer none): The input/output buffer that contains
	 * the data to be processed by the GPU.
	 *
	 * Hook for the CUDA algorithm process_ip (process in place)
	 * routine. This function is responsible for executing the CUDA
	 * algorithm processing.
	 * It is called when in_place is configured for processing the
	 * incoming buffers.
	 * The input and output buffer are the same, so the results of the
	 * CUDA algorithm processing directly modify the input buffer, and
	 * its original incoming data is modified according to the CUDA
	 * algorithm.
	 *
	 * Returns: true if process_ip is successful, false otherwise.
	 */
	virtual bool process_ip (GstCudaData &io_buffer) = 0;

	virtual ~Filter() {};
      };
    };
  };
};

extern "C" {

     /**
      * factory_make:
      *
      * Return a newly allocated algorithm to be used by cudafilter
      *
      * Returns: A newly allocated algorithm to be used by cudafilter
      */
  G_MODULE_EXPORT Gst::Cuda::Algorithm::Filter * factory_make (void);
}

#endif //__GST_CUDA_ALGORITHM_FILTER_HPP__

You can find examples of CUDA algorithm libraries that use this template under the following paths:

  • $GSTCUDA_DEVDIR/tests/examples/cudafilter_algorithms/gray-scale-filter/gray-scale-filter.cu
  • $GSTCUDA_DEVDIR/tests/examples/cudafilter_algorithms/memcpy/memcpy.cu
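A minimal sketch of a library that follows the cudafilter template is shown here. Several assumptions apply: `StubCudaData` is a hypothetical stand-in for `GstCudaData`, `InvertFilter` is a hypothetical example algorithm (not the memcpy or gray-scale example shipped with GstCUDA), and the per-byte work runs on the CPU so the sketch builds with any C++ compiler. A real .cu library would launch a CUDA kernel over the buffer memory instead, and would declare `factory_make` with `G_MODULE_EXPORT` as the header does.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for GstCudaData (the real type lives in
// sys/cuda/gstcuda.h); only a byte pointer and a size are modeled.
struct StubCudaData {
  unsigned char *data;
  std::size_t size;
};

// Mirrors the hooks of Gst::Cuda::Algorithm::Filter using the stub type.
class Filter {
public:
  virtual bool open() = 0;
  virtual bool close() = 0;
  virtual bool process(const StubCudaData &input_buffer, StubCudaData &output_buffer) = 0;
  virtual bool process_ip(StubCudaData &io_buffer) = 0;
  virtual ~Filter() {}
};

// Example algorithm: inverts every byte. Runs on the CPU for portability;
// a CUDA library would do this work inside a kernel.
class InvertFilter : public Filter {
public:
  bool open() override { return true; }   // nothing to allocate
  bool close() override { return true; }  // nothing to release

  bool process(const StubCudaData &input_buffer, StubCudaData &output_buffer) override {
    if (input_buffer.data == nullptr || output_buffer.data == nullptr ||
        output_buffer.size < input_buffer.size)
      return false;
    for (std::size_t i = 0; i < input_buffer.size; ++i)
      output_buffer.data[i] = static_cast<unsigned char>(255 - input_buffer.data[i]);
    return true;  // input_buffer is left unmodified
  }

  bool process_ip(StubCudaData &io_buffer) override {
    if (io_buffer.data == nullptr)
      return false;
    for (std::size_t i = 0; i < io_buffer.size; ++i)
      io_buffer.data[i] = static_cast<unsigned char>(255 - io_buffer.data[i]);
    return true;  // results overwrite the incoming data
  }
};

// C linkage keeps the symbol name unmangled, matching the template's
// factory_make declaration (G_MODULE_EXPORT omitted to avoid a GLib dependency).
extern "C" Filter *factory_make(void) { return new InvertFilter(); }
```

Returning false from a hook is how the algorithm reports a failure, for example when the output buffer is too small.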


cudamux

The CUDA algorithm library wrapper template of the cudamux element is located at "gst-libs/sys/cuda/muxalgorithm.hpp". It requires four functions to be implemented:

  • open()
  • close()
  • process (std::vector<GstCudaData> input_buffers, GstCudaData &output_buffer)
  • process_ip (std::vector<GstCudaData> input_buffers, GstCudaData &output_buffer)

The content of muxalgorithm.hpp is shown below. It includes a detailed description of each function that the developer of the CUDA algorithm library must implement.

/*
 * Copyright (C) 2017 RidgeRun, LLC (http://www.ridgerun.com)
 * All Rights Reserved.
 *
 * The contents of this software are proprietary and confidential to RidgeRun,
 * LLC.  No part of this program may be photocopied, reproduced or translated
 * into another programming language without prior written consent of
 * RidgeRun, LLC.  The user is free to modify the source code after obtaining
 * a software license from RidgeRun.  All source code changes must be provided
 * back to RidgeRun without any encumbrance.
 */

#ifndef __GST_CUDA_ALGORITHM_MUX_HPP__
#define __GST_CUDA_ALGORITHM_MUX_HPP__

#include <gmodule.h>
#include <sys/cuda/gstcuda.h>
#include <vector>

namespace Gst
{
  namespace Cuda
  {
    namespace Algorithm
    {
      class Mux
      {
      public:

	/**
	 * open:
	 *
	 * Hook to allow the cuda algorithm initialization routine.
	 *
	 * Returns: true if initialization is successful, false otherwise.
	 */
	virtual bool open () = 0;

	/**
	 * close:
	 *
	 * Hook to allow the cuda algorithm finalization routine.
	 *
	 * Returns: true if finalization is successful, false otherwise.
	 */
	virtual bool close () = 0;

	/**
	 * process:
	 * @input_buffers: (in) (transfer none): The input buffers vector that 
	 * contains the data to be processed on the GPU.
	 * @output_buffer: (out) (transfer none): The output buffer that contains
	 * the data processed by the GPU.
	 *
	 * Hook for the CUDA algorithm process routine. This function is
	 * responsible for executing the CUDA algorithm processing.
	 * It is called when in_place is not configured for processing
	 * the incoming buffers.
	 * The input and output buffers are different, so the results of
	 * the CUDA algorithm processing are reflected only on
	 * output_buffer, and input_buffers remain unmodified.
	 *
	 * Returns: true if process is successful, false otherwise.
	 */
	virtual bool process (std::vector<GstCudaData> input_buffers, GstCudaData &output_buffer) = 0;

	/**
	 * process_ip:
	 *
	 * @input_buffers: (in) (transfer none): The input buffers vector that
	 * contains the data to be processed by the GPU.
	 * @output_buffer: (out) (transfer none): The output buffer that contains
	 * the data processed by the GPU.
	 * This buffer is a reference to the first buffer of the input vector.
	 *
	 * Hook for the CUDA algorithm process_ip (process in place)
	 * routine. This function is responsible for executing the CUDA
	 * algorithm processing.
	 * It is called when in_place is configured for processing the
	 * incoming buffers.
	 * The first input buffer of the vector and the output buffer are the
	 * same, so the results of the CUDA algorithm processing directly
	 * modify that input buffer, and its original incoming data is
	 * modified according to the CUDA algorithm.
	 *
	 * Returns: true if process_ip is successful, false otherwise.
	 */
	virtual bool process_ip (std::vector<GstCudaData> input_buffers, GstCudaData &output_buffer) = 0;

	virtual ~Mux() {};
      };
    };
  };
};

extern "C" {

     /**
      * factory_make:
      *
      * Return a newly allocated algorithm to be used by cudamux
      *
      * Returns: A newly allocated algorithm to be used by cudamux
      */
  G_MODULE_EXPORT Gst::Cuda::Algorithm::Mux * factory_make (void);
}

#endif //__GST_CUDA_ALGORITHM_MUX_HPP__

You can find examples of CUDA algorithm libraries that use this template under the following path:

  • $GSTCUDA_DEVDIR/tests/examples/cudamux_algorithms/mixer/mixer.cu
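The in-place contract of the cudamux template, where output_buffer refers to the first buffer of the input vector, can be illustrated with a small averaging mixer. This is a host-only sketch under assumptions: `StubCudaData` is a hypothetical stand-in for `GstCudaData`, and `AverageMux` is a hypothetical algorithm unrelated to the mixer.cu example; a real library would perform the per-byte work in a CUDA kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for GstCudaData; only a byte pointer and a size.
struct StubCudaData {
  unsigned char *data;
  std::size_t size;
};

// Mirrors the hooks of Gst::Cuda::Algorithm::Mux using the stub type.
class Mux {
public:
  virtual bool open() = 0;
  virtual bool close() = 0;
  virtual bool process(std::vector<StubCudaData> input_buffers, StubCudaData &output_buffer) = 0;
  virtual bool process_ip(std::vector<StubCudaData> input_buffers, StubCudaData &output_buffer) = 0;
  virtual ~Mux() {}
};

// Averages all input buffers byte by byte into the output buffer.
class AverageMux : public Mux {
public:
  bool open() override { return true; }
  bool close() override { return true; }

  bool process(std::vector<StubCudaData> input_buffers, StubCudaData &output_buffer) override {
    return mix(input_buffers, output_buffer);
  }

  // In place, output_buffer aliases input_buffers[0]. The same code works
  // because each output byte is written only after every input byte at
  // that index has been read.
  bool process_ip(std::vector<StubCudaData> input_buffers, StubCudaData &output_buffer) override {
    return mix(input_buffers, output_buffer);
  }

private:
  static bool mix(const std::vector<StubCudaData> &inputs, StubCudaData &out) {
    if (inputs.empty() || out.data == nullptr)
      return false;
    for (const StubCudaData &in : inputs)
      if (in.data == nullptr || in.size < out.size)
        return false;
    for (std::size_t i = 0; i < out.size; ++i) {
      unsigned int sum = 0;
      for (const StubCudaData &in : inputs)
        sum += in.data[i];
      out.data[i] = static_cast<unsigned char>(sum / inputs.size());
    }
    return true;
  }
};

// C linkage, following the template's factory_make declaration.
extern "C" Mux *factory_make(void) { return new AverageMux(); }
```

Because the vector holds copies of the buffer descriptors, writes through output_buffer in the in-place case still land in the memory of the first input, which is what the template documents.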


cudademux

Under Construction

cudamimo

Under Construction


Compilation Guide

GstCUDA provides a helper makefile to build a shared library that is loaded dynamically by the cudafilter element. A source code example is located in:

 gst-cuda/tools/example

Process

  • Build gst-cuda
  • Add a new makefile to your project that includes the helper makefile. For example:
# Including GstCuda helper makefile (adjust the path to your gst-cuda source directory)
include /home/mgruner/RidgeRun/gst-cuda/tools/algorithm.mk
  • Add the sources to your project
  • Build
make
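After make produces the shared object, one way to confirm that it exports the unmangled factory_make symbol required by the templates is a small loader check. The headers mark factory_make with G_MODULE_EXPORT, which suggests the element resolves it through GLib's GModule; this sketch swaps in the POSIX dlopen()/dlsym() calls instead, and the helper name exports_symbol() is hypothetical.

```cpp
#include <dlfcn.h>
#include <cassert>
#include <cstdio>

// Returns true if the shared object at `path` loads and exports `symbol`.
// Passing nullptr as `path` inspects the running program itself.
bool exports_symbol(const char *path, const char *symbol) {
  // RTLD_NOW resolves all undefined symbols up front, so a missing
  // dependency (for example libgstcuda-1.0) is reported here.
  void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) {
    std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return false;
  }
  dlerror();  // clear any stale error before calling dlsym
  void *sym = dlsym(handle, symbol);
  bool found = (dlerror() == nullptr && sym != nullptr);
  dlclose(handle);
  return found;
}
```

For the build shown in the Debug section, a call such as exports_symbol("./example.so", "factory_make") should return true; if it does not, check that factory_make is declared inside an extern "C" block so its name is not mangled.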


Debug

To see the full commands executed during the build, invoke make with the V=1 variable. For example:

$ make V=1

======================================
GstCuda helper makefile configuration
-------------------------------------
GST_CUDA_LIB = libgstcuda-1.0.so
GST_CUDA_LIBDIR = /usr/lib/aarch64-linux-gnu
GST_CUDA_INCDIR = /usr/include/gstreamer-1.0

CUDA_NVCC = /usr/local/cuda/bin/nvcc
CUDA_FLAGS = -m64 -O3 -arch=sm_30 -Xcompiler -Wall -Xcompiler -Wextra --ptxas-options=-v -Xcompiler -fPIC -DPIC -D_FORCE_INLINES
CUDA_EXTRA_CFLAGS =

GST_INCDIR = -I/gst-libs -I/usr/include/gstreamer-1.0 -I/usr/lib/aarch64-linux-gnu/gstreamer-1.0/include -I/usr/include/glib-2.0 -I/usr/lib/aarch64-linux-gnu/glib-2.0/include -DGST_USE_UNSTABLE_API -DG_THREADS_MANDATORY -DG_DISABLE_DEPRECATED

CXXFLAGS = -std=c++11 -I/usr/include/gstreamer-1.0 -I/gst-libs -I/usr/include/gstreamer-1.0 -I/usr/lib/aarch64-linux-gnu/gstreamer-1.0/include -I/usr/include/glib-2.0 -I/usr/lib/aarch64-linux-gnu/glib-2.0/include -DGST_USE_UNSTABLE_API -DG_THREADS_MANDATORY -DG_DISABLE_DEPRECATED
EXTRA_CXXFLAGS =
LDFLAGS = --shared -L/usr/lib/aarch64-linux-gnu -lgstcuda-1.0 -Wno-deprecated-gpu-targets
EXTRA_LDFLAGS =

ALGORITHM = example.so
SUFFIX = cu
SOURCES = example.cu
OBJECTS = example.o
======================================

Compiling example.cu
/usr/local/cuda/bin/nvcc -o example.o -c example.cu -m64 -O3 -arch=sm_30 -Xcompiler -Wall -Xcompiler -Wextra --ptxas-options=-v -Xcompiler "-fPIC -DPIC" -D_FORCE_INLINES  -std=c++11 -I/usr/include/gstreamer-1.0 -I/gst-libs -I/usr/include/gstreamer-1.0 -I/usr/lib/aarch64-linux-gnu/gstreamer-1.0/include -I/usr/include/glib-2.0 -I/usr/lib/aarch64-linux-gnu/glib-2.0/include -DGST_USE_UNSTABLE_API -DG_THREADS_MANDATORY -DG_DISABLE_DEPRECATED
Linking example.so from example.o
/usr/local/cuda/bin/nvcc -o example.so example.o --shared -L/usr/lib/aarch64-linux-gnu -lgstcuda-1.0 -Wno-deprecated-gpu-targets


Targets

The helper makefile provides two targets:

  • make
  • make clean


Customization

Even though the helper makefile should be enough for most projects, several variables can be adjusted to control the build.

Define these variables after including the helper makefile.

GST_CUDA_LIB
Use this variable if an alternative GstCuda library name is used. The default is libgstcuda-1.0.so
GST_CUDA_LIBDIR
Use this variable if the GstCuda library is to be read from a location other than the one installed by the project. The default is typically /usr/lib/aarch64-linux-gnu
GST_CUDA_INCDIR
Use this variable if you are reading headers from a location other than the one used by the project. The default is typically /usr/include/gstreamer-1.0/sys/cuda/
CUDA_NVCC
Use this variable to select an NVCC other than the one found at configure time. The default is typically /usr/local/cuda-8.0/bin/nvcc
CUDA_FLAGS
Use this variable if you would like to completely override the nvcc configuration found at build time. The default is typically -m64 -O3 -arch=sm_30 -Xcompiler -Wall -Xcompiler -Wextra --ptxas-options=-v -Xcompiler -fPIC -DPIC -D_FORCE_INLINES
CUDA_EXTRA_CFLAGS
Use this variable if you would like to append nvcc compiler flags to the ones provided by CUDA_FLAGS.
GST_INCDIR
Use this variable if you would like to use a different set of GStreamer headers from the ones you used during the GstCuda build. The default is typically -I/gst-libs -I/usr/include/gstreamer-1.0 -I/usr/lib/aarch64-linux-gnu/gstreamer-1.0/include -I/usr/include/glib-2.0 -I/usr/lib/aarch64-linux-gnu/glib-2.0/include
CXXFLAGS
Use this variable if you would like to completely override the compiler configuration. The default is typically -std=c++11 -I$(GST_CUDA_INCDIR) $(GST_INCDIR) -D_GLIB_TEST_OVERFLOW_FALLBACK
EXTRA_CXXFLAGS
Use this variable if you would like to append compiler flags to the ones provided by CXXFLAGS.
LDFLAGS
Use this variable if you would like to completely override the linker configuration. The default is typically --shared -L$(GST_CUDA_LIBDIR) -lgstcuda-1.0 -Wno-deprecated-gpu-targets
EXTRA_LDFLAGS
Use this variable if you would like to append linker flags to the ones provided by LDFLAGS.
ALGORITHM
Use this variable if you would like to use a custom name for the algorithm. By default, the makefile uses the directory name as the name of the shared object.
SUFFIX
Use this variable if you are using a file suffix other than "cu". The makefile will compile all the *.$(SUFFIX) in the directory. The default is cu
SOURCES
Uncomment this variable if you would like to pass in a custom list of source files. By default, the makefile will look for all the source files in the current directory.

