# NVIDIA Jetson Xavier - Using CUDA

## Build All CUDA Samples

1. Go to the samples path:

       cd /usr/local/cuda/samples

2. Build the samples using the Makefile:

       sudo make
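Building every sample can take a while. As a minimal sketch, assuming each sample directory ships its own Makefile, a single sample can be built on its own. `build_sample` and the `SAMPLES_DIR` variable are hypothetical names introduced here for illustration; they are not part of the CUDA toolkit.

```shell
# Root of the samples tree; override SAMPLES_DIR if yours lives elsewhere.
SAMPLES_DIR="${SAMPLES_DIR:-/usr/local/cuda/samples}"

# build_sample is a hypothetical helper, not part of the CUDA toolkit.
# It builds one sample instead of the whole tree.
# Usage: build_sample 1_Utilities/deviceQuery
build_sample() {
    sample_path="$SAMPLES_DIR/$1"
    # Refuse to run make when the path is not a sample directory.
    if [ ! -f "$sample_path/Makefile" ]; then
        echo "build_sample: no Makefile in $sample_path" >&2
        return 1
    fi
    # sudo mirrors the "sudo make" step used for the full build.
    sudo make -C "$sample_path"
}
```

For example, `build_sample 1_Utilities/deviceQuery` would run `make` only inside that sample's directory.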

## CUDA Samples

All the samples are in:

    /usr/local/cuda/samples
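The tables that follow group the samples by category directory (`0_Simple`, `1_Utilities`, and so on). A small sketch for checking which samples are actually present on a given board, assuming one subdirectory per sample; `list_samples` and `SAMPLES_DIR` are hypothetical names used only for this illustration.

```shell
# Root of the samples tree; override SAMPLES_DIR if yours lives elsewhere.
SAMPLES_DIR="${SAMPLES_DIR:-/usr/local/cuda/samples}"

# list_samples is a hypothetical helper: print the sample names in one category.
# Usage: list_samples 0_Simple
list_samples() {
    category_dir="$SAMPLES_DIR/$1"
    if [ ! -d "$category_dir" ]; then
        echo "list_samples: $category_dir not found" >&2
        return 1
    fi
    for entry in "$category_dir"/*/; do
        # Skip the unexpanded pattern when the category has no subdirectories.
        [ -d "$entry" ] || continue
        basename "$entry"
    done
}
```

Calling `list_samples 1_Utilities` prints one sample name per line, which can be compared against the corresponding table.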

### Simple Samples

Path | Sample | Description |
---|---|---|
/0_Simple/asyncAPI | asyncAPI | This sample uses CUDA streams and events to overlap execution on CPU and GPU. |
/0_Simple/cdpSimplePrint | cdpSimplePrint | This sample demonstrates simple printf implemented using CUDA Dynamic Parallelism. This sample requires devices with compute capability 3.5 or higher. |
/0_Simple/cdpSimpleQuicksort | cdpSimpleQuicksort | This sample demonstrates simple quicksort implemented using CUDA Dynamic Parallelism. This sample requires devices with compute capability 3.5 or higher. |
/0_Simple/clock | clock | This example shows how to use the clock function to measure the performance of a block of threads of a kernel accurately. |
/0_Simple/cppIntegration | cppIntegration | This example demonstrates how to integrate CUDA into an existing C++ application, i.e. the CUDA entry point on the host side is only a function which is called from C++ code, and only the file containing this function is compiled with nvcc. It also demonstrates that vector types can be used from cpp. |
/0_Simple/cppOverload | cppOverload | This sample demonstrates how to use C++ function overloading on the GPU. |
/0_Simple/cudaOpenMP | cudaOpenMP | This sample demonstrates how to use the OpenMP API to write an application for multiple GPUs. |
/0_Simple/fp16ScalarProduct | fp16ScalarProduct | Calculates the scalar product of two vectors of FP16 numbers. |
/0_Simple/inlinePTX | inlinePTX | A simple test application that demonstrates a new CUDA 4.0 ability to embed PTX in a CUDA kernel. |
/0_Simple/matrixMul | matrixMul | This sample implements matrix multiplication using shared memory to ensure data reuse; the multiplication is done using a tiling approach. |
/0_Simple/matrixMulCUBLAS | matrixMulCUBLAS | This sample implements matrix multiplication. To illustrate GPU performance for matrix multiply, this sample also shows how to use the new CUDA 4.0 interface for CUBLAS to demonstrate high-performance matrix multiplication. |
/0_Simple/matrixMulDrv | matrixMulDrv | This sample implements matrix multiplication and uses the new CUDA 4.0 kernel launch Driver API. |
/0_Simple/simpleAssert | simpleAssert | This CUDA Runtime API sample is a very basic sample that shows how to use the assert function in device code. Requires Compute Capability 2.0. |
/0_Simple/simpleAtomicIntrinsics | simpleAtomicIntrinsics | A simple demonstration of global memory atomic instructions. Requires Compute Capability 2.0 or higher. |
/0_Simple/simpleCallback | simpleCallback | This sample implements multi-threaded heterogeneous computing workloads with the new CPU callbacks for CUDA streams and events introduced with CUDA 5.0. |
/0_Simple/simpleCooperativeGroups | simpleCooperativeGroups | This sample is a simple code that illustrates the basic usage of cooperative groups within the thread block. |
/0_Simple/simpleCubemapTexture | simpleCubemapTexture | Simple example that demonstrates how to use a new CUDA 4.1 feature to support cubemap Textures in CUDA C. |
/0_Simple/simpleCudaGraphs | simpleCudaGraphs | A demonstration of CUDA Graphs creation, instantiation, and launch using Graphs APIs and Stream Capture APIs. |
/0_Simple/simpleLayeredTexture | simpleLayeredTexture | Simple example that demonstrates how to use a new CUDA 4.0 feature to support layered Textures in CUDA C. |
/0_Simple/simpleMPI | simpleMPI | Simple example demonstrating how to use MPI in combination with CUDA. |
/0_Simple/simpleMultiCopy | simpleMultiCopy | This sample illustrates the usage of CUDA streams to achieve overlapping of kernel execution with data copies to and from the device. |
/0_Simple/simpleMultiGPU | simpleMultiGPU | This application demonstrates how to use the new CUDA 4.0 API for CUDA context management and multi-threaded access to run CUDA kernels on multiple GPUs. |
/0_Simple/simpleOccupancy | simpleOccupancy | This sample demonstrates the basic usage of the CUDA occupancy calculator and occupancy-based launch configurator APIs by launching a kernel with the launch configurator and measuring the utilization difference against a manually configured launch. |
/0_Simple/simplePitchLinearTexture | simplePitchLinearTexture | Use of Pitch Linear Textures. |
/0_Simple/simplePrintf | simplePrintf | This CUDA Runtime API sample is a very basic sample that shows how to use the printf function in device code. |
/0_Simple/simpleSeparateCompilation | simpleSeparateCompilation | This sample demonstrates a CUDA 5.0 feature, the ability to create a GPU device static library and use it within another CUDA kernel. This example demonstrates how to pass in a GPU device function (from the GPU device static library) as a function pointer to be called. |
/0_Simple/simpleStreams | simpleStreams | This sample uses CUDA streams to overlap kernel executions with memory copies between the host and a GPU device. |
/0_Simple/simpleSurfaceWrite | simpleSurfaceWrite | Simple example that demonstrates the use of 2D surface references (Write-to-Texture). |
/0_Simple/simpleTemplates | simpleTemplates | This sample is a templatized version of the template project. It also shows how to correctly templatize dynamically allocated shared memory arrays. |
/0_Simple/simpleTexture | simpleTexture | Simple example that demonstrates the use of Textures in CUDA. |
/0_Simple/simpleTextureDrv | simpleTextureDrv | Simple example that demonstrates the use of Textures in CUDA. This sample uses the new CUDA 4.0 kernel launch Driver API. |
/0_Simple/simpleVoteIntrinsics | simpleVoteIntrinsics | Simple program which demonstrates how to use the Vote (any, all) intrinsic instruction in a CUDA kernel. |
/0_Simple/simpleZeroCopy | simpleZeroCopy | This sample illustrates how to use Zero MemCopy: kernels can read and write directly to pinned system memory. |
/0_Simple/template | template | A trivial template project that can be used as a starting point to create new CUDA projects. |
/0_Simple/UnifiedMemoryStreams | UnifiedMemoryStreams | This sample demonstrates the use of OpenMP and streams with Unified Memory on a single GPU. |
/0_Simple/vectorAdd | vectorAdd | This CUDA Runtime API sample is a very basic sample that implements element by element vector addition. |
/0_Simple/vectorAddDrv | vectorAddDrv | This Vector Addition sample is a basic sample that is implemented element by element. |

### Utilities Samples

Path | Sample | Description |
---|---|---|
/1_Utilities/bandwidthTest | bandwidthTest | This is a simple test program to measure the memcpy bandwidth of the GPU and the memcpy bandwidth across PCI-e. |
/1_Utilities/deviceQuery | deviceQuery | This sample enumerates the properties of the CUDA devices present in the system. |
/1_Utilities/deviceQueryDrv | deviceQueryDrv | This sample enumerates the properties of the CUDA devices present using CUDA Driver API calls. |
/1_Utilities/p2pBandwidthLatencyTest | p2pBandwidthLatencyTest | This application demonstrates the CUDA Peer-To-Peer (P2P) data transfers between pairs of GPUs and computes latency and bandwidth. |
/1_Utilities/UnifiedMemoryPerf | UnifiedMemoryPerf | This sample compares, using a matrix multiplication kernel, the performance of Unified Memory with and without hints against other types of memory (zero-copy buffers, pageable memory, and page-locked memory) performing synchronous and asynchronous transfers on a single GPU. |
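Once built, the utilities above are a quick way to check that CUDA works on the board. The sketch below assumes the build leaves an executable named after the sample inside the sample's own directory; `run_sample` and `SAMPLES_DIR` are hypothetical names introduced for illustration only.

```shell
# Root of the samples tree; override SAMPLES_DIR if yours lives elsewhere.
SAMPLES_DIR="${SAMPLES_DIR:-/usr/local/cuda/samples}"

# run_sample is a hypothetical helper: run a built sample from its own directory.
# Assumes the executable shares the sample's name (an assumption, see above).
# Usage: run_sample 1_Utilities/deviceQuery [args...]
run_sample() {
    sample_path="$SAMPLES_DIR/$1"
    binary="$sample_path/$(basename "$1")"
    shift
    if [ ! -f "$binary" ] || [ ! -x "$binary" ]; then
        echo "run_sample: $binary not found; build the sample first" >&2
        return 1
    fi
    # Run inside the sample directory so relative data paths resolve.
    ( cd "$sample_path" && "$binary" "$@" )
}
```

For example, `run_sample 1_Utilities/deviceQuery` would print the properties of the CUDA devices present, and `run_sample 1_Utilities/bandwidthTest` would run the bandwidth test.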

### Graphics Samples

Path | Sample | Description |
---|---|---|
/2_Graphics/bindlessTexture | bindlessTexture | This example demonstrates use of cudaSurfaceObject, cudaTextureObject, and MipMap support in CUDA. |
/2_Graphics/Mandelbrot | Mandelbrot | This sample uses CUDA to compute and display the Mandelbrot or Julia sets interactively. It also illustrates the use of "double single" arithmetic to improve precision when zooming a long way into the pattern. |
/2_Graphics/marchingCubes | marchingCubes | This sample extracts a geometric isosurface from a volume dataset using the marching cubes algorithm. It uses the scan (prefix sum) function from the Thrust library to perform stream compaction. |
/2_Graphics/simpleGL | simpleGL | Simple program which demonstrates interoperability between CUDA and OpenGL. The program modifies vertex positions with CUDA and uses OpenGL to render the geometry. |
/2_Graphics/simpleGLES | simpleGLES | Demonstrates data exchange between CUDA and OpenGL ES (aka Graphics interop). The program modifies vertex positions with CUDA and uses OpenGL ES to render the geometry. |
/2_Graphics/simpleGLES_EGLOutput | simpleGLES_EGLOutput | Demonstrates data exchange between CUDA and OpenGL ES (aka Graphics interop). The program modifies vertex positions with CUDA and uses OpenGL ES to render the geometry, and shows how to render directly to the display using the EGLOutput mechanism and the DRM library. |
/2_Graphics/simpleTexture3D | simpleTexture3D | Simple example that demonstrates use of 3D Textures in CUDA. |
/2_Graphics/volumeFiltering | volumeFiltering | This sample demonstrates 3D Volumetric Filtering using 3D Textures and 3D Surface Writes. |
/2_Graphics/volumeRender | volumeRender | This sample demonstrates basic volume rendering using 3D Textures. |

### Imaging Samples

Path | Sample | Description |
---|---|---|
/3_Imaging/bicubicTexture | bicubicTexture | This sample demonstrates how to efficiently implement a Bicubic B-spline interpolation filter with CUDA texture. |
/3_Imaging/bilateralFilter | bilateralFilter | Bilateral filter is an edge-preserving non-linear smoothing filter that is implemented with CUDA with OpenGL rendering. It can be used in image recovery and denoising. Each pixel is weighted by considering both the spatial distance and color distance between its neighbors. |
/3_Imaging/boxFilter | boxFilter | Fast image box filter using CUDA with OpenGL rendering. |
/3_Imaging/convolutionFFT2D | convolutionFFT2D | This sample demonstrates how 2D convolutions with very large kernel sizes can be efficiently implemented using FFT transformations. |
/3_Imaging/convolutionSeparable | convolutionSeparable | This sample implements a separable convolution filter of a 2D signal with a Gaussian kernel. |
/3_Imaging/convolutionTexture | convolutionTexture | Texture-based implementation of a separable 2D convolution with a Gaussian kernel. |
/3_Imaging/dct8x8 | dct8x8 | This sample demonstrates how Discrete Cosine Transform (DCT) for blocks of 8 by 8 pixels can be performed using CUDA: a naive implementation by definition and a more traditional approach used in many libraries. |
/3_Imaging/dwtHaar1D | dwtHaar1D | Discrete Haar wavelet decomposition for 1D signals with a length which is a power of 2. |
/3_Imaging/dxtc | dxtc | High-Quality DXT Compression using CUDA. This example shows how to implement an existing computationally-intensive CPU compression algorithm in parallel on the GPU, and obtain an order of magnitude performance improvement. |
/3_Imaging/EGLStream_CUDA_CrossGPU | EGLStream_CUDA_CrossGPU | Demonstrates CUDA and EGL Streams interop, where the consumer's EGL Stream is on one GPU and the producer's is on another, and the consumer and producer are different processes. |
/3_Imaging/EGLStreams_CUDA_Interop | EGLStreams_CUDA_Interop | Demonstrates data exchange between CUDA and EGL Streams. |
/3_Imaging/EGLSync_CUDAEvent_Interop | EGLSync_CUDAEvent_Interop | Demonstrates interoperability between CUDA Event and EGL Sync/EGL Image, which makes it possible to synchronize on the GPU itself for GL-EGL-CUDA operations instead of blocking the CPU for synchronization. |
/3_Imaging/histogram | histogram | This sample demonstrates the efficient implementation of 64-bin and 256-bin histograms. |
/3_Imaging/HSOpticalFlow | HSOpticalFlow | Variational optical flow estimation example. Uses textures for image operations. Shows how a simple PDE solver can be accelerated with CUDA. |
/3_Imaging/imageDenoising | imageDenoising | This sample demonstrates two adaptive image denoising techniques: KNN and NLM, based on the computation of both geometric and color distance between texels. |
/3_Imaging/postProcessGL | postProcessGL | This sample shows how to post-process an image rendered in OpenGL using CUDA. |
/3_Imaging/recursiveGaussian | recursiveGaussian | This sample implements a Gaussian blur using Deriche's recursive method. |
/3_Imaging/simpleCUDA2GL | simpleCUDA2GL | This sample shows how to copy a CUDA image back to OpenGL using the most efficient methods. |
/3_Imaging/SobelFilter | SobelFilter | This sample implements the Sobel edge detection filter for 8-bit monochrome images. |
/3_Imaging/stereoDisparity | stereoDisparity | A CUDA program that demonstrates how to compute a stereo disparity map using SIMD SAD (Sum of Absolute Difference) intrinsics. |

### Finance Samples

Path | Sample | Description |
---|---|---|
/4_Finance/binomialOptions | binomialOptions | This sample evaluates fair call price for a given set of European options under the binomial model. |
/4_Finance/BlackScholes | BlackScholes | This sample evaluates fair call and put prices for a given set of European options by Black-Scholes formula. |
/4_Finance/MonteCarloMultiGPU | MonteCarloMultiGPU | This sample evaluates fair call price for a given set of European options using the Monte Carlo approach, taking advantage of all CUDA-capable GPUs installed in the system. |
/4_Finance/quasirandomGenerator | quasirandomGenerator | This sample implements Niederreiter Quasirandom Sequence Generator and Inverse Cumulative Normal Distribution functions for the generation of Standard Normal Distributions. |
/4_Finance/SobolQRNG | SobolQRNG | This sample implements Sobol Quasirandom Sequence Generator. |

### Simulations Samples

Path | Sample | Description |
---|---|---|
/5_Simulations/fluidsGL | fluidsGL | An example of fluid simulation using CUDA and CUFFT, with OpenGL rendering. |
/5_Simulations/fluidsGLES | fluidsGLES | An example of fluid simulation using CUDA and CUFFT, with OpenGL ES rendering. |
/5_Simulations/nbody | nbody | This sample demonstrates the efficient all-pairs simulation of a gravitational n-body simulation in CUDA. |
/5_Simulations/nbody_opengles | nbody_opengles | This sample demonstrates the efficient all-pairs simulation of a gravitational n-body simulation in CUDA. Unlike the OpenGL nbody sample, there is no user interaction. |
/5_Simulations/oceanFFT | oceanFFT | This sample simulates an Ocean height field using CUFFT Library and renders the result using OpenGL. |
/5_Simulations/particles | particles | This sample uses CUDA to simulate and visualize a large set of particles and their physical interaction. Adding `-particles=<N>` to the command line sets the number of particles for the simulation. |
/5_Simulations/smokeParticles | smokeParticles | Smoke simulation with volumetric shadows using half-angle slicing technique. |

### Advanced Samples

Path | Sample | Description |
---|---|---|
/6_Advanced/alignedTypes | alignedTypes | A simple test, showing huge access speed gap between aligned and misaligned structures. |
/6_Advanced/cdpAdvancedQuicksort | cdpAdvancedQuicksort | This sample demonstrates an advanced quicksort implemented using CUDA Dynamic Parallelism. |
/6_Advanced/cdpBezierTessellation | cdpBezierTessellation | This sample demonstrates Bezier tessellation of lines implemented using CUDA Dynamic Parallelism. |
/6_Advanced/cdpQuadtree | cdpQuadtree | This sample demonstrates Quad Trees implemented using CUDA Dynamic Parallelism. |
/6_Advanced/concurrentKernels | concurrentKernels | This sample demonstrates the use of CUDA streams for concurrent execution of several kernels on devices of compute capability 2.0 or higher. Devices of compute capability 1.x will run the kernels sequentially. |
/6_Advanced/eigenvalues | eigenvalues | This sample demonstrates a parallel implementation of a bisection algorithm for the computation of all eigenvalues of a tridiagonal symmetric matrix of arbitrary size with CUDA. |
/6_Advanced/fastWalshTransform | fastWalshTransform | Naturally(Hadamard)-ordered Fast Walsh Transform for batching vectors of arbitrary eligible lengths that are a power of two in size. |
/6_Advanced/FDTD3d | FDTD3d | This sample applies a finite differences time domain progression stencil on a 3D surface. |
/6_Advanced/FunctionPointers | FunctionPointers | This sample illustrates how to use function pointers and implements the Sobel Edge Detection filter for 8-bit monochrome images. |
/6_Advanced/interval | interval | Interval arithmetic operators example. |
/6_Advanced/lineOfSight | lineOfSight | This sample is an implementation of a simple line-of-sight algorithm: Given a height map and a ray originating at some observation point, it computes all the points along the ray that are visible from the observation point. |
/6_Advanced/matrixMulDynlinkJIT | matrixMulDynlinkJIT | This sample revisits matrix multiplication using the CUDA driver API. It demonstrates how to link to CUDA driver at runtime and how to use JIT (just-in-time) compilation from PTX code. |
/6_Advanced/mergeSort | mergeSort | This sample implements a merge sort (also known as Batcher's sort), algorithms belonging to the class of sorting networks. |
/6_Advanced/newdelete | newdelete | This sample demonstrates dynamic global memory allocation through device C++ new and delete operators and virtual function declarations available with CUDA 4.0. |
/6_Advanced/ptxjit | ptxjit | This sample uses the Driver API to just-in-time compile (JIT) a Kernel from PTX code. Additionally, this sample demonstrates the seamless interoperability capability of the CUDA Runtime and CUDA Driver API calls. |
/6_Advanced/radixSortThrust | radixSortThrust | This sample demonstrates a very fast and efficient parallel radix sort that uses the Thrust library. The included RadixSort class can sort either key-value pairs (with float or unsigned integer keys) or keys only. |
/6_Advanced/reduction | reduction | A parallel sum reduction that computes the sum of a large array of values. |
/6_Advanced/scalarProd | scalarProd | This sample calculates scalar products of a given set of input vector pairs. |
/6_Advanced/scan | scan | This example demonstrates an efficient CUDA implementation of parallel prefix sum, also known as "scan". Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array. |
/6_Advanced/segmentationTreeThrust | segmentationTreeThrust | This sample demonstrates an approach to constructing image segmentation trees. This method is based on Boruvka's MST algorithm. |
/6_Advanced/shfl_scan | shfl_scan | This example demonstrates how to use the shuffle intrinsic `__shfl_up` to perform a scan operation across a thread block. |
/6_Advanced/simpleHyperQ | simpleHyperQ | This sample demonstrates the use of CUDA streams for concurrent execution of several kernels on devices that provide HyperQ (SM 3.5). Devices without HyperQ (SM 2.0 and SM 3.0) will run a maximum of two kernels concurrently. |
/6_Advanced/sortingNetworks | sortingNetworks | This sample implements bitonic sort and odd-even merge sort (also known as Batcher's sort), algorithms belonging to the class of sorting networks. These algorithms are generally less efficient on large sequences than algorithms with better asymptotic complexity (e.g. merge sort or radix sort). |
/6_Advanced/threadFenceReduction | threadFenceReduction | This sample shows how to perform a reduction operation on an array of values using the thread Fence intrinsic to produce a single value in a single kernel. |
/6_Advanced/threadMigration | threadMigration | Simple program illustrating how to use the CUDA Context Management API; it uses the new CUDA 4.0 parameter passing and CUDA launch API. CUDA contexts can be created separately and attached independently to different threads. |
/6_Advanced/transpose | transpose | This sample demonstrates Matrix Transpose. |
/6_Advanced/warpAggregatedAtomicsCG | warpAggregatedAtomicsCG | This sample demonstrates how to use Cooperative Groups (CG) to perform warp aggregated atomics, a useful technique to improve performance when many threads atomically add to a single counter. |

### CUDALibraries Samples

Path | Sample | Description |
---|---|---|
/7_CUDALibraries/batchCUBLAS | batchCUBLAS | A CUDA Sample that demonstrates how to use batched CUBLAS API calls to improve overall performance. |
/7_CUDALibraries/BiCGStab | BiCGStab | A CUDA Sample that demonstrates Bi-Conjugate Gradient Stabilized (BiCGStab) iterative method for nonsymmetric and symmetric positive definite (s.p.d.) linear systems using CUSPARSE and CUBLAS. |
/7_CUDALibraries/boundSegmentsNPP | boundSegmentsNPP | An NPP CUDA Sample that demonstrates using nppiLabelMarkers to generate connected region segment labels in an 8-bit grayscale image then compressing the sparse list of generated labels into the minimum number of uniquely labeled regions in the image using nppiCompressMarkerLabels. Finally, a boundary is added surrounding each segmented region in the image using nppiBoundSegments. |
/7_CUDALibraries/boxFilterNPP | boxFilterNPP | An NPP CUDA Sample that demonstrates how to use the NPP FilterBox function to perform a Box Filter. |
/7_CUDALibraries/cannyEdgeDetectorNPP | cannyEdgeDetectorNPP | An NPP CUDA Sample that demonstrates the recommended parameters to use with the nppiFilterCannyBorder_8u_C1R Canny Edge Detection image filter function. |
/7_CUDALibraries/conjugateGradient | conjugateGradient | This sample implements a conjugate gradient solver on GPU using the CUBLAS and CUSPARSE libraries. |
/7_CUDALibraries/cuSolverDn_LinearSolver | cuSolverDn_LinearSolver | A CUDA Sample that demonstrates cuSolverDN's LU, QR, and Cholesky factorization. |
/7_CUDALibraries/cuSolverRf | cuSolverRf | A CUDA Sample that demonstrates cuSolver's refactorization library - CUSOLVERRF. |
/7_CUDALibraries/cuSolverSp_LinearSolver | cuSolverSp_LinearSolver | A CUDA Sample that demonstrates cuSolverSP's LU, QR, and Cholesky factorization. |
/7_CUDALibraries/cuSolverSp_LowlevelCholesky | cuSolverSp_LowlevelCholesky | A CUDA Sample that demonstrates Cholesky factorization using cuSolverSP's low-level APIs. |
/7_CUDALibraries/cuSolverSp_LowlevelQR | cuSolverSp_LowlevelQR | A CUDA Sample that demonstrates QR factorization using cuSolverSP's low-level APIs. |
/7_CUDALibraries/FilterBorderControlNPP | FilterBorderControlNPP | This NPP CUDA Sample demonstrates how any border version of an NPP filtering function can be used in the most common mode (with border control enabled), can be used to duplicate the results of the equivalent non-border version of the NPP function, and can be used to enable and disable border control on various source image edges depending on what portion of the source image is being used as input. |
/7_CUDALibraries/freeImageInteropNPP | freeImageInteropNPP | A simple CUDA Sample that demonstrates how to use the FreeImage library with NPP. |
/7_CUDALibraries/histEqualizationNPP | histEqualizationNPP | This CUDA Sample demonstrates how to use NPP for histogram equalization for image data. |
/7_CUDALibraries/jpegNPP | jpegNPP | This sample demonstrates a simple image processing pipeline. First, a JPEG file is Huffman decoded and inverse DCT transformed and dequantized. Then the different planes are resized. Finally, the resized image is quantized, forward DCT transformed and Huffman encoded. |
/7_CUDALibraries/MC_EstimatePiInlineP | MC_EstimatePiInlineP | This sample uses Monte Carlo simulation for Estimation of Pi (using inline PRNG). This sample also uses the NVIDIA CURAND library. |
/7_CUDALibraries/MC_EstimatePiInlineQ | MC_EstimatePiInlineQ | This sample uses Monte Carlo simulation for Estimation of Pi (using inline QRNG). This sample also uses the NVIDIA CURAND library. |
/7_CUDALibraries/MC_EstimatePiP | MC_EstimatePiP | This sample uses Monte Carlo simulation for Estimation of Pi (using batch PRNG). This sample also uses the NVIDIA CURAND library. |
/7_CUDALibraries/MC_EstimatePiQ | MC_EstimatePiQ | This sample uses Monte Carlo simulation for Estimation of Pi (using batch QRNG). This sample also uses the NVIDIA CURAND library. |
/7_CUDALibraries/MC_SingleAsianOptionP | MC_SingleAsianOptionP | This sample uses Monte Carlo to simulate Single Asian Options using the NVIDIA CURAND library. |
/7_CUDALibraries/MersenneTwisterGP11213 | MersenneTwisterGP11213 | This sample demonstrates the Mersenne Twister random number generator GP11213 in cuRAND. |
/7_CUDALibraries/randomFog | randomFog | This sample illustrates pseudo- and quasi-random numbers produced by CURAND. |
/7_CUDALibraries/simpleCUBLAS | simpleCUBLAS | Example of using CUBLAS with the new CUBLAS API interface available in CUDA 4.0. |
/7_CUDALibraries/simpleCUBLASXT | simpleCUBLASXT | Example of using the CUBLAS-XT library. |
/7_CUDALibraries/simpleCUFFT | simpleCUFFT | Example of using CUFFT. In this example, CUFFT is used to compute the 1D-convolution of some signal with some filter by transforming both into the frequency domain, multiplying them together, and transforming the signal back to the time domain. |
/7_CUDALibraries/simpleCUFFT_2d_MGPU | simpleCUFFT_2d_MGPU | Example of using CUFFT. In this example, CUFFT is used to compute the 2D-convolution of some signal with some filter by transforming both into the frequency domain, multiplying them together, and transforming the signal back to the time domain on multiple GPUs. |
/7_CUDALibraries/simpleCUFFT_MGPU | simpleCUFFT_MGPU | Example of using CUFFT. In this example, CUFFT is used to compute the 1D-convolution of some signal with some filter by transforming both into the frequency domain, multiplying them together, and transforming the signal back to the time domain on multiple GPUs. |

For more information about CUDA, go to: Xavier/JetPack_4.1/Components/Cuda