NVIDIA Jetson Xavier - Deep Learning Accelerator (NVDLA)
NVIDIA’s Deep Learning Accelerator (NVDLA) is a hardware design that accelerates inference on convolutional neural networks (CNNs). It is an open-source project released under the NVIDIA Open NVDLA License.
Most of the computation in CNNs reuses the same mathematical operations, which can be grouped into five basic layer types: convolution, activation, pooling, normalization, and fully connected. These operations have highly predictable memory access patterns, so application-specific hardware designed to exploit those patterns can accelerate them substantially.
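To make "predictable memory access patterns" concrete, here is a minimal pure-Python reference convolution (no padding, single stride; the function name `conv2d` and the nested-list tensor layout are assumptions for illustration, not NVDLA's data format). Every loop bound and every index it reads is computed from the tensor shapes alone, never from the data values, which is the property fixed-function hardware can exploit by prefetching and reusing data on a known schedule.

```python
from typing import List

# Illustrative layouts (assumed, not NVDLA's in-memory format):
Tensor3 = List[List[List[float]]]         # [channels][height][width]
Weights4 = List[List[List[List[float]]]]  # [out_ch][in_ch][kernel_h][kernel_w]


def conv2d(x: Tensor3, w: Weights4, stride: int = 1) -> Tensor3:
    """Direct convolution without padding.

    The sequence of (c, y, x) positions read from the input depends only on
    the shapes and the stride, so it is known before any data arrives.
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    k_out, r, s = len(w), len(w[0][0]), len(w[0][0][0])
    h_out = (h - r) // stride + 1
    w_out = (wd - s) // stride + 1
    out = [[[0.0] * w_out for _ in range(h_out)] for _ in range(k_out)]
    for k in range(k_out):                     # each output channel
        for oy in range(h_out):                # each output row
            for ox in range(w_out):            # each output column
                acc = 0.0
                for c in range(c_in):          # reduce over input channels
                    for ky in range(r):        # ...and the kernel window
                        for kx in range(s):
                            acc += x[c][oy * stride + ky][ox * stride + kx] * w[k][c][ky][kx]
                out[k][oy][ox] = acc
    return out
```

For example, `conv2d([[[1, 2, 3], [4, 5, 6], [7, 8, 9]]], [[[[1, 0], [0, 1]]]])` returns `[[[6.0, 8.0], [12.0, 14.0]]]`.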
NVDLA hardware provides a simple, flexible, robust inference acceleration solution. More information about the project can be found on NVDLA's official homepage.
Components
The NVDLA cores implemented on the Jetson Xavier are "headless" implementations, which means that unit-by-unit management of the NVDLA hardware happens on the main system processor rather than on a dedicated microcontroller. Each NVDLA core has the following components:
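As a rough mental model of "headless" management, the sketch below shows a host-side loop that launches an operation on a unit only after every operation it depends on has completed. The `Op` class, the `host_schedule` function, and the unit labels are hypothetical; a real driver would program hardware registers and wait for completion interrupts, which are only indicated in a comment here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    name: str
    unit: str                                   # hypothetical label, e.g. "CONV", "SDP", "PDP"
    deps: List[str] = field(default_factory=list)


def host_schedule(ops: List[Op]) -> List[str]:
    """Launch ops in dependency order, as a headless host CPU would.

    Returns "UNIT:name" strings in launch order. Raises RuntimeError if some
    op can never become ready (a cycle or a missing producer).
    """
    done = set()
    pending = list(ops)
    launched: List[str] = []
    while pending:
        ready = [op for op in pending if all(d in done for d in op.deps)]
        if not ready:
            raise RuntimeError("dependency cycle or missing producer")
        for op in ready:
            # Real hardware (not modelled): write the unit's registers,
            # start it, and wait for its completion interrupt.
            launched.append(f"{op.unit}:{op.name}")
            done.add(op.name)
            pending.remove(op)
    return launched
```

With ops listed out of order, `host_schedule([Op("relu1", "SDP", ["conv1"]), Op("conv1", "CONV"), Op("pool1", "PDP", ["relu1"])])` launches them as `["CONV:conv1", "SDP:relu1", "PDP:pool1"]`. The point of the sketch is that in a headless design this sequencing loop runs on the main processor.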
- Convolution Core – optimized high-performance engine for convolutional layers. It operates on two sets of data: offline-trained weights, which are constant, and input feature data. It can map convolutions of many different sizes onto the hardware with high efficiency.
- Single Data Point Processor – single-point lookup engine for activation functions. The Single Data Point Processor (SDP) applies both linear and nonlinear functions to individual data points. It implements nonlinear functions, such as sigmoid or hyperbolic tangent, with a lookup table, and common linear functions with bias and scaling operations.
- Planar Data Processor – planar averaging engine for pooling. The Planar Data Processor (PDP) supports specific spatial operations that are common in CNN applications. It is configurable at runtime to support different pool group sizes and supports three pooling functions: maximum-pooling, minimum-pooling, and average-pooling.
- Cross-Channel Data Processor – multi-channel averaging engine for advanced normalization functions. The Cross-Channel Data Processor (CDP) is a specialized unit built to apply local response normalization, a normalization function that operates across the channel dimension rather than the spatial dimensions.
- Data Reshape Engines – memory-to-memory transformation acceleration for tensor reshape and copy operations. The data reshape engine performs data format transformations (e.g., splitting or slicing, merging, contraction, reshape-transpose).
- Bridge DMA – accelerated path to move data between two memory systems that are not otherwise connected. The bridge DMA (BDMA) module provides a data copy engine that moves data between the system DRAM and the dedicated memory interface.
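The per-element, planar, and cross-channel operations described above can be modelled in a few lines of Python. This is a behavioural sketch only, assuming floating-point math and nested lists; it says nothing about NVDLA's actual number formats, LUT size, or interpolation scheme. All function names are hypothetical. The SDP model applies scale, then bias, then a linearly interpolated lookup table (the order is an assumption); the PDP model pools one plane with max, min, or average; the CDP model applies local response normalization across channels at one spatial position, using the common form with `alpha / n` and AlexNet-style defaults.

```python
import math
from typing import Callable, List


# --- SDP-style activation: linear step, then a lookup table ---------------
def build_lut(fn: Callable[[float], float], lo: float, hi: float, entries: int) -> List[float]:
    """Sample fn at `entries` evenly spaced points covering [lo, hi]."""
    step = (hi - lo) / (entries - 1)
    return [fn(lo + i * step) for i in range(entries)]


def lut_eval(lut: List[float], lo: float, hi: float, x: float) -> float:
    """Linearly interpolate between neighbouring entries; clamp outside [lo, hi]."""
    if x <= lo:
        return lut[0]
    if x >= hi:
        return lut[-1]
    pos = (x - lo) / (hi - lo) * (len(lut) - 1)
    i = int(pos)
    frac = pos - i
    return lut[i] + frac * (lut[i + 1] - lut[i])


def sdp(xs: List[float], scale: float, bias: float,
        lut: List[float], lo: float, hi: float) -> List[float]:
    # Assumed order: scale and bias first, then the nonlinear lookup.
    return [lut_eval(lut, lo, hi, x * scale + bias) for x in xs]


# --- PDP-style pooling over one plane --------------------------------------
def pool2d(plane: List[List[float]], size: int, stride: int, mode: str) -> List[List[float]]:
    """Pool a 2-D plane with a size x size window; mode is 'max', 'min' or 'avg'."""
    h, w = len(plane), len(plane[0])
    out = []
    for y in range(0, h - size + 1, stride):
        row = []
        for x in range(0, w - size + 1, stride):
            window = [plane[y + dy][x + dx] for dy in range(size) for dx in range(size)]
            if mode == "max":
                row.append(max(window))
            elif mode == "min":
                row.append(min(window))
            elif mode == "avg":
                row.append(sum(window) / len(window))
            else:
                raise ValueError(f"unknown pooling mode {mode!r}")
        out.append(row)
    return out


# --- CDP-style local response normalization at one spatial position --------
def lrn_across_channels(values: List[float], n: int = 5, k: float = 2.0,
                        alpha: float = 1e-4, beta: float = 0.75) -> List[float]:
    """b_i = a_i / (k + alpha/n * sum of a_j^2 over the n-channel window) ** beta."""
    c = len(values)
    half = n // 2
    out = []
    for i in range(c):
        lo, hi = max(0, i - half), min(c - 1, i + half)
        sq = sum(values[j] ** 2 for j in range(lo, hi + 1))
        out.append(values[i] / (k + alpha / n * sq) ** beta)
    return out
```

For instance, with `lut = build_lut(lambda v: 1.0 / (1.0 + math.exp(-v)), -8.0, 8.0, 65)`, `lut_eval(lut, -8.0, 8.0, 0.0)` gives `0.5`, and `pool2d([[1, 2, 5, 6], [3, 4, 7, 8]], 2, 2, "max")` gives `[[4, 8]]`. A wider LUT range reduces clamping error at the tails; more entries reduce interpolation error in between.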
Software Design
The NVDLA software stack is split into two parts: the compilation tools (model conversion) and the runtime environment (run-time software that loads and executes networks on NVDLA). For a detailed description of the software design and an example of how to use the DLA units present on the Xavier, please refer to the NVDLA Software wiki page.
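The split between offline compilation and runtime loading can be illustrated with a toy pipeline. In this sketch the "compiler" assigns each layer to one of the hardware units listed above and serializes the result as JSON, and the "runtime" parses it back. Everything here is hypothetical: the table `UNIT_FOR_LAYER`, the functions `compile_network` and `load_network`, and the JSON format are stand-ins, since the real toolchain emits its own binary format (which the NVDLA documentation calls a loadable) and performs far more work, such as memory allocation and precision conversion. Mapping fully-connected layers to the convolution core is also an assumption of this sketch.

```python
import json
from typing import Dict, List

# Hypothetical layer-kind -> unit table, for illustration only.
UNIT_FOR_LAYER: Dict[str, str] = {
    "conv": "CONV",
    "fc": "CONV",           # assumption: fully-connected handled as a convolution
    "activation": "SDP",
    "pool": "PDP",
    "lrn": "CDP",
    "reshape": "RESHAPE",
}


def compile_network(layers: List[Dict[str, str]]) -> str:
    """Offline step: assign each layer to a unit and serialize the plan."""
    ops = []
    for layer in layers:
        kind = layer["kind"]
        if kind not in UNIT_FOR_LAYER:
            raise ValueError(f"no accelerator unit for layer kind {kind!r}")
        ops.append({"name": layer["name"], "unit": UNIT_FOR_LAYER[kind]})
    return json.dumps({"version": 1, "ops": ops})


def load_network(blob: str) -> List[Dict[str, str]]:
    """Runtime step: parse the serialized plan back into an op list."""
    data = json.loads(blob)
    if data.get("version") != 1:
        raise ValueError("unsupported format version")
    return data["ops"]
```

Keeping the two steps separate mirrors the design described above: the potentially slow mapping work happens once, offline, and the runtime on the target only needs to parse the result and drive the hardware.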