NVIDIA Xavier - Deep Learning - Deep Learning Accelerator










The NVIDIA Deep Learning Accelerator (NVDLA) is a fixed-function engine used to accelerate inference operations on convolutional neural networks (CNNs). You will learn about the software available to work with the Deep Learning Accelerator and how to use it to create your own projects. If you want to learn more about this engine, its hardware, and the open source NVDLA project, please refer to the NVDLA Processor wiki page.

The NVDLA software is divided into two groups: the compilation tools (model conversion) and the runtime environment (run-time software to load and execute networks on the NVDLA). The general flow is shown in the figure below, and each group is described in the following sections.


Figure: Front view of the NVIDIA Jetson Xavier developer kit

Compilation Tools

The compilation tools consist of the parser and the compiler; optimization is performed by the compiler itself.

  • Parser – Reads a pre-trained Caffe model and creates an intermediate representation that is passed to the compiler.
  • Compiler – Takes the parsed intermediate representation and creates a sequence of hardware layers optimized for a given NVDLA configuration.

These steps can be performed offline on a host machine, and the resulting loadable can then be deployed to the NVDLAs on the Xavier. Knowing the specific NVDLA hardware configuration enables the compiler to further optimize inference. However, the specific NVDLA configuration of the Tegra Xavier is not yet available.
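On the Xavier, this compile-and-deploy flow is exposed through TensorRT rather than the open source NVDLA compiler. The following is a minimal sketch of the offline build step using the TensorRT 5.x C++ API that shipped with the Xavier at the time of writing; the file names, the "prob" output blob name, and the choice of DLA core are assumptions for illustration only.

#include "NvInfer.h"
#include "NvCaffeParser.h"
#include <fstream>
#include <iostream>

// Minimal logger required by the TensorRT builder interface.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    using namespace nvinfer1;

    // Parser step: read a pre-trained Caffe model into a network definition.
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();
    nvcaffeparser1::ICaffeParser* parser = nvcaffeparser1::createCaffeParser();
    const nvcaffeparser1::IBlobNameToTensor* blobs =
        parser->parse("deploy.prototxt", "model.caffemodel", *network, DataType::kFLOAT);
    network->markOutput(*blobs->find("prob"));   // "prob" is a hypothetical output blob name

    // Compiler step: build an engine whose layers run on the DLA where supported.
    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(1 << 28);
    builder->setFp16Mode(true);                       // the DLA requires FP16 or INT8 precision
    builder->setDefaultDeviceType(DeviceType::kDLA);  // place layers on the DLA by default
    builder->setDLACore(0);                           // the Xavier exposes DLA cores 0 and 1
    builder->allowGPUFallback(true);                  // unsupported layers fall back to the GPU
    ICudaEngine* engine = builder->buildCudaEngine(*network);

    // Serialize the engine so it can be deployed later without rebuilding.
    IHostMemory* blob = engine->serialize();
    std::ofstream out("model_dla.engine", std::ios::binary);
    out.write(static_cast<const char*>(blob->data()), blob->size());

    blob->destroy(); engine->destroy(); parser->destroy();
    network->destroy(); builder->destroy();
    return 0;
}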

Runtime Environment

The runtime environment involves running a model on compatible NVDLA hardware. It is effectively divided into two layers:

  • User Mode Driver – The main interface for user-mode programs. After parsing, the compiler compiles the network layer by layer and converts it into a file format called an NVDLA Loadable. The user-mode runtime driver loads this loadable and submits the inference job to the Kernel Mode Driver.
  • Kernel Mode Driver – Consists of drivers and firmware that do the work of scheduling layer operations on the NVDLA and programming the NVDLA registers to configure each functional block.
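On the Xavier these two layers sit behind TensorRT: the serialized engine plays the role of the loadable, and the TensorRT runtime submits the inference job to the kernel-mode driver for execution on the DLA. Below is a minimal sketch of that deploy-time path with the TensorRT 5.x C++ API; the engine file name, buffer sizes, and binding layout are assumptions for illustration only.

#include "NvInfer.h"
#include <cuda_runtime_api.h>
#include <fstream>
#include <iterator>
#include <vector>
#include <iostream>

// Minimal logger required by the TensorRT runtime interface.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    using namespace nvinfer1;

    // Load the serialized engine produced by the offline build step.
    std::ifstream file("model_dla.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                            std::istreambuf_iterator<char>());

    // User-mode side: deserialize the engine and select the DLA core to run on.
    IRuntime* runtime = createInferRuntime(gLogger);
    runtime->setDLACore(0);   // same core the engine was built for
    ICudaEngine* engine = runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
    IExecutionContext* context = engine->createExecutionContext();

    // Allocate device buffers for the input and output bindings (sizes are model-specific).
    void* buffers[2];
    cudaMalloc(&buffers[0], 3 * 224 * 224 * sizeof(float));   // input  (assumed shape)
    cudaMalloc(&buffers[1], 1000 * sizeof(float));            // output (assumed shape)

    // Submitting the job hands it to the kernel-mode driver, which schedules
    // the layers on the NVDLA and programs its functional blocks.
    context->execute(1, buffers);

    cudaFree(buffers[0]); cudaFree(buffers[1]);
    context->destroy(); engine->destroy(); runtime->destroy();
    return 0;
}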

Selecting which DLA to use

You can select which of the two DLAs to use with IBuilder::setDLACore() at build time and IRuntime::setDLACore() at deploy time.
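For example, assuming a builder and a runtime created as in the sketches above, selecting the second DLA engine is just a matter of passing the core index (0 or 1 on the Xavier):

// Query how many DLA cores are exposed (2 on the Xavier) and pick one.
int nbCores = builder->getNbDLACores();
builder->setDLACore(nbCores - 1);   // build the engine for the last (second) DLA
runtime->setDLACore(nbCores - 1);   // run it on the same core at deploy time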



Previous: Deep Learning/TensorRT/Building Examples Index Next: Deep Learning/Deep Learning Accelerator/Software Support