Xavier/Processors/GPU/Description: Difference between revisions

Latest revision as of 17:45, 13 February 2023

The same Volta GPU architecture that powers NVIDIA high-performance computing (HPC) products was adapted for use in Xavier series modules. The Volta architecture features a new Streaming Multiprocessor (SM) optimized for deep learning.

Volta Streaming Multiprocessor

The new Volta SM is far more energy-efficient than previous generations, enabling major performance gains within the same power envelope. The Volta SM includes:

New programmable Tensor Cores purpose-built for INT8/FP16/FP32 deep learning tensor operations; IMMA and HMMA instructions accelerate integer and mixed-precision matrix-multiply-and-accumulate operations.
Enhanced L1 data cache for higher performance and lower latency.
Streamlined instruction set for simpler decoding and reduced instruction latencies.
Higher clocks and higher power efficiency.
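
As a rough illustration of the integer matrix-multiply-and-accumulate that the IMMA instructions accelerate, the following plain-Python sketch computes D = A x B + C with INT8 inputs. It is an emulation for intuition only, not the hardware instruction. The helper names `to_int8` and `imma` are hypothetical, and the accumulator wider than the inputs is an assumption that this page does not state.

```python
def to_int8(value):
    """Saturate an integer to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, int(value)))


def imma(a, b, c):
    """Return D = A x B + C for INT8 inputs (hypothetical helper).

    a is M x K, b is K x N, c is M x N, all given as lists of lists.
    Python integers are unbounded, so the running sums stand in for an
    accumulator wider than 8 bits (assumed, not taken from this page).
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    d = [[c[i][j] for j in range(cols)] for i in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                d[i][j] += to_int8(a[i][k]) * to_int8(b[k][j])
    return d


a = [[127, -128], [1, 2]]
b = [[127, 3], [-128, 4]]
c = [[0, 0], [10, 20]]
print(imma(a, b, c))  # [[32513, -131], [-119, 31]]
```

The first output element (32513) is far outside the INT8 range, which shows why the sums cannot be kept in 8 bits even when every input fits.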

The Volta architecture also incorporates a new-generation memory subsystem, along with enhanced unified memory and address translation services, which increase memory bandwidth and improve utilization for greater efficiency.

Graphics Processing Cluster

The Graphics Processing Cluster (GPC) is a dedicated hardware block for computing, rasterization, shading, and texturing; most of the GPU’s core graphics functions are performed inside the GPC. It comprises a Raster Engine and four Texture Processing Clusters (TPCs), each of which contains two SM units. The SM unit creates, manages, schedules, and executes instructions from many threads in parallel. Raster operators (ROPs) continue to be aligned with L2 cache slices and memory controllers. The SM geometry and pixel processing performance make it highly suitable for rendering advanced user interfaces; the efficiency of the Volta GPU enables this performance in power-constrained devices.
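
The GPC hierarchy described above can be checked with a little arithmetic. The sketch below takes the TPC and SM counts from the text; the figure of 64 CUDA cores per SM and the single-GPC default are assumptions that this page does not state, chosen because they reproduce the 512-core figure in the feature list.

```python
TPCS_PER_GPC = 4        # from the text: four TPCs per GPC
SMS_PER_TPC = 2         # from the text: two SM units per TPC
CUDA_CORES_PER_SM = 64  # assumption: not stated on this page


def sms_per_gpc():
    """Number of SM units inside one GPC."""
    return TPCS_PER_GPC * SMS_PER_TPC


def cuda_cores(gpc_count=1):
    """Total CUDA cores for a given number of GPCs (one GPC is assumed)."""
    return gpc_count * sms_per_gpc() * CUDA_CORES_PER_SM


print(sms_per_gpc())  # 8
print(cuda_cores())   # 512
```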

Each SM is partitioned into four separate processing blocks (referred to as SMPs); each SMP contains its own instruction buffer, scheduler, CUDA cores, and Tensor cores. Inside each SMP, CUDA cores perform pixel/vertex/geometry shading and physics/compute calculations, and each Tensor core provides a 4x4x4 matrix processing array to perform mixed-precision fused multiply-add (FMA) operations. Texture units perform texture filtering, and load/store units fetch and save data to memory. Special Function Units (SFUs) handle transcendental and graphics interpolation instructions. Finally, the PolyMorph Engine handles vertex fetch, tessellation, viewport transform, attribute setup, and stream output.
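
To make the 4x4x4 mixed-precision FMA more concrete, here is a minimal software emulation of D = A x B + C on 4x4 matrices. It assumes FP16 inputs with FP32 accumulation, which is one common reading of "mixed precision" and is not spelled out on this page. The helper names are hypothetical, and the summation order is a guess rather than a description of the hardware.

```python
import struct


def round_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_fp32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def tensor_core_fma(a, b, c):
    """Emulate D = A x B + C for 4x4 matrices (hypothetical helper).

    A and B are rounded to FP16; sums are kept in FP32 (assumed).
    Each call performs 4 * 4 * 4 = 64 multiply-add steps.
    """
    n = 4
    d = []
    for i in range(n):
        row = []
        for j in range(n):
            acc = round_fp32(c[i][j])
            for k in range(n):
                product = round_fp16(a[i][k]) * round_fp16(b[k][j])
                acc = round_fp32(acc + product)
            row.append(acc)
        d.append(row)
    return d


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b = [[0.5 * (i + j) for j in range(4)] for i in range(4)]
ones = [[1.0] * 4 for _ in range(4)]
d = tensor_core_fma(identity, b, ones)
print(d[3][3])  # 4.0, that is b[3][3] + 1.0
```

Values such as 0.1 are not exactly representable in FP16, so `round_fp16(0.1)` differs slightly from 0.1. This input rounding is the precision traded away in exchange for throughput.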

Features

512 CUDA cores.
End-to-end lossless compression.
Tile Caching.
OpenGL 4.6, OpenGL ES 3.2, and Vulkan 1.0.
Adaptive Scalable Texture Compression (ASTC) LDR profile supported.
DirectX 12 compliant.
CUDA support.
Iterated blend, ROP OpenGL-ES blend modes.
2D BLIT from 3D class avoids channel switch.
2D color compression.
Constant color render SM bypass.
2x, 4x, 8x MSAA with color and Z compression.
Non-power-of-2 and 3D textures, FP16 texture filtering.
FP16 shader support.
Geometry and Vertex attribute Instancing.
Parallel pixel processing.
Early-z reject: Fast rejection of occluded pixels acts as a multiplier on pixel shader and texture performance while saving power and bandwidth.
Video protection region.
Power saving: Multiple levels of clock gating for linear scaling of power.
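
The early-z reject item in the list above can be illustrated with a toy software rasterizer. The sketch below is not how the hardware is implemented; `shade` and `render` are hypothetical names, and the convention that a smaller depth value is closer to the viewer is an assumption. It counts how many fragments reach the (expensive) shading step with and without the early depth test.

```python
def shade(material):
    """Stand-in for an expensive pixel shader (hypothetical)."""
    return material.upper()


def render(fragments, width, height, early_z=True):
    """Rasterize (x, y, depth, material) fragments; smaller depth is closer.

    Returns the color buffer and the number of shader invocations.
    """
    depth = [[float("inf")] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    shaded = 0
    for x, y, z, material in fragments:
        if early_z and z >= depth[y][x]:
            continue  # occluded: rejected before any shading work
        result = shade(material)
        shaded += 1
        if z < depth[y][x]:  # depth test (the only test when early_z is off)
            depth[y][x] = z
            color[y][x] = result
    return color, shaded


# Three fragments cover the same pixel, submitted front to back.
fragments = [(0, 0, 0.2, "red"), (0, 0, 0.5, "green"), (0, 0, 0.9, "blue")]
print(render(fragments, 1, 1, early_z=True))   # ([['RED']], 1)
print(render(fragments, 1, 1, early_z=False))  # ([['RED']], 3)
```

Both runs produce the same image, but the early test shades one fragment instead of three. The saving depends on draw order: fragments submitted back to front would all pass the test.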

