RidgeRun CUDA Optimisation Guide/GPU Architecture/Memory hierarchy

From RidgeRun Developer Wiki

Revision as of 10:46, 9 October 2021




Previous: GPU Architecture/Execution process Index Next: GPU Architecture/Communication & Concurrency




NVIDIA GPUs expose a rich memory hierarchy. From fastest to slowest (and smallest to largest):

  • CUDA Core Registers
  • L1 Cache
  • Shared Memory
  • L2 Cache
  • Global memory
  • Host memory
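As a sketch of how three of these levels appear in a kernel (the kernel and its names are illustrative, not from this guide): each thread reads from global memory, stages the value in per-block shared memory, and accumulates in a register before writing back.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel touching three hierarchy levels.
// Assumes it is launched with blockDim.x == 256.
__global__ void scale_sum(const float *in, float *out, float factor, int n)
{
    __shared__ float tile[256];              // shared memory: on-chip, per block
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    if (idx < n)
        tile[threadIdx.x] = in[idx];         // global memory -> shared memory
    __syncthreads();                         // every thread reaches the barrier

    if (idx < n) {
        float acc = tile[threadIdx.x] * factor; // register: per-thread, fastest
        out[idx] = acc;                         // register -> global memory
    }
}
```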

This list excludes constant memory segments. They are nevertheless optimised to be fast, because constant memory reads are cached.

Performance hints: Constant memory segments make sense when data is reused. For example, you can store lookup tables (LUTs) there, provided they are repeatedly read by the threads within a block.
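A minimal sketch of the LUT pattern, assuming a 256-entry table (the table contents and kernel name are hypothetical): constant memory is read-only during kernel execution and is served from a cache, so it performs best when many threads read the same entry.

```cuda
#include <cuda_runtime.h>

// Hypothetical lookup table placed in constant memory.
__constant__ float lut[256];

__global__ void apply_lut(const unsigned char *in, float *out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        out[idx] = lut[in[idx]];   // cached, broadcast-friendly constant read
}

// Host side: fill a host table once, then copy it into the constant segment:
//   cudaMemcpyToSymbol(lut, host_table, sizeof(lut));
```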

Another important point is the different types of host-side memory. You may find the following concepts:

  • Non-pinned memory / pageable memory: corresponds to normal host memory. A page is a segment of memory that can be transferred; paging is optimised to avoid memory fragmentation when dealing with dynamic partitioning. Pageable memory may not be physically contiguous.
  • Pinned memory / non-pageable memory: a chunk of memory that is page-locked and often physically contiguous. It is optimal for communication between devices and the CPU, since the DMA engine can transfer it directly without requiring many small transactions.
  • Unified Memory Addressing / Managed Memory: a chunk of memory whose pointer is accessible from both the host and the device. Under the hood, there may be two chunks of memory (one on the host and one on the device) that are kept coherent by the UVA system.
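The managed-memory bullet above can be sketched as follows (sizes and names are illustrative): a single pointer from `cudaMallocManaged` is dereferenced by both the CPU and the GPU, and the runtime handles the migration.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void increment(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main()
{
    int n = 1024, *data = nullptr;
    // One pointer, valid on both host and device; migration is automatic.
    cudaMallocManaged(&data, n * sizeof(int));
    for (int i = 0; i < n; ++i) data[i] = i;

    increment<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize();      // make sure the GPU is done before host reads

    printf("data[0] = %d\n", data[0]);
    cudaFree(data);
    return 0;
}
```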
Performance hints: Use pinned memory whenever possible (mainly for large memory chunks) to exploit communication optimisations such as asynchronous copies. Using non-pinned memory can lead to serious performance degradation, because each transfer must be staged through intermediate buffers.
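A minimal sketch of a pinned-memory transfer, assuming a large buffer (the 64 MiB size is illustrative): page-locked memory allocated with `cudaMallocHost` lets `cudaMemcpyAsync` perform a true DMA transfer, whereas a pageable buffer would be staged through an internal pinned buffer first.

```cuda
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 64 << 20;      // 64 MiB: large transfers benefit most
    float *h_pinned = nullptr, *d_buf = nullptr;

    cudaMallocHost(&h_pinned, bytes);   // pinned (page-locked) host memory
    cudaMalloc(&d_buf, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    // With pinned memory this copy can overlap other work on the stream;
    // with pageable memory it would be serialised through staging copies.
    cudaMemcpyAsync(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice, stream);
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(d_buf);
    cudaFreeHost(h_pinned);
    return 0;
}
```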

