RidgeRun CUDA Optimisation Guide/GPU Architecture/Memory hierarchy: Difference between revisions

414 bytes added , 11 October 2021

@@ Line 12: / Line 12: @@
 * Host memory
-It excludes constant memory segments. However, they are [https://stackoverflow.com/questions/18020647/cuda-constant-memory-best-practices optimised to be faster because of caching].
+It excludes constant memory segments. However, they are [https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#maximize-memory-throughput optimised to be faster because of caching]. One important takeaway from the programming guide is the following:
+* The constant memory segment can be cached.
+* However, if a warp makes a request to <code>__constant__</code> memory and its threads access different locations, those requests are serialised.
+* The best performance is achieved when all threads within the warp access the same constant value.
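The two access patterns described above can be illustrated with a minimal sketch. It is not taken from the guide: the names <code>coeffs</code>, <code>uniform_read</code>, <code>divergent_read</code>, <code>upload_coeffs</code> and <code>run</code> are hypothetical, and error checking is omitted for brevity. Only the first kernel follows the recommended pattern, where every thread of a warp reads the same constant value.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N_COEFFS 32

// Hypothetical lookup table stored in constant memory (illustrative only).
__constant__ float coeffs[N_COEFFS];

// Uniform access: every thread of a warp reads the same element,
// so the request can be served as a single cached broadcast.
__global__ void uniform_read(float *out, int idx) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    out[tid] = coeffs[idx];
}

// Divergent access: threads of a warp read different elements,
// so the requests to constant memory are serialised.
__global__ void divergent_read(float *out) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    out[tid] = coeffs[threadIdx.x % N_COEFFS];
}

// Fills the constant table with 0, 1, 2, ... from the host.
static void upload_coeffs() {
    float host_coeffs[N_COEFFS];
    for (int i = 0; i < N_COEFFS; ++i) host_coeffs[i] = (float)i;
    cudaMemcpyToSymbol(coeffs, host_coeffs, sizeof(host_coeffs));
}

// Launches one block of `threads` threads and copies the result to h_out.
static void run(bool uniform, float *h_out, int threads) {
    float *d_out = nullptr;
    cudaMalloc(&d_out, threads * sizeof(float));
    if (uniform) {
        uniform_read<<<1, threads>>>(d_out, 3);
    } else {
        divergent_read<<<1, threads>>>(d_out);
    }
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_out, d_out, threads * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
}

int main() {
    float out[64];
    upload_coeffs();

    run(true, out, 64);
    printf("uniform:   out[5] = %.1f\n", out[5]);

    run(false, out, 64);
    printf("divergent: out[5] = %.1f\n", out[5]);
    return 0;
}
```

Both kernels produce correct results; the difference is only in how the warp's constant memory requests are served. Timing the two kernels (for example with a profiler) on a larger launch would be needed to observe the actual cost of serialisation on a given GPU.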
 <pre style="background-color:#82daeb">