Condition and loops replacement for Fine optimisations in CUDA Optimisation
RidgeRun CUDA Optimisation Guide
GPU bound: removing if conditions / loops
Conditionals are a major source of thread divergence: when threads of the same warp take different branches, the warp executes each path in turn, so a simple if/else can increase the wall time of those threads by up to a factor of two. Conditionals here include both loops and if statements.
One way to mitigate conditionals is to use masking or the ternary operator.
For example, to create an identity matrix, you may start from:
/* Naïve - Create an identity */
for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
        if (i == j)
            matrix[i][j] = 1;
        else
            matrix[i][j] = 0;
The optimised version would look like:
/* Optimised 1 - Create an identity */
for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
        matrix[i][j] = (i == j) ? 1 : 0;
/* or Optimised 2 - Create an identity */
for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
        matrix[i][j] = (i == j);
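Note that the explicit if/else branch no longer exists: the result of the comparison is either selected with the ternary operator or stored directly as 0 or 1. Which of the two forms is better depends on the application.
Another version takes advantage of the memory instructions available: clear the whole matrix first, then set only the diagonal.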
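The text mentions masking, but the identity example only shows the ternary and direct-comparison forms. As a rough sketch of masking, the comparison result can be turned into a 0/1 factor that multiplies the value. The thresholding task and the function names `threshold_branch` and `threshold_mask` are hypothetical and not part of the original guide. The code is written as host C++, although the same expressions are valid inside a CUDA kernel.

```cpp
#include <cstddef>
#include <vector>

// Branching version: the if/else decides what is stored.
// Assumes out.size() >= in.size().
inline void threshold_branch(const std::vector<float>& in,
                             std::vector<float>& out, float t) {
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] > t)
            out[i] = in[i];
        else
            out[i] = 0.0f;
    }
}

// Masked version: the comparison yields 0 or 1 and is used as a factor,
// so there is no if/else in the loop body.
// Assumes finite inputs: 0 * inf or 0 * NaN would produce NaN.
inline void threshold_mask(const std::vector<float>& in,
                           std::vector<float>& out, float t) {
    for (std::size_t i = 0; i < in.size(); ++i) {
        const float mask = static_cast<float>(in[i] > t);  // 1.0f or 0.0f
        out[i] = mask * in[i];
    }
}
```

Masking trades a branch for an extra multiplication, so it tends to pay off when both sides of the branch are cheap. It is worth measuring both forms, since the compiler may already turn a short if/else into branch-free code.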
/* Optimised 3 - Create an identity */
/* Assumes matrix is stored contiguously, e.g. declared as matrix[N][N] */
memset(matrix, 0, N * N * sizeof(matrix[0][0]));
for (int i = 0; i < N; ++i)
    matrix[i][i] = 1;
Now suppose that N is a template parameter or is otherwise known at compile time. You can then get rid of the loop by unrolling it:
/* Optimised 4 - Create an identity */
/* Assumes matrix is stored contiguously, e.g. declared as matrix[N][N] */
memset(matrix, 0, N * N * sizeof(matrix[0][0]));
#pragma unroll
for (int i = 0; i < N; ++i)
    matrix[i][i] = 1;
The #pragma unroll directive asks the compiler to unroll the loop at compile time, replacing it with straight-line code. In other words, the result is similar to writing:
/* Optimised 4 - Create an identity (as unrolled by the compiler) */
memset(matrix, 0, N * N * sizeof(matrix[0][0]));
matrix[0][0] = 1;
matrix[1][1] = 1;
matrix[2][2] = 1;
/* ... */
matrix[N - 2][N - 2] = 1;
matrix[N - 1][N - 1] = 1;
What do you save here? You save the loop-counter comparisons, increments, and jumps. However, take into account that the binary will grow in size.
Performance hints: avoid if conditions, loops, divisions, and modulus operations as much as possible. Conditions and loops can cause thread divergence, while divisions and modulus operations are computationally expensive, so both often lead to high computation time.
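To make the compile-time case concrete, here is a small sketch in which N is a template parameter deduced from the array type. The function name `make_identity_unrolled` and the `UNROLL_HINT` macro are hypothetical. The macro exists only because `#pragma unroll` is recognised by nvcc and clang, while other host compilers may warn about an unknown pragma.

```cpp
#include <cstddef>
#include <cstring>

// Emit the unroll hint only for compilers known to accept it (assumption:
// nvcc and clang); elsewhere the macro expands to nothing.
#if defined(__CUDACC__) || defined(__clang__)
#define UNROLL_HINT _Pragma("unroll")
#else
#define UNROLL_HINT
#endif

// N is deduced from the array type, so the trip count is a compile-time
// constant and the compiler is free to unroll the loop completely.
template <std::size_t N>
void make_identity_unrolled(float (&matrix)[N][N]) {
    // sizeof on a reference to an array gives the full N * N * sizeof(float).
    std::memset(matrix, 0, sizeof(matrix));
    UNROLL_HINT
    for (std::size_t i = 0; i < N; ++i)
        matrix[i][i] = 1.0f;
}
```

Calling `make_identity_unrolled(m)` on a `float m[4][4]` instantiates the template with N = 4. The pragma is a hint, so whether the loop is actually unrolled is best confirmed by inspecting the generated code.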
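As one possible illustration of avoiding divisions and modulus operations: for unsigned integers and a divisor that is a power of two, a shift and a bitwise AND give the same results. The helper names `div_pow2` and `mod_pow2` are hypothetical, and the sketch assumes the caller guarantees the power-of-two precondition.

```cpp
#include <cstdint>

// x / (1 << log2_divisor) for unsigned x.
// Assumes log2_divisor < 32.
constexpr std::uint32_t div_pow2(std::uint32_t x, std::uint32_t log2_divisor) {
    return x >> log2_divisor;
}

// x % divisor for unsigned x.
// Assumes divisor is a non-zero power of two (1, 2, 4, 8, ...).
constexpr std::uint32_t mod_pow2(std::uint32_t x, std::uint32_t divisor) {
    return x & (divisor - 1u);
}
```

For example, splitting a linear index into groups of 32 can use `div_pow2(idx, 5)` and `mod_pow2(idx, 32)` instead of `idx / 32` and `idx % 32`. Compilers usually apply this transformation on their own when the divisor is a compile-time constant, so the explicit form mainly helps when the divisor is only known at run time.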