Condition and loops replacement for Fine optimisations in CUDA Optimisation
RidgeRun CUDA Optimisation Guide  

GPU bound: removing if conditions / loops
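Conditionals are the major source of thread divergence. When the threads of a warp evaluate a condition differently, the warp executes both paths one after the other and masks off the threads that did not take the current path. As a result, a simple if/else can take up to twice as long. Conditionals include if statements and also loops, whose iteration count may differ from thread to thread.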
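The "up to a factor of two" figure comes from how a warp handles an if/else whose condition differs between its threads. The toy C++ model below is host code for illustration only, not CUDA. It charges the warp once for each side of the branch that at least one lane needs. The cycle costs and the names `lanes_below` and `warp_branch_cycles` are made-up assumptions, and real schedulers are more complex.

```cpp
#include <array>
#include <cassert>

// Toy cost model of SIMT execution, for illustration only.
constexpr int kWarpSize = 32;
using LanePredicates = std::array<bool, kWarpSize>;

// Hypothetical helper: lanes 0..n-1 take the "if" side, the rest the "else".
LanePredicates lanes_below(int n) {
  assert(n >= 0 && n <= kWarpSize);
  LanePredicates p{};
  for (int lane = 0; lane < kWarpSize; ++lane) p[lane] = lane < n;
  return p;
}

// Cycles the whole warp spends on one if/else. Each side that at least one
// lane needs is executed once, with the other lanes masked off, so a
// divergent warp pays for both sides back to back.
int warp_branch_cycles(const LanePredicates& takes_if, int if_cycles,
                       int else_cycles) {
  bool any_if = false;
  bool any_else = false;
  for (bool p : takes_if) {
    if (p) any_if = true;
    else any_else = true;
  }
  int cycles = 0;
  if (any_if) cycles += if_cycles;      // pass with the "if" lanes active
  if (any_else) cycles += else_cycles;  // pass with the "else" lanes active
  return cycles;
}
```

With 10 cycles per side, a warp whose 32 lanes all agree costs 10. A warp split 16/16, or even 31/1, costs 20, which is the factor of two mentioned above.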
One way to mitigate the divergence caused by conditionals is to replace them with masking or the ternary operator.
For example, a naïve way to create an identity matrix is:
```cpp
/* Naïve: create an identity */
for (int i = 0; i < N; ++i)
  for (int j = 0; j < N; ++j)
    if (i == j)
      matrix[i][j] = 1;
    else
      matrix[i][j] = 0;
```
Optimised versions could look like this:
```cpp
/* Optimised 1: create an identity */
for (int i = 0; i < N; ++i)
  for (int j = 0; j < N; ++j)
    matrix[i][j] = i == j ? 1 : 0;

/* or Optimised 2: create an identity */
for (int i = 0; i < N; ++i)
  for (int j = 0; j < N; ++j)
    matrix[i][j] = i == j;
```
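Note that the if statement no longer exists. The comparison is still evaluated, but its result is used as a value instead of deciding which code runs, so the compiler can typically emit a select or predicated instruction rather than a branch. Whether it is better to assign the comparison directly or to use a ternary operator depends on the application.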
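The text mentions masking without showing it. One common form, sketched below in host C++ under that assumption, turns the 0/1 result of a comparison into an all-ones or all-zeros bit mask. It then picks between two values with bitwise operations instead of a jump. The same integer expressions are valid in CUDA device code. The function names are hypothetical, and whether this beats the ternary operator depends on the code the compiler already generates for it.

```cpp
#include <cstdint>

// Branchless select: returns a when cond is true, b otherwise.
// true  -> mask = 0xFFFFFFFF, so (a & mask) == a and (b & ~mask) == 0
// false -> mask = 0x00000000, so (a & mask) == 0 and (b & ~mask) == b
constexpr std::uint32_t select_masked(bool cond, std::uint32_t a,
                                      std::uint32_t b) {
  const std::uint32_t mask = 0u - static_cast<std::uint32_t>(cond);
  return (a & mask) | (b & ~mask);
}

// Equivalent ternary form, shown for comparison.
constexpr std::uint32_t select_ternary(bool cond, std::uint32_t a,
                                       std::uint32_t b) {
  return cond ? a : b;
}
```

Applied to the identity example, `select_masked(i == j, 1u, 0u)` yields 1 on the diagonal and 0 elsewhere.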
Another way to create the identity takes advantage of the instructions available: clear the whole matrix with memset, then set only the diagonal:
```cpp
/* Optimised 3: create an identity */
memset(matrix, 0, N * N * sizeof(matrix[0][0]));  /* N * N elements */
for (int i = 0; i < N; ++i)
  matrix[i][i] = 1;
```
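The following host-side C++ sketch checks that clearing with memset and then writing the diagonal gives the same matrix as the naïve loop. It assumes a float matrix with a hypothetical fixed size N = 4, and the function names are illustrative. The byte count passed to memset is N * N times the size of one element, and the approach relies on an all-zero bit pattern meaning 0, which holds for integers and IEEE-754 floats.

```cpp
#include <cassert>  // assert(), for checking the results
#include <cstring>

// Hypothetical size for the host-side check; any positive value works.
constexpr int N = 4;

// Naive version: one comparison, and potentially one branch, per element.
void identity_naive(float matrix[N][N]) {
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      if (i == j)
        matrix[i][j] = 1.0f;
      else
        matrix[i][j] = 0.0f;
}

// memset version: zero all N * N elements, then write the N diagonal entries.
void identity_memset(float matrix[N][N]) {
  std::memset(matrix, 0, N * N * sizeof(matrix[0][0]));
  for (int i = 0; i < N; ++i)
    matrix[i][i] = 1.0f;
}

// Returns true when both matrices hold exactly the same values.
bool same_matrix(const float a[N][N], const float b[N][N]) {
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      if (a[i][j] != b[i][j]) return false;
  return true;
}
```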
Now suppose that N is a template parameter or is otherwise known at compile time. The loop can then be removed by unrolling it:
```cpp
/* Optimised 4: create an identity */
memset(matrix, 0, N * N * sizeof(matrix[0][0]));
#pragma unroll
for (int i = 0; i < N; ++i)
  matrix[i][i] = 1;
```
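The #pragma unroll directive asks the compiler to replicate the loop body once per iteration at compile time. Without a number after it, the loop is unrolled completely only when its trip count is a compile-time constant. In other words, the result is similar to writing: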
```cpp
/* Optimised 4 after unrolling: create an identity */
memset(matrix, 0, N * N * sizeof(matrix[0][0]));
matrix[0][0] = 1;
matrix[1][1] = 1;
matrix[2][2] = 1;
/* ... */
matrix[N - 2][N - 2] = 1;
matrix[N - 1][N - 1] = 1;
```
What does this save? The loop-counter increments, the comparisons and the jumps are gone. However, the binary grows, because the loop body is copied once per iteration.
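#pragma unroll leaves the unrolling to the compiler. When N is a template parameter, as assumed above, plain C++17 can also generate the unrolled sequence of assignments explicitly with std::index_sequence and a fold expression. This is a different technique from the pragma, swapped in here for illustration, and the names `set_diagonal` and `identity_unrolled` are hypothetical. The same code-size trade-off applies: one assignment is emitted per diagonal element.

```cpp
#include <cassert>  // assert(), for checking the results
#include <cstddef>
#include <cstring>
#include <utility>

// Sets matrix[I][I] = 1 for every index in the pack I... . The comma fold
// expands at compile time into one assignment per index: straight-line code
// with no loop counter, comparison or jump.
template <typename T, std::size_t N, std::size_t... I>
void set_diagonal(T (&matrix)[N][N], std::index_sequence<I...>) {
  ((matrix[I][I] = T(1)), ...);  // C++17 fold over the comma operator
}

// N is deduced from the array type, so it is known at compile time.
// Assumes T is an arithmetic type, for which all-zero bytes mean 0.
template <typename T, std::size_t N>
void identity_unrolled(T (&matrix)[N][N]) {
  std::memset(matrix, 0, sizeof(matrix));  // size of the whole N x N array
  set_diagonal(matrix, std::make_index_sequence<N>{});
}
```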
Performance hint: avoid if conditions, loops, divisions and modulo operations as much as possible. Conditions and loops can cause thread divergence, and divisions and modulo operations have a high computational cost on the GPU.
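Division and modulo by a power of two are a common special case. For an unsigned value x and a divisor d = 2^k, x / d equals x >> k and x % d equals x & (d - 1), so a shift and a bitwise AND can replace the more expensive operations. A typical use is splitting a flat thread index into the row and column of a 32-wide tile. The sketch assumes unsigned 32-bit operands and 0 <= k < 32, and the helper names are hypothetical. Compilers already apply this rewrite when d is a compile-time constant, so doing it by hand mainly helps when d is a run-time value known to be a power of two.

```cpp
#include <cstdint>

// True when d is a (non-zero) power of two, i.e. exactly one bit is set.
constexpr bool is_power_of_two(std::uint32_t d) {
  return d != 0u && (d & (d - 1u)) == 0u;
}

// x / 2^k for unsigned x. Requires k < 32 (shifting by 32 is undefined).
// Not valid for negative signed values, whose division rounds toward zero.
constexpr std::uint32_t div_pow2(std::uint32_t x, unsigned k) {
  return x >> k;
}

// x % 2^k for unsigned x: keep only the k lowest bits.
constexpr std::uint32_t mod_pow2(std::uint32_t x, unsigned k) {
  return x & ((std::uint32_t{1} << k) - 1u);
}
```

For instance, with a 32-wide tile (k = 5), a flat index of 100 maps to row div_pow2(100, 5) = 3 and column mod_pow2(100, 5) = 4, since 100 = 3 * 32 + 4.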