Condition and loops replacement for Fine optimisations in CUDA Optimisation
GPU bound: removing if conditions / loops
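Conditionals are a major source of thread divergence. When the threads of a warp take different paths, the hardware executes each path in turn with the inactive threads masked off, so a two-way branch can increase the wall time of the affected code by up to a factor of two. Conditionals here include both if statements and loops.
One way to mitigate conditionals is to use masking or the ternary operator.
For example, to create an identity matrix, you may use: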
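Masking is named as a mitigation, so here is a minimal sketch of it in plain C++ (the same expressions can be used inside a kernel body). The function names `threshold_branchy`, `threshold_masked`, and `select_masked` are hypothetical and chosen only for illustration. The idea is that a comparison evaluates to 0 or 1, and that value can scale or blend results instead of steering a branch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Branchy reference: binarise 8-bit samples against a threshold.
std::vector<std::uint8_t> threshold_branchy(const std::vector<std::uint8_t>& in,
                                            std::uint8_t threshold) {
    std::vector<std::uint8_t> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] > threshold)
            out[i] = 255;
        else
            out[i] = 0;
    }
    return out;
}

// Masked version: the comparison yields 0 or 1, which scales the output value.
std::vector<std::uint8_t> threshold_masked(const std::vector<std::uint8_t>& in,
                                           std::uint8_t threshold) {
    std::vector<std::uint8_t> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const int mask = static_cast<int>(in[i] > threshold);  // 1 or 0
        out[i] = static_cast<std::uint8_t>(255 * mask);
    }
    return out;
}

// General two-way select without if/else: returns a when cond holds, else b.
int select_masked(bool cond, int a, int b) {
    const int mask = static_cast<int>(cond);  // 1 when cond holds, else 0
    return mask * a + (1 - mask) * b;
}
```

Note that the masked select always evaluates both `a` and `b`, so it only pays off when both operands are cheap to compute and free of side effects.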
/* Naïve - Create an identity */
for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
        if (i == j)
            matrix[i][j] = 1;
        else
            matrix[i][j] = 0;
The optimised version would look like:
/* Optimised 1 - Create an identity */
for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
        matrix[i][j] = i == j ? 1 : 0;

/* or Optimised 2 - Create an identity */
for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
        matrix[i][j] = i == j;
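Note that the explicit if/else branch is gone. The comparison result is either assigned directly or selected through the ternary operator, which the compiler can usually translate into a select or predicated instruction rather than a jump. Which of the two forms is better depends on the application.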
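Another option takes advantage of the bulk operations already available, such as memset:
/* Optimised 3 - Create an identity */
/* Assumes matrix is a contiguous N x N array, so sizeof(matrix[0][0]) is the element size */
memset(matrix, 0, N * N * sizeof(matrix[0][0]));
for (int i = 0; i < N; ++i)
    matrix[i][i] = 1;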
Now suppose that N is a template parameter or is otherwise known at compile time. You can then get rid of the loop by unrolling it:
/* Optimised 4 - Create an identity */
/* Assumes matrix is a contiguous N x N array, so sizeof(matrix[0][0]) is the element size */
memset(matrix, 0, N * N * sizeof(matrix[0][0]));
#pragma unroll
for (int i = 0; i < N; ++i)
    matrix[i][i] = 1;
The #pragma unroll directive asks the compiler to unroll the loop at compile time, replacing it with one copy of the loop body per iteration. In other words, the result is similar to writing:
/* Optimised 4, as unrolled by the compiler - Create an identity */
memset(matrix, 0, N * N * sizeof(matrix[0][0]));
matrix[0][0] = 1;
matrix[1][1] = 1;
matrix[2][2] = 1;
/* ... */
matrix[N - 2][N - 2] = 1;
matrix[N - 1][N - 1] = 1;
What do you save here? The loop-counter comparisons, increments, and jumps. However, take into account that unrolling makes the program binary larger.
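To make the compile-time case concrete, the following is a minimal sketch in which N is a template parameter, so the trip count is known when each instantiation is compiled. The names `make_identity` and `UNROLL_HINT` are hypothetical. It assumes flat row-major storage in a `std::array`, and it only emits the pragma for nvcc and clang, which accept `#pragma unroll`; other compilers simply see an ordinary loop.

```cpp
#include <array>
#include <cstddef>

// Emit the unroll hint only where `#pragma unroll` is known to be accepted.
// Assumption: on other compilers a plain loop is an acceptable fallback.
#if defined(__CUDACC__) || defined(__clang__)
#define UNROLL_HINT _Pragma("unroll")
#else
#define UNROLL_HINT
#endif

// Builds an N x N identity matrix in row-major order.
// N is a template parameter, so the loop bound is a compile-time constant.
template <std::size_t N>
std::array<int, N * N> make_identity() {
    std::array<int, N * N> matrix{};  // value-initialised: every element is 0
    UNROLL_HINT
    for (std::size_t i = 0; i < N; ++i) {
        matrix[i * N + i] = 1;  // diagonal element of row i
    }
    return matrix;
}
```

A call such as `make_identity<4>()` instantiates the loop with a fixed bound of 4, which the compiler is then free to expand into four stores. Whether it actually does so remains the compiler's decision, since the pragma is a hint.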
Performance hints: avoid if conditions, loops, divisions, and modulus operations where practical. Branches and loops can cause thread divergence, while divisions and modulus operations are computationally expensive.
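For the division and modulus part of the hint, a common replacement applies when the divisor is a power of two: a right shift replaces the division and a bitwise AND replaces the modulus. The sketch below is only an illustration under that assumption; `TileCoord`, `split_divmod`, and `split_bitwise` are hypothetical names, and the operands are unsigned so that both versions agree for every input.

```cpp
#include <cstdint>

// Row/column pair obtained from a flat index.
struct TileCoord {
    std::uint32_t row;
    std::uint32_t col;
};

// Reference version using division and modulus.
inline TileCoord split_divmod(std::uint32_t idx, std::uint32_t width) {
    return TileCoord{idx / width, idx % width};
}

// Same mapping for width == 2^shift, using a shift and a bitwise AND.
// Assumption: shift is smaller than 32 and the width really is a power of two.
inline TileCoord split_bitwise(std::uint32_t idx, std::uint32_t shift) {
    const std::uint32_t mask = (1u << shift) - 1u;  // low `shift` bits set
    return TileCoord{idx >> shift, idx & mask};
}
```

When the divisor is a compile-time constant, compilers usually apply this transformation on their own, so the manual form mostly matters when the width is only known at run time to be a power of two.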