Increasing arithmetic intensity for Fine optimisations in CUDA Optimisation

From RidgeRun Developer Wiki



Previous: Optimisation Recipes/Fine optimisations Index Next: Optimisation Recipes/Fine optimisations/Function approximation






Memory bound / GPU bound: increase arithmetic intensity

Arithmetic intensity can be defined as the number of operations performed per byte of memory traffic. The idea is to squeeze as much computation as possible out of each byte moved. This is mainly achieved by reducing the precision. For example, given a kernel with double-precision operations, it is possible to switch to single-precision operations. Doing so will:

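  • Raise the OPS/s: single precision is nominally two times faster than double precision, and the gap is considerably larger on GPUs with few double-precision units.
  • Require half the space: a float takes 4 bytes instead of the 8 bytes of a double, which also halves the memory traffic.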
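
As a rough illustration of the definition above, the following sketch computes the arithmetic intensity of an AXPY operation (y[i] = a * x[i] + y[i]) in both precisions. AXPY is used only as an example workload, the helper names `arithmetic_intensity` and `axpy_intensity` are hypothetical, and the byte count assumes every access goes to memory with no cache reuse.

```cpp
// Arithmetic intensity: operations per byte of memory traffic.
constexpr double arithmetic_intensity(double ops, double bytes) {
    return ops / bytes;
}

// AXPY, y[i] = a * x[i] + y[i], per element:
//   2 operations (one multiply, one add)
//   3 memory accesses of sizeof(T) bytes (read x[i], read y[i], write y[i])
// Assumption: no cache reuse, so every access counts as memory traffic.
template <typename T>
constexpr double axpy_intensity() {
    return arithmetic_intensity(2.0, 3.0 * sizeof(T));
}

// A double moves 24 bytes per element and a float moves 12, so the same
// two operations are spread over half the bytes in single precision.
static_assert(axpy_intensity<float>() == 2.0 * axpy_intensity<double>(),
              "float AXPY has twice the arithmetic intensity of double AXPY");
```

The operation count is unchanged by the switch to float. The intensity doubles only because fewer bytes are moved per element.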

However, casting can be costly on the GPU. It is preferable for the whole application to move entirely to float, so that both host and device operations are pure single-precision operations. On the other hand, if the computations require fine precision, this optimisation should be evaluated carefully, with a report of the error it introduces.
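
One possible way to produce such an error report is to keep a double-precision run as the reference and compare the single-precision output against it. The sketch below is host-side C++. The `ErrorReport` type and the `compare_to_reference` function are hypothetical names introduced for illustration, and which error metric matters depends on the application.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical report type: worst absolute and relative error observed.
struct ErrorReport {
    double max_abs_error;
    double max_rel_error;
};

// Compare a single-precision result against a double-precision reference.
// Only the common prefix is compared if the sizes differ.
ErrorReport compare_to_reference(const std::vector<double>& reference,
                                 const std::vector<float>& candidate) {
    ErrorReport report{0.0, 0.0};
    const std::size_t n = std::min(reference.size(), candidate.size());
    for (std::size_t i = 0; i < n; ++i) {
        const double abs_error =
            std::fabs(reference[i] - static_cast<double>(candidate[i]));
        report.max_abs_error = std::max(report.max_abs_error, abs_error);
        // The relative error is undefined for a zero reference; skip it.
        if (reference[i] != 0.0) {
            report.max_rel_error = std::max(
                report.max_rel_error, abs_error / std::fabs(reference[i]));
        }
    }
    return report;
}
```

If the reported maximum errors stay within the tolerance of the application, the single-precision version is a candidate for replacing the double-precision one.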

Performance hints: Prefer single precision over double precision. If half precision can be used, it is even better.

On the host side, although double precision can perform similarly to single precision, single precision is still preferable because of its smaller memory footprint. The problem remains two-fold: compute and memory.

