Workload Problem size in CUDA Coarse optimisations

Communication bound: Not enough work

To get a real benefit from the GPU, it needs enough work to outweigh the communication penalties. If there is too little work, the communication penalties will be considerably higher than the computation time, which becomes negligible by comparison (10x rule). If that happens:

1. Move the workload to the CPU by using a worker.

This may save both the communication penalty and the memory copy.

2. Accept the penalty.

If it does not make sense to move the workload to the CPU, keep it on the GPU and look for other optimisation opportunities.
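
The check described above can be sketched as a small host-side cost model. This is only an illustration under stated assumptions: the `TransferModel` struct, its fields, and both function names are hypothetical, the latency and bandwidth figures must come from measurements on your own system, and reading the "10x rule" as a threshold ratio between copy time and kernel time is an interpretation rather than something this page defines.

```cpp
#include <cstddef>

// Hypothetical cost model for one host<->device copy.
// Both fields are assumptions: measure them on the target machine.
struct TransferModel {
    double latency_s;      // fixed cost per copy, in seconds
    double bandwidth_Bps;  // sustained bus throughput, in bytes per second
};

// Estimated time to move `bytes` across the bus in a single copy.
double transfer_time(const TransferModel& m, std::size_t bytes) {
    return m.latency_s + static_cast<double>(bytes) / m.bandwidth_Bps;
}

// Returns true when the copies cost at least `ratio` times the kernel time,
// i.e. the computation is negligible next to the communication.
// The default ratio of 10 is an assumed reading of the "10x rule".
bool communication_bound(double copy_in_s, double kernel_s, double copy_out_s,
                         double ratio = 10.0) {
    return (copy_in_s + copy_out_s) >= ratio * kernel_s;
}
```

If `communication_bound` returns true for measured timings, the workload is a candidate for option 1 (running on a CPU worker); otherwise the transfer cost is probably being amortised and the work can stay on the GPU.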
