RidgeRun CUDA Optimisation Guide

Communication bound: Not enough work
To get a real benefit from the GPU, it needs enough work to amortise the communication penalties, such as the memory copies between host and device. If there is too little work, the communication penalties will far exceed the computation time, which becomes negligible in comparison (10x rule). If that happens:
1. Move the workload to the CPU by using a worker.
This can save both the communication penalty and the memory copy.
2. Accept the penalty.
If moving the workload to the CPU does not make sense, keep it on the GPU and look for other optimisation opportunities.
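
The first option can be sketched in plain host-side C++. This is a minimal illustration, not the guide's implementation: `is_communication_bound`, `sum_of_squares_cpu` and `submit_to_cpu_worker` are hypothetical names, the workload is a stand-in, and reading the 10x rule as "communication time is at least ten times the computation time" is an assumption. The timings are expected to come from your own measurements, for example from a profiler.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical helper: returns true when the measured communication time
// (copies, launch overhead) is at least `ratio` times the compute time.
// The default ratio of 10 is an ASSUMED reading of the "10x rule".
bool is_communication_bound(double comm_ms, double compute_ms,
                            double ratio = 10.0) {
  assert(ratio > 0.0);
  return comm_ms >= ratio * compute_ms;
}

// Stand-in for a small workload: sum of squares, computed on the CPU.
float sum_of_squares_cpu(const std::vector<float>& data) {
  float acc = 0.0f;
  for (float v : data) {
    acc += v * v;
  }
  return acc;
}

// Option 1: run the small workload on a CPU worker thread, so no
// host-device copy is needed and the caller can keep doing other work.
// The caller must keep `data` alive until the future has been consumed.
std::future<float> submit_to_cpu_worker(const std::vector<float>& data) {
  return std::async(std::launch::async,
                    [&data] { return sum_of_squares_cpu(data); });
}
```

A caller would measure the copy and kernel times once, call `is_communication_bound` with them, and route small inputs through `submit_to_cpu_worker`, collecting the result later with `get()` on the returned future. If the CPU path turns out to be slower or inconvenient, the second option applies and the GPU path stays as it is.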