
RidgeRun CUDA Optimisation Guide/Optimisation Recipes/Fine optimisations/Inlining: Difference between revisions

However, keep in mind that inlining can also degrade performance. It is therefore usually adequate to trust the compiler's inlining heuristic, which weighs the potential performance gain from eliminating call overhead against compile time. Aggressive inlining may produce much larger code. It may also affect resource utilisation, such as register usage, which can negatively affect the deployment of the code.


In our experience, the instances in which it makes sense to override the compiler's inlining heuristic are rare. We have used __noinline__ to limit code size and thus reduce excessive compile times. Use of __noinline__ has no predictable effect on register pressure that I am aware of. Inlining may allow more aggressive code movement such as load scheduling and this may increase register pressure, while not inlining may increase register pressure due to ABI restrictions on the use of registers. I have never found a case where the use of __noinline__ improved performance, but of course such cases could exist, possibly due to instruction cache effects.
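
Since the effect on register pressure is hard to predict, it can help to measure it rather than guess. The following is a minimal sketch, assuming a CUDA-capable device and the CUDA runtime API; the helper and kernel names (`axpy_inlined`, `axpy_called`, `kernel_inlined`, `kernel_called`, `registers_per_thread`) are hypothetical and chosen only for illustration. It queries the per-thread register count of two otherwise identical kernels through `cudaFuncGetAttributes`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helpers: identical bodies, opposite inlining qualifiers.
__device__ __forceinline__ float axpy_inlined(float a, float x, float y) {
    return a * x + y;
}

__device__ __noinline__ float axpy_called(float a, float x, float y) {
    return a * x + y;
}

__global__ void kernel_inlined(float* data, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = axpy_inlined(a, data[i], 1.0f);
}

__global__ void kernel_called(float* data, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = axpy_called(a, data[i], 1.0f);
}

// Returns the number of registers used per thread by the given kernel,
// or -1 if the attributes could not be queried.
int registers_per_thread(const void* kernel) {
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, kernel) != cudaSuccess) return -1;
    return attr.numRegs;
}

// Prints the register count of both kernel variants for comparison.
void report_inlining_registers() {
    printf("kernel_inlined: %d registers/thread\n",
           registers_per_thread((const void*)kernel_inlined));
    printf("kernel_called:  %d registers/thread\n",
           registers_per_thread((const void*)kernel_called));
}
```

Calling `report_inlining_registers()` from `main` prints both counts. The numbers depend on the GPU architecture and compiler version, so no particular outcome should be expected. Compiling with `nvcc -Xptxas -v` reports similar resource usage at build time.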


In general, for inlining, you can provide a hint to the compiler by using:
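
The original snippet is not preserved at this point. As a hedged illustration, the sketch below assumes the hints meant are CUDA's `__forceinline__` and `__noinline__` function qualifiers, the latter being the one discussed above. The function and kernel names (`square`, `cube`, `poly_kernel`, `run_inline_hint_demo`) are hypothetical.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Ask the compiler to inline this small helper at its call sites.
__device__ __forceinline__ float square(float x) {
    return x * x;
}

// Ask the compiler to keep this helper as a real function call if possible.
__device__ __noinline__ float cube(float x) {
    return x * x * x;
}

__global__ void poly_kernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = square(in[i]) + cube(in[i]);
}

// Runs the kernel on n small inputs and compares against a host reference.
// Returns the number of mismatches, or -1 on any CUDA error.
int run_inline_hint_demo(int n) {
    std::vector<float> h_in(n), h_out(n);
    // Small integer values keep every intermediate result exact in float.
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i % 8);

    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }

    cudaError_t err = cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threads = 128;
        const int blocks = (n + threads - 1) / threads;
        poly_kernel<<<blocks, threads>>>(d_in, d_out, n);
        err = cudaGetLastError();  // catches launch configuration errors
    }
    if (err == cudaSuccess) {
        // The blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        const float x = h_in[i];
        if (h_out[i] != x * x + x * x * x) ++mismatches;
    }
    return mismatches;
}
```

Both qualifiers leave the computed result unchanged; they only influence how the compiler generates code. Following the advice above, a reasonable default is to omit them and add one only when measurements justify it.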