RidgeRun CUDA Optimisation Guide/Optimisation Recipes/Fine optimisations/Inlining: Difference between revisions

From RidgeRun Developer Wiki

Revision as of 14:27, 9 October 2021




Previous: Optimisation Recipes/Fine optimisations/Condition and loops replacement Index Next: Optimisation Recipes/Common pitfalls when optimising




GPU bound: inlining

Inlining is another optimisation technique. Consider it when a kernel calls another function many times, for example inside a loop. A call that is not inlined carries overhead: a jump to the function body, argument passing, and possibly saving registers to the stack, all of which can degrade performance.

However, inlining can also degrade performance in some cases, for instance by increasing code size and register usage. Review this thread.

In general, you can guide the compiler's inlining decisions with function qualifiers:

  • The __noinline__ function qualifier can be used as a hint for the compiler not to inline the function if possible.
  • The __forceinline__ function qualifier can be used to force the compiler to inline the function.
  • The __noinline__ and __forceinline__ function qualifiers cannot be used together, and neither function qualifier can be applied to an inline function.

By default, the compiler inlines any __device__ function when deemed appropriate.
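
The qualifiers above can be sketched in a small CUDA program. This is a minimal illustration, not a benchmark: the helper and kernel names (scale_add_inlined, scale_add_called, saxpy_inlined, saxpy_called, run_inlining_demo) are hypothetical and chosen only for this example. It applies __forceinline__ and __noinline__ to two otherwise identical __device__ helpers and checks that both kernels produce the same result.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: tiny and called once per element, so it is a natural
// candidate for inlining. __forceinline__ forces the compiler to inline it.
__device__ __forceinline__ float scale_add_inlined(float a, float x, float y) {
  return a * x + y;
}

// Same body, but __noinline__ hints that the call should be kept if possible.
__device__ __noinline__ float scale_add_called(float a, float x, float y) {
  return a * x + y;
}

__global__ void saxpy_inlined(int n, float a, const float* x, float* y) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] = scale_add_inlined(a, x[i], y[i]);
}

__global__ void saxpy_called(int n, float a, const float* x, float* y) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] = scale_add_called(a, x[i], y[i]);
}

// Runs both kernels on the same input.
// Returns 0 if they agree, 1 on a CUDA error, 2 on a result mismatch.
int run_inlining_demo() {
  const int n = 1 << 20;
  const size_t bytes = n * sizeof(float);
  std::vector<float> hx(n, 1.0f), hy(n, 2.0f), out_inl(n), out_call(n);

  float *dx = nullptr, *dy_inl = nullptr, *dy_call = nullptr;
  cudaMalloc(&dx, bytes);
  cudaMalloc(&dy_inl, bytes);
  cudaMalloc(&dy_call, bytes);
  cudaMemcpy(dx, hx.data(), bytes, cudaMemcpyHostToDevice);
  cudaMemcpy(dy_inl, hy.data(), bytes, cudaMemcpyHostToDevice);
  cudaMemcpy(dy_call, hy.data(), bytes, cudaMemcpyHostToDevice);

  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;
  saxpy_inlined<<<blocks, threads>>>(n, 3.0f, dx, dy_inl);
  saxpy_called<<<blocks, threads>>>(n, 3.0f, dx, dy_call);

  cudaError_t err = cudaGetLastError();                   // launch errors
  if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
  if (err == cudaSuccess) {
    cudaMemcpy(out_inl.data(), dy_inl, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(out_call.data(), dy_call, bytes, cudaMemcpyDeviceToHost);
  }

  cudaFree(dx);
  cudaFree(dy_inl);
  cudaFree(dy_call);
  if (err != cudaSuccess) return 1;

  for (int i = 0; i < n; ++i) {
    if (out_inl[i] != out_call[i]) return 2;
  }
  return 0;
}

int main() {
  const int status = run_inlining_demo();
  std::printf("inlining demo status: %d\n", status);
  return status;
}
```

Assuming the file is saved as inlining_demo.cu (a hypothetical name), it can be built with nvcc inlining_demo.cu -o inlining_demo. The example only shows that both variants are functionally equivalent. Whether the inlined version is actually faster depends on the kernel, so time both variants with a profiler before settling on a qualifier. Inspecting the generated code with cuobjdump -sass is one way to see whether the call was kept.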

