CUDA performance constraints: conditional branching


Guys, I want to know about CUDA performance related to conditional branching. I have the following code:

if(i==5) i=10; else i=5; 

Now, if I use the following statement instead of the if, will it remove the n/2 performance bottleneck in CUDA?

i=(i==5)?10:5; 

Thanks in advance.

Presumably the "n/2 performance bottleneck" you are referring to is warp divergence due to conditional branching.

It's likely that in either formulation you have shown, the compiler will make use of predicates to avoid branching altogether, and so both cases will compile to similar or identical machine code.

The compiler will make aggressive use of predicated execution in order to avoid branches and warp divergence.

In general, making valid inferences about machine behavior from C/C++ source code is quite difficult. Instead, you can compare both cases by compiling to PTX (nvcc -ptx ...), or better yet, do an ordinary compile and dump the machine code using cuobjdump -sass my_executable.
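To try that comparison, something like the following minimal sketch could be used. The file name toggle.cu, the function names, and the kernel names are all made up for illustration; only the two statements inside the toggle functions come from the question. Each formulation gets its own kernel so the two listings can be read side by side in the generated output.

```cpp
// toggle.cu (hypothetical file name). Builds with nvcc, or as plain C++.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Formulation 1: the if/else from the question.
HOST_DEVICE inline int toggle_if(int i) {
    if (i == 5) i = 10; else i = 5;
    return i;
}

// Formulation 2: the ternary operator from the question.
HOST_DEVICE inline int toggle_ternary(int i) {
    i = (i == 5) ? 10 : 5;
    return i;
}

#ifdef __CUDACC__
// One kernel per formulation, so each appears as a separate entry
// in the PTX (or SASS) listing. Names are illustrative only.
__global__ void kernel_if(int *data, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] = toggle_if(data[idx]);
}

__global__ void kernel_ternary(int *data, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] = toggle_ternary(data[idx]);
}
#endif
```

Compiling with nvcc -ptx toggle.cu should write toggle.ptx, where the bodies of kernel_if and kernel_ternary can be compared. The sketch has no main, so the cuobjdump -sass route needs it linked into an executable first. Whether the two listings turn out identical depends on the compiler version and target architecture, so treat the result as specific to your setup.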

