CUDA performance constraints: conditional branching
Guys, I want to know something related to CUDA performance and conditional branching. I have the following code:
if(i==5) i=10; else i=5;
Now, instead of the if, can I use the following statement to remove the n/2 performance bottleneck in CUDA?
i=(i==5)?10:5;
Thanks in advance.
Presumably the "n/2 performance bottleneck" you are referring to is warp divergence due to conditional branching.
It's likely that in either formulation you have shown, the compiler will make use of predicates to avoid branching altogether, and both cases will compile to similar or identical machine code.
The compiler will make aggressive use of predicated execution in order to avoid branches and warp divergence.
In general, making valid inferences about machine behavior from C/C++ source code is quite difficult. Instead, you can compare both cases by compiling to PTX (nvcc -ptx ...), or better, do an ordinary compile and dump the machine code using cuobjdump -sass my_executable.
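To make that comparison concrete, here is a minimal sketch that puts the two formulations from the question into separate kernels so their generated code can be inspected side by side. The file name, the function names (toggle_if, toggle_ternary), the kernel names, and the HD macro are all made up for illustration. Whether the compiler actually emits predicated or select instructions depends on the toolkit version and target architecture, so treat this as something to test rather than a guaranteed result.

```cpp
// branch_forms.cu (hypothetical file name; every identifier here is illustrative).
// The HD macro lets the same functions build as plain C++ or as CUDA device code.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Formulation 1: the explicit if/else from the question.
HD inline int toggle_if(int i) {
    if (i == 5) i = 10; else i = 5;
    return i;
}

// Formulation 2: the ternary expression from the question.
HD inline int toggle_ternary(int i) {
    i = (i == 5) ? 10 : 5;
    return i;
}

#ifdef __CUDACC__
// One kernel per formulation, so each gets its own PTX/SASS listing to compare.
__global__ void kernel_if(int *data, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] = toggle_if(data[idx]);
}

__global__ void kernel_ternary(int *data, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] = toggle_ternary(data[idx]);
}
#endif
```

Saved under the assumed name branch_forms.cu, running nvcc -ptx branch_forms.cu should produce a PTX listing with one entry per kernel. If the bodies of the two entries look the same, the choice between if/else and the ternary made no difference for that compiler and target.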