c - How to vectorize this kernel? -

July 15, 2011

I asked this question in the middle of another one, but it seems nobody is going to answer in the previous topic.

My question is the following. I have been vectorizing one application with success. However, for this particular kernel:

    inline int cfunction(node** a, node* b){
        long dot = 0, i;
        for(i=0;i<size;i++)
            dot += (*a)->data[i] * b->data[i];
        if(abs(2 * dot) <= b->norm)
            return 0;
        long q = round((double) dot / b->norm);
        for(i=0;i<size;i++)
            (*a)->data[i] -= q * b->data[i];
        (*a)->norm = (*a)->norm + q * q * b->norm - 2 * q * dot;
        return 1;
    }

I am only able to vectorize the first loop. If I compile this code with icpc, using:

    icpc *.c *.h -g -O2 -msse4.2 -vec-report=1

I get the following report:

main.c(558): (col. 9) remark: loop vectorized.

main.c(566): (col. 2) remark: loop vectorized.

Which tells me that icpc vectorizes the code. Now, if I hand-vectorize the first loop, I get a (perfect?) speedup factor with ints. That tells me the compiler is not doing a good job at vectorizing (especially because if I use shorts the performance is the same). However, for the second loop, I get no performance gains whatsoever if I hand-vectorize it like this:

    const int q = round((double) dot / b->norm);

    int32_t * pa = (*a)->data;
    int32_t * const pb = b->data;

    const __m128i vecqi = _mm_set1_epi32(q);
    __m128i vecresi, vecpi, vecci, vecqci;

    for(i=0;i<size-3;i+=4){
        vecpi = _mm_load_si128((__m128i *)&(pa)[i] );
        vecci = _mm_load_si128((__m128i *)&(pb)[i] );
        vecqci = _mm_mullo_epi32(vecqi,vecci);
        vecresi = _mm_sub_epi32(vecpi,vecqci);
        _mm_store_si128((__m128i *) ((pa) + i), vecresi );
    }

    for(;i<size;i++)
        pa[i] -= q * pb[i];

    (*a)->norm = (*a)->norm + q * q * b->norm - 2 * q * dot;

Does anybody have a clue why I am not getting performance gains from vectorizing the second kernel?

Thanks.
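For reference, the hand-vectorized version of the first loop is mentioned above but not shown. A minimal sketch of what it might look like follows. It assumes the data arrays hold `int32_t` values, as in the second kernel. The helper name `dot_i32`, the unaligned loads, and the widening of each product to 64 bits (to mirror the `long` accumulator) are illustrative assumptions, not the code that was actually measured.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE4_1__
#include <smmintrin.h>  /* SSE4.1: _mm_mullo_epi32, _mm_cvtepi32_epi64 */
#endif

/* Hypothetical helper (not from the original post): dot product of two
 * int32_t arrays, accumulated in 64 bits like the scalar "long dot". */
static int64_t dot_i32(const int32_t *a, const int32_t *b, size_t n)
{
    int64_t dot = 0;
    size_t i = 0;

#ifdef __SSE4_1__
    __m128i acc = _mm_setzero_si128();  /* two 64-bit partial sums */
    for (; i + 4 <= n; i += 4) {
        /* Unaligned loads: no assumption about 16-byte alignment. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* Low 32 bits of each of the four products; this matches the
         * scalar int * int as long as the product does not overflow. */
        __m128i prod = _mm_mullo_epi32(va, vb);
        /* Sign-extend the lower and upper pair of products to 64 bits
         * and add them to the partial sums. */
        acc = _mm_add_epi64(acc, _mm_cvtepi32_epi64(prod));
        acc = _mm_add_epi64(acc, _mm_cvtepi32_epi64(_mm_srli_si128(prod, 8)));
    }
    int64_t lanes[2];
    _mm_storeu_si128((__m128i *)lanes, acc);
    dot = lanes[0] + lanes[1];
#endif

    /* Scalar tail (or the whole loop when SSE4.1 is not enabled). */
    for (; i < n; i++)
        dot += a[i] * b[i];

    return dot;
}
```

The SIMD path is only compiled when SSE4.1 is enabled (for example with the `-msse4.2` flag used above). Otherwise the function falls back to the plain scalar loop, so the result is the same either way as long as the individual products fit in 32 bits.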
