C++ - Efficiency of STL algorithms with fixed size arrays
In general, I assume that the STL implementation of an algorithm is at least as efficient as anything I could come up with myself (with the additional benefit of being error free). However, I came to wonder whether the STL's focus on iterators might be harmful in some situations.

Let's assume we want to calculate the inner product of two fixed-size arrays. A naive implementation would look like this:
std::array<double, 100000> v1;
std::array<double, 100000> v2;
// fill with arbitrary numbers
double sum = 0.0;
for (size_t i = 0; i < v1.size(); ++i) {
    sum += v1[i] * v2[i];
}
As the number of iterations and the memory layout are known at compile time and the operations can be directly mapped to native processor instructions, the compiler should be able to generate "optimal" machine code (loop unrolling, vectorization / FMA instructions ...).
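For illustration, here is a minimal hand-unrolled sketch of the kind of code I would hope the optimizer effectively produces (four independent accumulators, building on v1/v2 from the snippet above). Note that splitting the sum like this reorders the floating-point additions, which the compiler itself is normally only allowed to do under fast-math-style flags, and it assumes the array size is a multiple of 4, which holds for 100000:

double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
// four independent partial sums, so the additions can live in separate
// registers / SIMD lanes instead of forming one long dependency chain
for (size_t i = 0; i < v1.size(); i += 4) {
    s0 += v1[i]     * v2[i];
    s1 += v1[i + 1] * v2[i + 1];
    s2 += v1[i + 2] * v2[i + 2];
    s3 += v1[i + 3] * v2[i + 3];
}
double sum = (s0 + s1) + (s2 + s3);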
The STL version
double sum = std::inner_product(cbegin(v1), cend(v1), cbegin(v2), 0.0);
on the other hand adds additional indirections, and even if it is inlined, the compiler still has to deduce that it is working on a contiguous memory region and where that region lies. While this is certainly possible in principle, I wonder whether a typical C++ compiler will actually do it.
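Since std::array stores its elements contiguously, one way to make that knowledge explicit is to hand the algorithm raw pointers obtained via data() instead of the array iterators. This is just a sketch of the idea; whether it actually changes the generated code depends on the compiler and standard library:

const double* p1 = v1.data();
const double* p2 = v2.data();
// the algorithm is now instantiated directly for double*, so the "iterator"
// really is a plain pointer into a known contiguous block
double sum = std::inner_product(p1, p1 + v1.size(), p2, 0.0);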
So the question is: Do you think there can be a performance benefit in implementing standard algorithms that operate on fixed-size arrays on your own, or will the STL version always outperform a manual implementation?
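To make concrete what I mean by a manual implementation for fixed-size arrays, here is a minimal sketch where the trip count is baked into the signature as a compile-time constant (dot is just a name I made up for illustration):

#include <array>
#include <cstddef>

// hand-rolled inner product for two fixed-size arrays of the same length N
template <std::size_t N>
double dot(const std::array<double, N>& a, const std::array<double, N>& b)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < N; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

It would be called as dot(v1, v2), so that N = 100000 is deduced from the argument types.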
As suggested, I did some measurements, and
- for the code below,
- compiled with VS2013 for x64 in Release mode,
- executed on a Win8.1 machine with an i7-2640M,

the algorithm version is consistently slower by about 20% (15.6-15.7 s vs. 12.9-13.1 s). The relative difference stays constant over two orders of magnitude for n and reps.
So I guess the answer is: using standard library algorithms CAN hurt performance.
It would still be interesting to know whether this is a general problem or whether it is specific to my platform, compiler and benchmark. You are welcome to post your own results or comment on the benchmark.
#include <iostream>
#include <numeric>
#include <array>
#include <chrono>
#include <cstdlib>

#define USE_STD_ALGORITHM

using namespace std;
using namespace std::chrono;

static const size_t n = 10000000;  // size of the arrays
static const size_t reps = 1000;   // number of repetitions

array<double, n> a1;
array<double, n> a2;

int main() {
    srand(10);
    for (size_t i = 0; i < n; ++i) {
        a1[i] = static_cast<double>(rand()) * 0.01;
        a2[i] = static_cast<double>(rand()) * 0.01;
    }

    double res = 0.0;
    auto start = high_resolution_clock::now();
    for (size_t z = 0; z < reps; z++) {
#ifdef USE_STD_ALGORITHM
        res = std::inner_product(a1.begin(), a1.end(), a2.begin(), res);
#else
        for (size_t t = 0; t < n; ++t) {
            res += a1[t] * a2[t];
        }
#endif
    }
    auto end = high_resolution_clock::now();

    std::cout << res << " ";  // <-- necessary, so the loop isn't optimized away
    std::cout << duration_cast<milliseconds>(end - start).count() << " ms" << std::endl;
}

/*
 * Update: results (Ubuntu 14.04, Haswell)
 *
 * STL algorithm:
 * g++-4.8.2   -march=native -std=c++11 -O3 main.cpp             : 1.15287e+24 3551 ms
 * g++-4.8.2   -march=native -std=c++11 -ffast-math -O3 main.cpp : 1.15287e+24 3567 ms
 * clang++-3.5 -march=native -std=c++11 -O3 main.cpp             : 1.15287e+24 9378 ms
 * clang++-3.5 -march=native -std=c++11 -ffast-math -O3 main.cpp : 1.15287e+24 8505 ms
 *
 * Loop:
 * g++-4.8.2   -march=native -std=c++11 -O3 main.cpp             : 1.15287e+24 3543 ms
 * g++-4.8.2   -march=native -std=c++11 -ffast-math -O3 main.cpp : 1.15287e+24 3551 ms
 * clang++-3.5 -march=native -std=c++11 -O3 main.cpp             : 1.15287e+24 9613 ms
 * clang++-3.5 -march=native -std=c++11 -ffast-math -O3 main.cpp : 1.15287e+24 8642 ms
 */
Edit:
I did a quick check with g++-4.9.2 and clang++-3.5 with -O3 and -std=c++11 on a Fedora 21 VirtualBox VM on the same machine, and apparently those compilers don't have the same problem (the time is the same for both versions). However, gcc's version is twice as fast as clang's (7.5 s vs. 14 s).