Skip to content

Benchmark of polynomial evaluation using the horner scheme implemented in popular expression template libraries.

License

Notifications You must be signed in to change notification settings

BeneSim/polynomial_benchmark

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Table of Contents

Introduction

This benchmark compares the execution time of the Horner scheme algorithm for univariate polynomial evaluation, implemented with popular expression template libraries in C++, that have been made accessible in python through the use of pybind11.

The libraries that are currently available in the benchmark are

We use iterative and recursive Horner algorithms as well as dynamic and fixed size polynomials. The fixed recursive versions are of special interest since they heavily exploit the expression template nature of the aforementioned libraries through the use of TMP and auto return type deduction. However, those fixed size polynomials may not be applicable to most problems, therefore we also supply dynamic versions which accept arbitrary number of coefficients (order).

Run the Benchmark

In order to be able to run this benchmark on your PC you need to have the following libraries installed

C++:

You also need Python 3.x with the following libraries

Python:

With these libraries installed you can simply run

git clone https://github.com/BeneSim/polynomial_benchmark.git
cd polynomial_benchmark
mkdir build
cd build
cmake  -DCMAKE_PREFIX_PATH="\
/path/to/xtl/installation/root;\
/path/to/xsimd/installation/root;\
/path/to/xtensor/installation/root;\
/path/to/xtensor-python/installation/root;\
/path/to/Eigen/installation/root;\
/path/to/pybind11/installation/root" \
-DCMAKE_MODULE_PATH="/path/to/xtensor-python/src/cmake" \
-DCMAKE_BUILD_TYPE=Release ..
cmake --build . --target benchmark

You can of course omit the -DCMAKE_PREFIX_PATH stuff if you've installed the libraries in paths that CMake is aware of. xtensor-python ships with a FindNumPy.cmake module in the src/cmake folder.

There are 3 additional options that you can set (using cmake -DOPTION_NAME=VALUE or other tools like ccmake)

  • USE_MARCH_NATIVE=[ON/OFF] Compiles the python bindings with -march=native
  • USE_XSIMD=[ON/OFF] Enables xsimd support
  • USE_LAMBDA_XFUNCTION Uses an alternative approach for the fixed recursive Horner scheme (suggested by @wolfv)

If you want to run every combination, e.g. if you want to contribute to the benchmark results, you may want to use a simple bash script like this one

#!/bin/bash
for USE_MARCH_NATIVE in ON OFF
do
  for USE_XSIMD in ON OFF
  do
    for USE_LAMBDA_XFUNCTION in ON OFF
    do
      cmake -DUSE_MARCH_NATIVE=$USE_MARCH_NATIVE -DUSE_XSIMD=$USE_XSIMD -DUSE_LAMBDA_XFUNCTION=$USE_LAMBDA_XFUNCTION ..
      cmake --build . --target benchmark
    done
  done
done

Warning: Running all combinations will approximately take 30 minutes!

Benchmark Results

Here are some benchmarks for various CPUs, only the order = 20 benchmarks will be presented. The complete set of benchmarks can be found in benchmarks.

i7 4770k

GCC 8.2.0

USE_LAMBDA_XFUNCTION = ON

lambda order 20 xsimd native lambda order 20 native lambda order 20 xsimd lambda order 20

USE_LAMBDA_XFUNCTION = OFF

order 20 xsimd native order 20 native order 20 xsimd order 20

Clang 6.0.1

USE_LAMBDA_XFUNCTION = ON

lambda order 20 xsimd native lambda order 20 native lambda order 20 xsimd lambda order 20

USE_LAMBDA_XFUNCTION = OFF

order 20 xsimd native order 20 native order 20 xsimd order 20

i5 4300u

GCC 8.2.0

USE_LAMBDA_XFUNCTION = ON

lambda order 20 xsimd native lambda order 20 native lambda order 20 xsimd lambda order 20

USE_LAMBDA_XFUNCTION = OFF

order 20 xsimd native order 20 native order 20 xsimd order 20

Clang 6.0.1

USE_LAMBDA_XFUNCTION = ON

lambda order 20 xsimd native lambda order 20 native lambda order 20 xsimd lambda order 20

USE_LAMBDA_XFUNCTION = OFF

order 20 xsimd native order 20 native order 20 xsimd order 20

Xeon W3670

GCC 8.2.0

USE_LAMBDA_XFUNCTION = ON

lambda order 20 xsimd native lambda order 20 native lambda order 20 xsimd lambda order 20

USE_LAMBDA_XFUNCTION = OFF

order 20 xsimd native order 20 native order 20 xsimd order 20

Clang 6.0.1

USE_LAMBDA_XFUNCTION = ON

lambda order 20 xsimd native lambda order 20 native lambda order 20 xsimd lambda order 20

USE_LAMBDA_XFUNCTION = OFF

order 20 xsimd native order 20 native order 20 xsimd order 20

Armv7-a (RK3288 - Asus Tinkerboard)

GCC 8.1.0

USE_LAMBDA_XFUNCTION = ON

lambda order 20 xsimd native lambda order 20 native lambda order 20

USE_LAMBDA_XFUNCTION = OFF

order 20 xsimd native order 20 native order 20

Contributing

If you'd like to contribute with benchmark results on different CPUs feel free to add a PR. Please make sure to run all combinations of compile options as demonstrated in Run the Benchmark.

About

Benchmark of polynomial evaluation using the horner scheme implemented in popular expression template libraries.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published