This is the README file for the GPU version of GMP-ECM.
The GPU code only works with NVIDIA GPUs of compute capability 3.5 or
greater.
Table of contents of this file
1. How to enable GPU code in GMP-ECM
2. Basic Usage
3. Advanced Usage
4. Known issues
##############################################################################
1. How to enable GPU code in GMP-ECM
By default the GPU code is not enabled. To enable it, follow the
instructions in INSTALL-ecm up to the 'configure' step, then add
"--enable-gpu" and "--with-cgbn-include=...". CGBN headers are required for
GPU builds after f81ddf3b. "--with-cgbn-include" should point to the CGBN
include directory, typically "../CGBN/include/cgbn". CGBN is a "CUDA
Accelerated Multiple Precision Arithmetic (Big Num)" library available at
https://github.com/NVlabs/CGBN
$ ./configure --enable-gpu --with-cgbn-include=/PATH/DIR/CGBN/include/cgbn
This will configure the code for NVIDIA GPUs, for all compute capabilities
between 3.5 and 9.0 that are known to the nvcc compiler.
To enable only a single compute capability, you can use '--enable-gpu=XX':
$ ./configure --enable-gpu=61 [other options]
By default, GMP-ECM will look for cuda.h in the default header directories,
but you can specify another directory, such as /opt/cuda, with:
$ ./configure --enable-gpu --with-cuda=/opt/cuda
By default, GMP-ECM will look for the nvcc compiler in $PATH, but you can
specify another directory:
$ ./configure --enable-gpu --with-cuda-bin=/PATH/DIR
For finer control you can specify the location of cuda.h as follows:
$ ./configure --enable-gpu --with-cuda-include=/PATH/DIR
By default, GMP-ECM will look for the CUDA libraries in the default library
directories, but you can specify another directory:
$ ./configure --enable-gpu --with-cuda-lib=/PATH/DIR
Some versions of CUDA are not compatible with recent versions of gcc.
To specify which C compiler is called by the CUDA compiler nvcc, type:
$ ./configure --enable-gpu --with-cuda-compiler=/PATH/DIR
The value of this parameter is directly passed to nvcc via the option
"--compiler-bindir". By default, GMP-ECM lets nvcc choose what C compiler it
uses.
If you get errors like "cuda.h: present but cannot be compiled", try setting
CC to a known-good gcc; you may also need to use --with-cuda-compiler:
$ ./configure --enable-gpu CC=gcc-8
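If nvcc still picks up an incompatible host compiler, the two options can be
combined (the gcc-8 path below is only an example; adjust it for your system):
$ ./configure --enable-gpu CC=gcc-8 --with-cuda-compiler=/usr/bin/gcc-8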
Then, to compile the code, type:
$ make
And to check that the program works correctly, type:
$ make check
Additional randomized checks can be run with
$ sage check_gpuecm.sage ./ecm
For failing kernels, additional information may be available from cuda-memcheck:
$ echo "(2^997-1)" | cuda-memcheck ./ecm -gpu -gpucurves 4096 -v 16000 0
##############################################################################
2. Basic Usage
To use your GPU for step 1, just add the -gpu option:
$ echo "(2^835+1)/33" | ./ecm -gpu 1e4
It will compute step 1 on the GPU, and then perform step 2 on the CPU (not in
parallel).
The only parametrization compatible with GPU code is "-param 3".
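For example, giving the parametrization explicitly:
$ echo "(2^835+1)/33" | ./ecm -gpu -param 3 1e4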
You can save the result of step 1 with "-save" and later load the file to
execute step 2. However, you cannot resume such a file to continue step 1
with a larger B1.
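A sketch of this workflow (the file name "gpu.save" is only an example; the
trailing 0 sets B2=0 so that the first run stops after step 1):
$ echo "(2^835+1)/33" | ./ecm -gpu -save gpu.save 1e4 0
$ ./ecm -resume gpu.save 1e4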
The options "-mpzmod", "-modmuln", "-redc", "-nobase2" and "-base2" have no
effect on step 1 when the "-gpu" option is active, but they still apply to
step 2.
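For example, here "-redc" is ignored for step 1 but applies to step 2:
$ echo "(2^835+1)/33" | ./ecm -gpu -redc 1e4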
##############################################################################
3. Advanced Usage
The option "-gpudevice n" forces the GPU code to be executed on device n. Nvidia
tool "nvidia-smi" can be used to know to which number is associated a GPU.
Moreover, you can use GMP-ECM option "-v" (verbose) to see the properties of the
GPU on which the code is run.
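For example, to force device 0 and print its properties:
$ echo "(2^835+1)/33" | ./ecm -gpu -gpudevice 0 -v 1e4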
The option "-gpucurves n" forces GMP-ECM to compute n curves in parallel on the
GPU. By default, the number of curves is choose to fill completly the GPU. The
number of curves must be a multiple of the number of curves by multiprocessors
(which depend on the GPU) or else it would be rounded to the next multiple.
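For example (1024 is only an illustrative value; the verbose output shows
what was actually chosen):
$ echo "(2^835+1)/33" | ./ecm -gpu -gpucurves 1024 1e4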
Throughput, for determining CGBN kernel size and "-gpucurves", can be tested
using the provided "gpu_throughput_test.sh" script, which takes an optional
ecm command and number of curves:
$ ./gpu_throughput_test.sh [ECM_CMD] [GPUCURVES]
The CGBN-based GPU code can easily be changed to support inputs from 256 to
32768 bits. Several CUDA kernels of different sizes are defined in
cgbn_stage1.cu. These kernels are fixed at compile time. A log message is
printed if recompiling with a different-sized kernel would likely speed up
execution. See the comment "Compiling custom kernel" in cgbn_stage1.cu near
line 680. Each additional kernel increases compile time and binary size, so
only two are included in development mode.
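A typical cycle when enabling a different kernel size (following the comment
mentioned above) is to edit the kernel list in cgbn_stage1.cu and rebuild:
$ $EDITOR cgbn_stage1.cu
$ make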
##############################################################################
4. Known issues
If you get "Error msg: forward compatibility was attempted on non supported HW"
or "error: 'cuda.h' and 'cudart' library have different versions", then you
can look at https://stackoverflow.com/questions/43022843/nvidia-nvml-driver-library-version-mismatch/45319156#45319156.
In general the best solution is to restart the machine.
##############################################################################
Please report any problems, bugs, or observations to sethtroisi (by email or
on mersenneforum).