6. OpenMPI

This section describes how to configure OpenMPI to use InfiniBand.

6.1 Configure IPoIB

OpenMPI uses IPoIB for job startup and tear-down. You should configure IPoIB on all of your hosts.
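
For example, on a Debian-style system you might give each host a static address on the ib0 interface in /etc/network/interfaces. This is only a sketch; the interface name and addresses below are placeholders, so adjust them for your fabric:

auto ib0
iface ib0 inet static
    address 192.168.10.1
    netmask 255.255.255.0

Each host needs a unique address on the same subnet, and the hostnames you use in your MPI hostfile should resolve to these addresses if you want the startup traffic to travel over the fabric.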

6.2 Load the modules

Ensure the rdma_ucm module is loaded.

modprobe rdma_ucm
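
To load the module automatically at boot on a Debian-style system, you can also list it in /etc/modules (a distribution convention, not an OpenMPI requirement):

echo rdma_ucm >> /etc/modules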

6.3 Check permissions and limits

Users who want to run MPI jobs will need write permissions for the following devices:

/dev/infiniband/uverbs*
/dev/infiniband/rdma_cm*

The simplest way to do this is to add the users to the rdma group. If that is not suitable for your site, you can change the permissions and ownership of these devices by editing the following udev rules:

/etc/udev/rules.d/50-udev.rules
/etc/udev/rules.d/91-permissions.rules
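
As a sketch, rules of the following form grant the rdma group read/write access; the exact device matches and the right rules file vary between udev versions, so treat this as illustrative rather than definitive:

KERNEL=="uverbs*", GROUP="rdma", MODE="0660"
KERNEL=="rdma_cm", GROUP="rdma", MODE="0660"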

OpenMPI will need to pin memory for RDMA transfers. Edit /etc/security/limits.conf and add the line:

* hard memlock unlimited
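
Depending on your distribution's defaults, you may also need a matching soft limit line (* soft memlock unlimited) so that the limit applies without an explicit ulimit call. You can verify the limit from a fresh login as an MPI user:

ulimit -l

This should print "unlimited".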

6.4 Install the MPI test programs

Check that the mpitests package, which provides the IMB-MPI1 benchmark used below, is installed.

aptitude install mpitests
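
You can confirm that the benchmark binary used below is on your PATH (the exact install location depends on the package):

which IMB-MPI1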

6.5 Configure Host Access

OpenMPI uses ssh to spawn jobs on remote hosts. You should configure a public/private keypair so that you can ssh between hosts without entering a password. You should also ensure that your login process is silent, as stray output at login can confuse MPI job startup.
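
A typical OpenSSH sequence looks like the following, where hostB stands in for each remote test host (repeat the copy for every host, or share home directories):

ssh-keygen -t rsa
ssh-copy-id hostB
ssh hostB true

The final command should return immediately, with no password prompt and no other output.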

6.6 Run the MPI PingPong benchmark

We will use the MPI PingPong benchmark for our testing. By default, OpenMPI should prefer InfiniBand networks over any TCP networks it finds. However, we will force OpenMPI to ignore TCP networks, to ensure that it is using the InfiniBand network.

#!/bin/bash
#InfiniBand MPI test program
#Edit the hosts below to match your test hosts
cat > /tmp/hostfile.$$.mpi <<EOF
hostA slots=1
hostB slots=1
EOF

mpirun --mca btl_openib_verbose 1 --mca btl ^tcp -n 2 -hostfile /tmp/hostfile.$$.mpi IMB-MPI1 PingPong

If all goes well, you should see openib debugging messages from both hosts, together with the job output.

<snip>
# PingPong
[HostB][0,1,1][btl_openib_endpoint.c:992:mca_btl_openib_endpoint_qp_init_query] Set MTU to IBV value 4 (2048 bytes)
[HostB][0,1,1][btl_openib_endpoint.c:992:mca_btl_openib_endpoint_qp_init_query] Set MTU to IBV value 4 (2048 bytes)
[HostA][0,1,0][btl_openib_endpoint.c:992:mca_btl_openib_endpoint_qp_init_query] Set MTU to IBV value 4 (2048 bytes)
[HostA][0,1,0][btl_openib_endpoint.c:992:mca_btl_openib_endpoint_qp_init_query] Set MTU to IBV value 4 (2048 bytes)

#---------------------------------------------------
# Benchmarking PingPong 
# #processes = 2 
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         1.53         0.00
            1         1000         1.44         0.66
            2         1000         1.42         1.34
            4         1000         1.41         2.70
            8         1000         1.48         5.15
           16         1000         1.50        10.15
           32         1000         1.54        19.85
           64         1000         1.79        34.05
          128         1000         3.01        40.56
          256         1000         3.56        68.66
          512         1000         4.46       109.41
         1024         1000         5.37       181.92
         2048         1000         8.13       240.25
         4096         1000        10.87       359.48
         8192         1000        15.97       489.17
        16384         1000        30.54       511.68
        32768         1000        55.01       568.12
        65536          640       122.20       511.46
       131072          320       207.20       603.27
       262144          160       377.10       662.96
       524288           80       706.21       708.00
      1048576           40      1376.93       726.25
      2097152           20      1946.00      1027.75
      4194304           10      3119.29      1282.34

If you encounter any errors, read the excellent OpenMPI troubleshooting guide at http://www.openmpi.org

If you want to compare InfiniBand performance with your Ethernet/TCP networks, you can re-run the tests with flags that tell OpenMPI to use your Ethernet network. (The example below assumes that your test nodes are connected via eth0.)

#!/bin/bash
#TCP MPI test program
#Edit the hosts below to match your test hosts
cat > /tmp/hostfile.$$.mpi <<EOF
hostA slots=1
hostB slots=1
EOF

mpirun --mca btl ^openib --mca btl_tcp_if_include eth0 --hostfile /tmp/hostfile.$$.mpi -n 2 IMB-MPI1 PingPong

You should notice significantly higher latencies than for the InfiniBand test.

