Performance of MPICH-MX 1.2.7..1 over MX-10G
Uniprocessor case: one process per node, one NIC per node

Performance measurements are presented here for MPICH-MX over MX-10G for the Point to Point Communication tests in the Intel MPI Benchmark (IMB) Suite, Version 2.3.

We will provide performance data for the Collective Communication tests soon. Each benchmark is run with varying message lengths, and timings are averaged over multiple samples.

The environment for these tests consists of two quad-processor 3.2 GHz Supermicro X7DB8 (Intel Woodcrest) machines with a 10G-PCIE-8A-C NIC in each machine. The two machines were connected point-to-point (switchless). Each machine has 8 GB of memory and was running Linux 2.6.17.11, MX-10G 1.2.0h, and MPICH-MX 1.2.7..1. The Intel MPI Benchmark was compiled with gcc 4.0 with -O.

Notes:

Point to Point Communication

Point-to-point communication performance is measured between two processes. Latency is measured in microseconds (µs, shown as us in the graphs), and bandwidth is measured in MB/s. In the graphs, the latency scale is logarithmic and the bandwidth scale is linear.

IMB PingPong

PingPong is the classical pattern used for measuring startup (latency) and throughput (bandwidth) of a single message sent between two processes.
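As a rough illustration of how a single PingPong sample maps onto the two quantities plotted, the sketch below halves a round-trip time to obtain the one-way time and divides the message size by that time to obtain throughput. The function name, the 1 MB = 2^20 bytes convention, and the sample numbers are assumptions made for illustration; they are not measured results from these tests.

```python
MB = 2 ** 20  # assumption: 1 MB is counted as 2**20 bytes


def pingpong_metrics(msg_bytes, round_trip_us):
    """Return (one-way time in us, throughput in MB/s) for one PingPong sample.

    round_trip_us is the time for a message to travel to the peer and back,
    so the one-way time is taken to be half of it.
    """
    one_way_us = round_trip_us / 2.0
    mb_per_s = (msg_bytes / MB) / (one_way_us * 1e-6)
    return one_way_us, mb_per_s


# Hypothetical sample: a 1 MB message with a 2000 us round trip.
one_way, bandwidth = pingpong_metrics(msg_bytes=MB, round_trip_us=2000.0)
print(f"one-way time: {one_way:.1f} us, throughput: {bandwidth:.1f} MB/s")
```

For short messages the one-way time is dominated by startup cost, which is why it is read as latency; for long messages the throughput figure is the more meaningful one.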

graph

IMB PingPing

Like PingPong, PingPing measures the startup and throughput of a single message sent between two processes, with the crucial difference that messages are obstructed by oncoming messages. For this test, two processes communicate (MPI_Isend/MPI_Recv/MPI_Wait) with each other, with the MPI_Isend calls issued concurrently.

graph

IMB Sendrecv

Based on MPI_Sendrecv(), the processes form a periodic communication chain. Each process sends to its right neighbor and receives from its left neighbor in the chain. The turnover count is 2 messages per sample (1 in, 1 out) for each process.

For 2 processes, Sendrecv will report the bi-directional bandwidth of the system, as obtained by the (optimized) MPI_Sendrecv function. The results below are for 2 processes.
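The periodic chain can be sketched with modular arithmetic on the process ranks. The helper names below are hypothetical and are not part of IMB or MPI; the sketch only shows why, with 2 processes, each process's left and right neighbor are the same peer, so the traffic is bi-directional between the pair.

```python
def chain_neighbors(rank, nprocs):
    """Return (left, right) neighbor ranks in a periodic chain of nprocs processes."""
    left = (rank - 1) % nprocs   # wraps around: rank 0's left is nprocs - 1
    right = (rank + 1) % nprocs  # wraps around: the last rank's right is 0
    return left, right


def sendrecv_bytes_per_sample(msg_bytes):
    """Bytes moved per process per sample: 1 message out plus 1 message in."""
    return 2 * msg_bytes


# With 2 processes, both neighbors of rank 0 are rank 1, and vice versa.
for rank in range(2):
    print(rank, chain_neighbors(rank, 2))
```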

graph

IMB Exchange

Exchange is a communication pattern that often occurs in grid-splitting algorithms (boundary exchanges). The group of processes is seen as a periodic chain, and each process exchanges data with both its left and right neighbors in the chain.

The turnover count is 4 messages per sample (2 in, 2 out) for each process. The results below are for 2 processes.
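To make the turnover count concrete, the sketch below lists the four messages one process handles in a single Exchange sample and converts a sample time into throughput by counting all four. It assumes that throughput is taken as the turnover count times the message size divided by the sample time, with 1 MB = 2^20 bytes; the function names and the sample numbers are hypothetical, not measured results.

```python
MB = 2 ** 20  # assumption: 1 MB is counted as 2**20 bytes


def exchange_messages(rank, nprocs):
    """List the (direction, peer) pairs one process handles per Exchange sample."""
    left = (rank - 1) % nprocs
    right = (rank + 1) % nprocs
    return [("send", left), ("send", right), ("recv", left), ("recv", right)]


def turnover_throughput_mb_s(turnover, msg_bytes, sample_time_us):
    """Throughput in MB/s when `turnover` messages of msg_bytes move per sample."""
    return (turnover * msg_bytes / MB) / (sample_time_us * 1e-6)


msgs = exchange_messages(rank=0, nprocs=2)
print(len(msgs), "messages per sample:", msgs)
# Hypothetical sample: 1 MB messages, 4000 us per sample.
print(turnover_throughput_mb_s(len(msgs), MB, 4000.0), "MB/s")
```

The same helper covers Sendrecv by passing a turnover of 2 instead of 4.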

graph


Performance measurements with two bonded NICs per node are also available. This mode of operation requires a special MX-10G patch.

Last updated: 27 January 2007