Myri-10G
10-Gigabit Ethernet
Performance Measurements

We report performance measurements from netperf, from ntttcps and ntttcpr (from the Windows 2003 DDK), and from iperf, using 9000-byte (jumbo) frames and 1500-byte (standard) frames. Bandwidth (BW) is reported in megabits per second (Mbit/s). For the Linux, Solaris and FreeBSD tests, we used dual-processor machines with quad-core 2.66GHz Intel Xeon X5355s on the Supermicro X7DB8 motherboard. For the Windows tests, the sender is a dual-processor, single-core 2.6GHz AMD Opteron machine with the Tyan S2895 motherboard, and the receiver is a Dell PowerEdge 2950. For the MacOSX tests, we used Mac Pros with dual-processor, dual-core 2.6GHz Intel Xeons.


Linux

The Linux tests were run using netperf version 2.4.3 with the Ubuntu 7.04 x86_64 2.6.20-16-server kernel and the Myri10GE version 1.3.0 driver. TCP buffer sizes were increased and TCP timestamps were disabled as recommended in the Performance Tuning section of the Linux Myri10GE README. TCP Segmentation Offload (TSO) and Large Receive Offload (LRO) were enabled, and the default interrupt coalescing setting of 75 was used. The netserver was run without options. The two hosts were connected without a switch (point-to-point).

Netperf Results, MTU 9000

Commands:
     $ netperf -H dust01-m -t TCP_STREAM -C -c -l 60
     $ netperf -H dust01-m -t TCP_SENDFILE -l 60 -C -c -F /boot/vmlinux
     $ netperf -H dust01-m -t UDP_STREAM -l 60 -C -c -- -m 8972 -s 1M -S 1M
  
Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     9000   9910.67  10.74       9.07
     TCP_SENDFILE   9000   9903.59   3.24       9.11
     UDP_STREAM_TX  9000   9871.30  12.50       0.00
     UDP_STREAM_RX  9000   9871.30   0.00       9.29
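
As a sanity check on the MTU 9000 numbers above, the following sketch estimates the payload-rate ceiling of a 10 Gbit/s Ethernet link. It assumes standard Ethernet framing overhead (8-byte preamble, 14-byte header, 4-byte FCS, 12-byte inter-frame gap), IPv4 without options, and TCP without options (timestamps were disabled for these tests). The function name `payload_ceiling_mbps` is illustrative and not part of netperf.

```python
# Per-frame bytes on the wire that are not counted in the MTU:
# preamble+SFD (8) + Ethernet header (14) + FCS (4) + inter-frame gap (12).
ETH_OVERHEAD = 8 + 14 + 4 + 12
IPV4_HEADER = 20   # assumption: no IP options
TCP_HEADER = 20    # assumption: no TCP options (timestamps disabled)
UDP_HEADER = 8

def payload_ceiling_mbps(mtu, l4_header, link_mbps=10_000.0):
    """Upper bound on payload throughput when every frame is full-sized."""
    payload = mtu - IPV4_HEADER - l4_header
    wire_bytes = mtu + ETH_OVERHEAD
    return link_mbps * payload / wire_bytes

tcp_9000 = payload_ceiling_mbps(9000, TCP_HEADER)
udp_9000 = payload_ceiling_mbps(9000, UDP_HEADER)
print(f"TCP ceiling, MTU 9000: {tcp_9000:.1f} Mbit/s")
print(f"UDP ceiling, MTU 9000: {udp_9000:.1f} Mbit/s")
```

Under these assumptions the ceilings are roughly 9914 Mbit/s for TCP and 9927 Mbit/s for UDP, so the measured 9910.67 and 9871.30 Mbit/s are close to wire rate. The UDP message size of 8972 bytes used in the command above is the MTU minus the 20-byte IPv4 header and the 8-byte UDP header, so each message fills exactly one frame.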
  

Netperf Results, MTU 1500

Commands:
     $ netperf -H dust01-m -t TCP_STREAM -C -c -l 60
     $ netperf -H dust01-m -t TCP_SENDFILE -l 60 -C -c -F /boot/vmlinux
     $ netperf -H dust01-m -t UDP_STREAM -l 60 -C -c -- -m 1472 -s 1M -S 1M
  
Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     1500   9468.70  10.12      11.61
     TCP_SENDFILE   1500   9387.47   2.63      11.60
     UDP_STREAM_TX  1500   5501.80  12.49       0.00
     UDP_STREAM_RX  1500   5501.60   0.00      10.73
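
The UDP results suggest that the MTU 1500 case is limited by per-packet cost rather than by the wire. A rough way to see this is to convert the reported bandwidth into datagrams per second, as sketched below. The sketch assumes that netperf's BW column counts UDP payload bits only (in units of 10^6 bits/s) and that each `-m`-sized message travels in exactly one frame; `datagrams_per_second` is an illustrative helper name.

```python
def datagrams_per_second(bw_mbps, message_bytes):
    """Convert a payload bandwidth in Mbit/s to messages per second."""
    return bw_mbps * 1e6 / (message_bytes * 8)

# (MTU, netperf -m message size, measured UDP_STREAM_TX BW) from the Linux tables.
linux_udp = [(9000, 8972, 9871.30), (1500, 1472, 5501.80)]

for mtu, msg, bw in linux_udp:
    rate = datagrams_per_second(bw, msg)
    print(f"MTU {mtu}: {rate:,.0f} datagrams/s")
```

This works out to roughly 138,000 datagrams per second at MTU 9000 and 467,000 at MTU 1500, about 3.4 times as many packets for a little over half the bandwidth, at nearly identical sender CPU (12.50% versus 12.49%).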
  

Notes:


Windows

The Windows tests were done with the Myri10GE AMD64 1.0.1 driver and Windows Server 2003 x64 SP1 Edition. TCP Segmentation Offload (TSO) was enabled, checksum offload was enabled, an interrupt coalescing delay of 25 was used, and flow control was enabled. No registry entries were added to the Windows 2003-based machines.

One ntttcps process was run on one Windows host connected to one Windows host running one ntttcpr process. These two hosts were connected without a switch (point-to-point).

Ntttcp Results, MTU 9000

Commands:
    Sender: ntttcps -m 1,1,10.0.130.50 -l 1048576 -n 100000 -w -v -a 8
    Receiver: ntttcpr -m 1,1,10.0.130.50 -l 1048576 -rb 2097152 -n 1000000 -w -v -a 8
Results on the Sender:
 
-----------------------------------------------------------------
|     Estimated Time to Complete Test at line speed (seconds)   |
-----------------------------------------------------------------

1000 Base-T  622 OC-12(ATM)  155 OC-3(ATM)  100 Base-T  10 Base-T
===========  ==============  =============  ==========  =========

        419             369           1408        2128       25000



------------------------------------------------------
|                   Output Summary                   |
------------------------------------------------------

Thread Realtime(s) Throughput(KB/s) Throughput(Mbit/s)
====== =========== ================ ==================

     0      85.500        1226404.678           9811.237


Total Bytes(MEG) Realtime(s) Average Frame Size Total Throughput(Mbit/s)
================ =========== ================== ========================

   104857.600000      85.500          60667.263                 9811.237


Total Buffers Throughput(Buffers/s) Pkts(sent/intr) Intr(count/s) Cycles/Byte
============= ===================== =============== ============= ===========

   100000.000              1169.591               1      23467.10         0.5


Packets Sent Packets Received Total Retransmits Total Errors Avg. CPU %
============ ================ ================= ============ ==========

     1728405           281845                 2            0      10.70
  
Results on the Receiver:
 
-----------------------------------------------------------------
|     Estimated Time to Complete Test at line speed (seconds)   |
-----------------------------------------------------------------

1000 Base-T  622 OC-12(ATM)  155 OC-3(ATM)  100 Base-T  10 Base-T
===========  ==============  =============  ==========  =========

        419             369           1408        2128       25000



------------------------------------------------------
|                   Output Summary                   |
------------------------------------------------------

Thread Realtime(s) Throughput(KB/s) Throughput(Mbit/s)
====== =========== ================ ==================

     0      85.735        1223043.098           9784.345


Total Bytes(MEG) Realtime(s) Average Frame Size Total Throughput(Mbit/s)
================ =========== ================== ========================

   104857.600000      85.735           8959.587                 9784.345


Total Buffers Throughput(Buffers/s) Pkts(recv/intr) Intr(count/s) Cycles/Byte
============= ===================== =============== ============= ===========

   100000.000              1166.385              29       4610.68         2.7


Packets Sent Packets Received Total Retransmits Total Errors Avg. CPU %
============ ================ ================= ============ ==========

      281837         11703396                 0            0      27.27
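
The ntttcp summary figures can be cross-checked from the command-line parameters. The sketch below recomputes the sender and receiver throughput from the buffer size (`-l 1048576`), the 100000 buffers reported as transferred, and the reported run times. It assumes that ntttcp's KB and Mbit are decimal units (1000 bytes and 10^6 bits), which is what makes the arithmetic match the output; the function name is illustrative.

```python
BUFFER_BYTES = 1_048_576   # from -l 1048576
TOTAL_BUFFERS = 100_000    # "Total Buffers" in the output above

def ntttcp_summary(realtime_s):
    """Recompute throughput figures from total bytes and elapsed time."""
    total_bytes = BUFFER_BYTES * TOTAL_BUFFERS
    kb_per_s = total_bytes / realtime_s / 1000   # decimal kilobytes (assumed)
    mbit_per_s = kb_per_s * 8 / 1000             # decimal megabits (assumed)
    buffers_per_s = TOTAL_BUFFERS / realtime_s
    return kb_per_s, mbit_per_s, buffers_per_s

for role, realtime in (("sender", 85.500), ("receiver", 85.735)):
    kb, mbit, bufs = ntttcp_summary(realtime)
    print(f"{role}: {kb:.3f} KB/s, {mbit:.3f} Mbit/s, {bufs:.3f} buffers/s")
```

The recomputed values agree with the reported Throughput(KB/s), Throughput(Mbit/s) and Throughput(Buffers/s) columns to within the rounding of the printed run times.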
  

Notes:


Solaris

The Solaris tests were done with the Myri10GE AMD64 1.0.1rc0 driver. Solaris's GLDv2 driver ABI does not support TCP Segmentation Offload (TSO) or TCP Large Receive Offload (LRO).
    OS: Solaris 10U3 (5.10 Generic_125101-08 i86pc i386 i86pc)
    myri10ge_intr_coal_delay=75
    myri10ge_use_rx_taskq=0
    The netperf and netserver processes were bound to CPU 0.
    Socket buffer sizes were set by the application to 393216 bytes.
   
    Test           MTU    BW        TX_CPU %   RX_CPU %
    ----           ----   -------   --------   --------
    TCP_STREAM     9000   9880.44   14.95      17.61
    TCP_SENDFILE   9000   9885.86   12.01      17.91
    UDP_STREAM_TX  9000   8344.10   13.87      00.00
    UDP_STREAM_RX  9000   8343.90   00.00      19.13
   
    TCP_STREAM     1500   5882.54   20.95      18.70
    TCP_SENDFILE   1500   5316.63   18.03      17.92
    UDP_STREAM_TX  1500   4138.10   15.59      00.00
    UDP_STREAM_RX  1500   4136.90   00.00      21.22
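
The Solaris runs set socket buffer sizes from the application rather than through system-wide tunables. A minimal sketch of what that looks like for a TCP socket is shown below, using the 393216-byte value quoted above. This is not netperf's code; the helper name `make_tuned_socket` is hypothetical, and the operating system may round or cap the requested size.

```python
import socket

BUF_BYTES = 393216  # socket buffer size used in the Solaris tests (384 KiB)

def make_tuned_socket(buf_bytes=BUF_BYTES):
    """Create a TCP socket and request send/receive buffers of buf_bytes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request the sizes before connect()/listen() so they can take effect
    # when the connection is established.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buf_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buf_bytes)
    return sock

sock = make_tuned_socket()
# The kernel reports what it actually granted, which may differ from the request.
granted_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
granted_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"requested {BUF_BYTES}, granted snd={granted_snd} rcv={granted_rcv}")
sock.close()
```

In netperf itself, the corresponding controls are the test-specific `-s` (local) and `-S` (remote) socket buffer options that appear in the MacOSX and FreeBSD commands.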
  


MacOSX

The MacOSX tests were run using netperf version 2.4.3 and iperf version 2.0.2 with MacOSX 10.5 and the Myri10GE version 1.1.0 driver. MacOSX does not support TCP Segmentation Offload (TSO). LRO was enabled as recommended in the Performance Tuning section of the MacOSX Myri10GE README. The default interrupt coalescing setting of 75 was used. The netserver was run without options. The iperf server was run with the same window (-w) and buffer length (-l) arguments as the client. The two hosts were connected without a switch (point-to-point).

Netperf Results, MTU 9000

Commands:
     $ netperf -H macpro01-m -t TCP_STREAM -C -c -l 60 -- -s 768K -S 768K -m 256K
     $ netperf -H macpro01-m -t UDP_STREAM -l 60 -C -c -- -m 32K -s 512K -S512K
     $ iperf -c macpro01-m -w 768k -l 256k -P 2 -f m -t 60
  
Single-Stream Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     9000   9661.82  41.38      36.74
     UDP_STREAM_TX  9000   6867.00  28.08      00.00
     UDP_STREAM_RX  9000   6867.00  00.00      39.26
  
Dual-Stream TCP Results (2 netperf processes):
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     9000   9692.00  54.72      47.36
  
Dual-Stream TCP Results (2 iperf threads):
     Test           MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     iperf          9000   9825.00  65         58
  

Netperf Results, MTU 1500

Commands:
     $ netperf -H macpro01-m -t TCP_STREAM -C -c -l 60 -- -s 768K -S 768K -m 256K
     $ netperf -H macpro01-m -t UDP_STREAM -l 60 -C -c -- -m 32K -s 512K -S512K
     $ iperf -c macpro01-m -w 512k -l 256k -P 2 -f m -t 60
  
Single-Stream Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     1500   4782.15  41.70      39.15
     UDP_STREAM_TX  1500   3310.40  27.85      00.00
     UDP_STREAM_RX  1500   3310.40  00.00      39.24
  
Dual-Stream TCP Results (2 netperf processes):
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     1500   4367.00  42.29      43.75
  
Dual-Stream TCP Results (2 iperf threads):
     Test           MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     iperf          1500   6417.00  76         65
  

Notes:


FreeBSD

The FreeBSD tests were run using netperf version 2.4.3 with the if_mxge driver found in FreeBSD 7.0-current. We used FreeBSD/amd64 built from sources dated June 11th, 2007. The kernel was configured without the WITNESS or INVARIANTS options, and with the SCHED_ULE scheduler. The kern.ipc.maxsockbuf tunable was increased to 8388608. TCP Segmentation Offload (TSO) and Large Receive Offload (LRO) were enabled, and the default interrupt coalescing setting of 30 was used. The netserver was run without options. The two hosts were connected without a switch (point-to-point).

Netperf Results, MTU 9000

Commands:
     $ netperf -H dust01-m -t TCP_STREAM -C -c -l 60 -- -S768K -s768K
     $ netperf -H dust01-m -t TCP_SENDFILE -l 60 -C -c -F /boot/kernel/kernel -- -S768K -s768K
     $ netperf -H dust01-m -t UDP_STREAM -l 60 -C -c -- -m 8972 -s 512K -S 512K
  
Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     9000   9894.99  13.59      10.95
     TCP_SENDFILE   9000   9892.65   5.66      11.04
     UDP_STREAM_TX  9000   9924.70  15.28       0.00
     UDP_STREAM_RX  9000   9924.70   0.00       9.49
  

Netperf Results, MTU 1500

Commands:
     $ netperf -H dust01-m -t TCP_STREAM -C -c -l 60 -- -S768K -s768K
     $ netperf -H dust01-m -t TCP_SENDFILE -l 60 -C -c -F /boot/kernel/kernel -- -S768K -s768K
     $ netperf -H dust01-m -t UDP_STREAM -l 60 -C -c -- -m 16K -s 512K -S 512K
  
Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     1500   9305.14   9.60      17.21
     TCP_SENDFILE   1500   9355.76   4.95      17.34
     UDP_STREAM_TX  1500   5557.00  14.21       0.00
     UDP_STREAM_RX  1500   5553.10   0.00      11.44
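
In the FreeBSD tables above, TCP_SENDFILE reaches about the same bandwidth as TCP_STREAM at well under half the sender CPU (for example 5.66% versus 13.59% at MTU 9000). That test sends the file named by `-F` using the `sendfile` system call, which lets the kernel move file data to the socket without copying it through a user-space buffer. The sketch below contrasts the two send paths in Python. It is a small illustration using a local socket pair and a temporary file, not a benchmark and not netperf's implementation; the helper names are hypothetical.

```python
import os
import socket
import tempfile

def send_with_copy(sock, path, chunk=65536):
    """Classic path: read() into user space, then send each chunk."""
    sent = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            sock.sendall(data)
            sent += len(data)
    return sent

def send_with_sendfile(sock, path):
    """sendfile path: uses os.sendfile where the platform supports it;
    Python falls back to plain send() otherwise."""
    with open(path, "rb") as f:
        return sock.sendfile(f)

# Small temporary file, kept small so both sends fit in the socket buffer
# without a concurrent reader.
payload = os.urandom(4096)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

tx, rx = socket.socketpair()
try:
    n_copy = send_with_copy(tx, path)
    n_sendfile = send_with_sendfile(tx, path)
    received = b""
    while len(received) < n_copy + n_sendfile:
        received += rx.recv(65536)
finally:
    tx.close()
    rx.close()
    os.unlink(path)

print(n_copy, n_sendfile, received == payload * 2)
```

Both paths deliver the same bytes; the difference that shows up in the TX_CPU column is how much work the sending host does per byte.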
  

Notes:


Last updated: 25 April 2008