Myricom at SC06: Tampa, Florida
11-17 November 2006

Live demos of Myri-10G showcase versatility and high-performance interoperability

Top 3 in the SC06 Bandwidth Challenge all used Myri-10G 10-Gigabit Ethernet NICs

Upgraded MareNostrum cluster in Barcelona ranks #5 in TOP500
The fastest cluster in the world uses Myrinet-2000 technology


Photo of the cluster in Myricom's SC06 booth

Booth Demonstrations. Myri-10G, the fourth generation of Myricom products, is a convergence at 10-Gigabit/s data rates of Myrinet cluster-interconnect technology with Ethernet. The physical links of Myri-10G components are 10-Gigabit Ethernet, but the NICs, switches, and software support both Ethernet and Myrinet protocols at the Data Link level. The live demonstrations in Myricom's SC06 booth were selected to showcase the performance, versatility, and interoperability of Myri-10G solutions.

The nuts and bolts of the booth demo. In one corner of the Myricom booth was the cluster shown to the right. For the purposes of the demonstrations, each of the six hosts included two Myri-10G NICs, one operating in Ethernet mode and the other operating in Myrinet mode. Four hosts were two-socket dual-core AMD Opteron servers, and two hosts were two-socket dual-core Intel Woodcrest servers, for a total of 24 processor cores. All of these servers were running Linux, but we maintained remote access throughout the exhibit to Windows Compute Cluster Server Myri-10G clusters at Myricom's software development laboratory in Oak Ridge, Tennessee, and at the Microsoft Partners Solution Center in Redmond, Washington.

In order to emphasize that 10-Gigabit Ethernet switch solutions are available from many vendors, the cluster in the Myricom booth included three switches (top to bottom):

Three of the 10-Gigabit Ethernet ports (10GBase-SR) of the Myri-10G switch connected through multimode fiber to Windows, Mac OS X, and Solaris satellite hosts in the other corners of the booth. These hosts each had Myri-10G 10-Gigabit Ethernet NICs and were running the standard Myricom drivers for these hosts.

High-performance interoperability. With this set of equipment available, we could demonstrate high-performance interoperability between systems from different vendors, running four different operating systems, and with a choice of application programming interfaces.

The cluster hosts were running MX (Myrinet Express), the low-level message-passing system for Myri-10G. MX operates by kernel bypass to achieve low latency and low host-CPU load with either Myrinet or Ethernet fabrics. MX also supports TCP/IP traffic over Myrinet (IPoM) or over Ethernet (IPoE).

Using MX, the cluster hosts or any subset of them could run MPI benchmarks and applications through the Myrinet switch (MXoM, MX over Myrinet) or through either of the 10-Gigabit Ethernet switches (MXoE, MX over Ethernet). The MPI latency was 2.3µs for MXoM, and 2.8µs for MXoE through the low-latency Fujitsu XG700 10-Gigabit Ethernet switch. For both MXoM and MXoE, the MPI one-way (PingPong) data rate was 1.2GByte/s, and the bidirectional (SendRecv) data rate was 2.4GByte/s.
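As a rough illustration of what these figures imply, the sketch below applies a simple latency-plus-bandwidth model (transfer time = latency + size / bandwidth) to the quoted numbers. The model, the function names, and the reading of 1 GByte as 10^9 bytes are assumptions made for illustration; they are not part of MX or MPICH-MX, and the results are estimates rather than booth measurements.

```python
# Simple latency-plus-bandwidth model applied to the quoted MPI figures.
# Assumption: 1 GByte = 10**9 bytes; the model itself is an illustration only.

LATENCY_S = {"MXoM": 2.3e-6, "MXoE": 2.8e-6}  # quoted MPI latencies, in seconds
BANDWIDTH_BYTES_PER_S = 1.2e9                 # quoted one-way PingPong data rate
LINK_RATE_BITS_PER_S = 10e9                   # 10-Gigabit/s physical link


def transfer_time(size_bytes, latency_s, bandwidth=BANDWIDTH_BYTES_PER_S):
    """Estimated one-way time for a message of size_bytes."""
    return latency_s + size_bytes / bandwidth


def effective_rate(size_bytes, latency_s, bandwidth=BANDWIDTH_BYTES_PER_S):
    """Estimated achieved data rate (bytes/s) for a message of size_bytes."""
    return size_bytes / transfer_time(size_bytes, latency_s, bandwidth)


def half_bandwidth_size(latency_s, bandwidth=BANDWIDTH_BYTES_PER_S):
    """Message size at which the model reaches half of the peak data rate."""
    return latency_s * bandwidth


# Fraction of the 10-Gigabit/s link used by a 1.2 GByte/s data stream.
link_utilization = BANDWIDTH_BYTES_PER_S * 8 / LINK_RATE_BITS_PER_S

for fabric, latency in LATENCY_S.items():
    n_half = half_bandwidth_size(latency)
    print(f"{fabric}: half-bandwidth message size ~ {n_half:.0f} bytes")
print(f"Link utilization at peak: {link_utilization:.0%}")
```

Under this model, messages of a few kilobytes already reach half of the peak data rate on either fabric, and the 1.2GByte/s peak corresponds to 9.6 Gbit/s on the 10-Gigabit/s link.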

These performance measurements were with MPICH-MX over MX. However, the booth cluster could also run Open MPI over MX. Myricom is a member of and a software contributor to Open MPI, and provided one of the iPod promotional prizes in an Open MPI drawing in the Myricom booth Tuesday afternoon.

A capability just introduced: Ethernet-protocol ports on a Myri-10G switch. Returning to the TCP/IP traffic on the Myrinet fabric (IPoM), at SC06 we demonstrated for the first time the new capability of IPoM-IPoE protocol conversion on a switch line card:

Photograph of a 10G-SW16LC-6C2ER switch line card

The photograph above shows one of these line cards, which have two 10GBase-R (-SR or -LR depending on the XFP pluggable fiber transceiver) Ethernet-protocol ports, and six 10GBase-CX4 Myrinet-protocol ports. The chip under the black heat sink is the Myricom 10G_XBar16, which has 8 (XAUI) ports to the backplane and 8 ports to the front panel. Six of the front-panel ports are Myrinet-protocol 10GBase-CX4 ports. Two of the front-panel ports connect from the 10G_XBar16 through two Myricom Lanai-2Z chips, whose firmware performs layer-2 protocol conversion between the Myrinet switch fabric and interoperable Ethernet-protocol ports on the front panel.

The photo to the left is a closeup of part of the Myri-10G switch in the booth cluster. Three 10GBase-SR Ethernet ports connected to the hosts in the other corners of the Myricom booth. The unidirectional netperf TCP/IP data rate observed from the cluster hosts (IPoM) through the IPoM-IPoE protocol conversion to the satellite IPoE hosts was ~9.4 Gbits/s.
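For context on the ~9.4 Gbits/s figure, the sketch below estimates the ceiling on TCP payload throughput over a 10-Gigabit Ethernet link from per-frame overhead alone. The MTU values and the use of the 12-byte TCP timestamp option are assumptions; the report does not say which settings were used in the booth, so this is a back-of-the-envelope bound, not an explanation of the measurement.

```python
# Upper bound on TCP payload throughput over 10-Gigabit Ethernet,
# counting only per-frame protocol overhead.
# Assumptions: IPv4 and TCP headers without extra options other than
# the 12-byte timestamp option; MTU values are illustrative.

LINE_RATE_GBPS = 10.0
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12  # preamble/SFD + MAC header + FCS + inter-frame gap
IP_HEADER = 20
TCP_HEADER = 20
TCP_TIMESTAMP_OPTION = 12


def tcp_goodput_gbps(mtu, tcp_options=TCP_TIMESTAMP_OPTION, line_rate=LINE_RATE_GBPS):
    """Maximum TCP payload rate (Gbit/s) for full-sized frames at the given MTU."""
    payload = mtu - IP_HEADER - TCP_HEADER - tcp_options
    wire_bytes = mtu + ETHERNET_OVERHEAD
    return line_rate * payload / wire_bytes


for mtu in (1500, 9000):
    print(f"MTU {mtu}: at most {tcp_goodput_gbps(mtu):.2f} Gbit/s of TCP payload")
```

With standard 1500-byte frames the bound comes out near 9.4 Gbit/s, and with 9000-byte jumbo frames near 9.9 Gbit/s, so the observed rate is close to what a 10-Gigabit Ethernet link can carry under these assumptions.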

Two other Ethernet ports connected with 10GBase-LR ("LAN-PHY") directly to SCinet, and provided the Internet connectivity into the Myricom booth. The 10-Gigabit Ethernet ports from these switch line cards are fully compliant with Ethernet standards, and support Ethernet link aggregation.

Line cards such as these are being supplied to the Netherlands DAS-3 project to build a very high performance grid of clusters. Myricom licensed from Linux Magazine a reprint of an excellent article by Douglas Eadline, "The Wide Area Cluster" (pdf, 1.1MB), which describes the DAS-3 project in detail.

The SC06 Bandwidth Challenge. The SC06 Bandwidth Challenge (BWC) focused on disk-to-disk performance using a single 10-Gigabit/s path. Rather than partially filling numerous links, as was done in the SC05 BWC, the organizers wanted to see if entrants could almost fully utilize a single path. Eight entrants were selected.

The overall winner was NCDM (National Center for Data Mining at the University of Illinois at Chicago). Caltech and Indiana University were in 2nd and 3rd places with honorable mentions. Although the contestants focused on different applications, ranging from mining the Sloan Digital Sky Survey to analyzing data from the Large Hadron Collider to accessing Indiana University's Data Capacitor, all three entries had in common that they used Myri-10G 10-Gigabit Ethernet NICs and software as part of their winning efforts.

The November-2006 TOP500 list. As part of the normal turnover in the TOP500 list, the number of Myrinet clusters declined slightly from 87 in the June-2006 list to 79 in the November-2006 list. These are all clusters using Myrinet-2000, the generation of Myrinet products that preceded Myri-10G. We expect to see the first Myri-10G clusters appear in the June-2007 TOP500 list.

Since the June-2006 TOP500 list, the MareNostrum blade cluster at the Barcelona Supercomputing Center was upgraded from IBM JS-20 to JS-21 blades. This 2560-host, 10,240-processor, Myrinet-2000 cluster achieved rank #5 in the November-2006 TOP500 list with an Rmax performance of 62,630 Gigaflops. MareNostrum is not only the highest-ranked TOP500 system in Europe but also, according to the TOP500 architecture classification, the fastest cluster in the world.
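The MareNostrum figures quoted above can be broken down per host and per processor with a few lines of arithmetic. The variable names below are illustrative, and the derived values are simple averages of the quoted totals, not separately reported measurements.

```python
# Per-host and per-processor breakdown of the quoted MareNostrum figures.
HOSTS = 2560
PROCESSORS = 10_240
RMAX_GFLOPS = 62_630

processors_per_host = PROCESSORS / HOSTS      # processors in each blade
rmax_per_processor = RMAX_GFLOPS / PROCESSORS # average Gigaflops per processor
rmax_per_host = RMAX_GFLOPS / HOSTS           # average Gigaflops per blade

print(f"Processors per host: {processors_per_host:.0f}")
print(f"Rmax per processor:  {rmax_per_processor:.2f} Gigaflops")
print(f"Rmax per host:       {rmax_per_host:.2f} Gigaflops")
```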

Tampa was pleasant during SC06. For readers who are not familiar with the SC conferences, this venue is not only a technical conference and exposition but also a convention of HPC people. Myricom technical and sales people saw many old friends and made some new friends both on the exposition floor and at numerous off-site visits with our customers and partners.

The Myricom SC06 booth

The Exhibit Team. The Myricom team attending and exhibiting at SC06 were Scott Atchley, Member Technical Staff; Susan Blackford, Member Technical Staff; Bob Brown, Microsoft Business Development; John Daley, Senior Programmer; Dr. Markus Fischer, Senior Software Architect; Tom Leinberger, Director of Sales - Central Region; Dr. Patrick Geoffray, Senior Software Architect; Dave PeGan, Vice President, Sales; Dr. Loic Prylli, Senior Software Architect; Scott Schweitzer, Director of OEM Business Development; Dr. Chuck Seitz, CEO; Dr. Ruth Sivilotti, Member of the Technical Staff; Marty Stewart, Executive Assistant; and Tim Sticklinski, Director of Sales - Western Region.

We're now looking forward to ISC07, to be held 27-29 June 2007 in Dresden, Germany, and to SC07, to be held 10-16 November 2007 in Reno, Nevada.

21 November 2006, later updates expected