Product Announcement
1 May 2003

PCI-X Interfaces and New Myrinet Software


M3F-PCIXD-2   Myrinet-Fiber/PCI-X Interface

The beginning of a major advance in Myrinet interfaces and software: the M3F-PCIXD-2 interface pictured above is the first in a new series of Myrinet/PCI-X interfaces. Its PCI implementation operates with 64-bit, 133MHz PCI-X buses and also with 64-bit, 66MHz PCI buses. More for less: this "low profile" PCI short card, based on the Myricom Lanai-XP chip, is faster than the previous fastest Myrinet interface, the M3F-PCI64C-2, which until now has been priced at $1,295. Even so, the M3F-PCIXD-2 interface is being introduced at a list price of $995.

The PCIX-series interfaces were designed to work efficiently with a new generation of Myrinet software. Myricom will supply both GM-2 and "Myrinet Express" (MX) software support for the PCIX series of interfaces.

Performance of the M3F-PCIXD-2 interface: When used in hosts with PCI-X slots and with either GM 2 or MX, the M3F-PCIXD-2 interfaces exhibit summed-bidirectional data rates closely approaching the 500 MB/s (250+250 MB/s) Myrinet link speed. The short-message latency under the first GM-2 release is ~6.3µs. The short-message latency under the current developmental version of MX is less than 4µs. GM 2 provides low-overhead ethernet emulation (TCP/IP and UDP/IP over Myrinet) at nearly link speed. MX ethernet-emulation performance is expected to be comparable to GM 2. The excellent PCI-X DMA performance of the M3F-PCIXD-2 interfaces in a variety of PCI-X hosts is listed in the table near the bottom of this page.

Changes in Myricom's Myrinet-Interface Lineup

With this introduction of the M3F-PCIXD-2 interface, the list price of the M3F-PCI64C-2 interface has been reduced from $1,295 to $995. The M3F-PCI64C-2 will continue to be offered for legacy applications that require a universal PCI interface (5V or 3.3V, 33MHz or 66MHz, 32-bit or 64-bit) or GM 1 software. The M3F-PCI64B-2 interface is now removed from the product list, but will continue to be available on special order or for replacements.

Interface | Bus | RISC & memory clock | Software support | List price
M3F-PCIXD-2 ("D card") | 64-bit, 133MHz PCI-X | 225MHz | GM 2 & MX (3Q03) | $995
M3F-PCI64C-2 ("C card") | 64-bit, 66MHz PCI | 200MHz | GM 1 & GM 2 | $995
M3F-PCI64B-2 ("B card") | 64-bit, 66MHz PCI | 133MHz | GM 1 & GM 2 | Now available only on special order

This summer Myricom will introduce a "high-end" interface in the PCIX series, the M3F2-PCIXE-2 ("E card"), which will have two 250+250 MB/s Myrinet-Fiber ports and a 333MHz clock rate for the RISC and memory. This product will be based on the Lanai-2XP chip. Its interface firmware, both GM-2 and MX, will be able to aggregate packet traffic across the two ports, so that the two ports act as a single 500+500 MB/s port.
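The announcement does not say how the GM-2 or MX firmware divides traffic between the two ports. Purely as an illustration of what "aggregating" means, the sketch below assumes a simple policy (send each packet on whichever port has the fewest bytes queued so far). The names `assign_packets` and `aggregate_rate_mb_s` are hypothetical and are not part of GM-2 or MX.

```python
# Hypothetical sketch of striping outgoing packets across two Myrinet ports.
# The real firmware policy is not described in the announcement; the
# "fewest bytes queued" rule below is an assumption for illustration only.

PORT_RATE_MB_S = 250  # one-direction rate of a single Myrinet-Fiber port


def assign_packets(packet_sizes, num_ports=2):
    """Assign each packet to the port with the fewest bytes queued so far."""
    queued = [0] * num_ports
    assignment = []
    for size in packet_sizes:
        # min() returns the lowest-numbered port when queues are tied
        port = min(range(num_ports), key=lambda p: queued[p])
        queued[port] += size
        assignment.append(port)
    return assignment, queued


def aggregate_rate_mb_s(num_ports=2):
    """Peak one-direction rate when traffic is spread evenly over all ports."""
    return PORT_RATE_MB_S * num_ports


if __name__ == "__main__":
    ports, queued = assign_packets([4096] * 8)
    print(ports)                  # [0, 1, 0, 1, 0, 1, 0, 1]
    print(queued)                 # [16384, 16384]
    print(aggregate_rate_mb_s())  # 500
```

With equal-sized packets this policy reduces to simple alternation; with mixed sizes it keeps the two queues roughly balanced, which is what lets two 250+250 MB/s ports behave like one 500+500 MB/s port.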

Detailed Technical and Performance Information

An Enhanced Architecture for Myrinet Interfaces

Myricom has held the architecture of Myrinet interfaces stable for four-year periods, with software support spanning two architectures:

The PCIX series of interfaces employs an enhanced architecture that allows these interfaces to achieve higher performance than the PCI64 series. Relative to the PCI64C interfaces, the increase in the clock rate of the RISC and memory from 200MHz to 225MHz in the PCIXD interfaces, or to 333MHz in the PCIXE interfaces, is only part of the story. The Lanai-X (Lanai-10) chips on which the PCIX-series interfaces are based include, in addition to the RISC processor, DMA engines for sending and receiving packets on each port, a DMA engine for the fully integrated PCI/PCI-X bus interface, and a copy/CRC32 engine. The RISC processor is the same as that in the Lanai-9 chips, but with the addition of a dispatch instruction that speeds up event dispatch in the GM-2 or MX firmware. The Lanai-X is a remarkable throughput engine, with far more total processing power than earlier Lanai chips.

For the M3F-PCIXD-2 interfaces, this processing power taps into an 8-byte-wide (with byte parity) local memory clocked at 225MHz, for a total memory data rate of 1.8 GB/s. This is enough to support the peak 1.067 GB/s PCI-X data rate plus the peak 0.5 GB/s link data rate, with all memory cycles not used by the PCI and packet engines available to the copy engine and RISC. For the future M3F2-PCIXE-2 interfaces, a 2.67 GB/s memory data rate supplies 1.067 GB/s peak for PCI-X plus 1 GB/s peak for the two links, with the remaining memory cycles available to the copy engine and RISC.
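The memory-budget arithmetic above can be checked in a few lines. This is a sketch that assumes one 8-byte transfer per memory clock and counts only peak rates. For the E card it assumes a 333.33MHz clock and the same 8-byte width, which is what the stated 2.67 GB/s implies (2.67 GB/s / 333MHz is about 8 bytes) but the text does not state directly. The function names are illustrative only.

```python
# Peak memory-bandwidth budget for the Lanai-X interfaces, from the figures in the text.
# Assumptions: one 8-byte transfer per memory clock (parity bits not counted),
# and for the M3F2-PCIXE-2 an 8-byte-wide memory at 333.33MHz.

BYTES_PER_CYCLE = 8
PCI_X_PEAK_GB_S = 8 * 133.33 / 1000  # 64-bit PCI-X at 133.33MHz: about 1.067 GB/s


def memory_rate_gb_s(clock_mhz):
    """Local-memory data rate in GB/s for one 8-byte transfer per clock."""
    return BYTES_PER_CYCLE * clock_mhz / 1000


def spare_gb_s(clock_mhz, link_gb_s):
    """Bandwidth left for the copy engine and RISC with PCI-X and links at peak."""
    return memory_rate_gb_s(clock_mhz) - PCI_X_PEAK_GB_S - link_gb_s


d_card = spare_gb_s(225, 0.5)     # M3F-PCIXD-2: one 250+250 MB/s port
e_card = spare_gb_s(333.33, 1.0)  # M3F2-PCIXE-2: two 250+250 MB/s ports

if __name__ == "__main__":
    # D card: 1.80 GB/s total, 0.23 GB/s spare
    print(f"D card: {memory_rate_gb_s(225):.2f} GB/s total, {d_card:.2f} GB/s spare")
    # E card: 2.67 GB/s total, 0.60 GB/s spare
    print(f"E card: {memory_rate_gb_s(333.33):.2f} GB/s total, {e_card:.2f} GB/s spare")
```

The spare figures are a floor. PCI-X and link traffic rarely run at peak simultaneously, so in practice the copy engine and RISC usually get more memory cycles than this.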

The Lanai-X series of chips and the PCIX series of interfaces also have several new features, the most important of which is that the interfaces are self-initializing from EEPROM. This capability is used to simplify initialization, and with later EEPROM updates can be used to provide diskless booting through the Myrinet. (The EEPROM data can be upgraded in interfaces installed in host computers.)

In terms of technology, the Lanai-X chip allows a more highly integrated interface than the PCI64-series interfaces. The entire M3F-PCIXD-2 interface consists of the Lanai-XP chip, its associated fast memory, the EEPROM, a serializer-deserializer (SerDes) chip for the port, and the fiber transceiver. The small circuit-board area allows the M3F-PCIXD-2 interface to fit on a low-profile PCI short card, and allows other versions of these interfaces to be used in blade systems. Myricom offers low-profile PCI face plates that allow the M3F-PCIXD-2 interfaces to be used in 2U servers without the need for riser cards (see the photo to the right).

GM 2 - An Enhanced Message-Passing System

GM 2 is an evolution of the GM 1 message-passing system, which is used today in thousands of clusters. Certain features of developmental versions of GM 2, such as the shared GM library, were "back ported" to GM 1.6; thus, you can expect an easy transition from GM 1.6.x to GM 2.0. If you use GM through a middleware layer such as MPICH-GM, you will see few if any operational differences between GM 1 and GM 2.

The principal internal structural difference between GM 1 and GM 2 is their buffer management. GM 2 requires more flexible buffer management in order to support multi-path, dispersive routing. Another major structural difference between GM 1 and GM 2 is their mapping software. GM 2 performs "active" mapping and route computation on all nodes, a capability that was required for the new switch line cards with Gigabit-Ethernet ports and that improves network fault tolerance by updating routes even during a computation. GM 2 also introduces several improvements in ethernet emulation (TCP/IP and UDP/IP over Myrinet), including interrupt coalescing.

The GM-2 application-programming interface (API) differs from that of GM 1 in two important respects. First, the semantics and format of GM node IDs were changed for efficiency and to accommodate the new mapping techniques. Second, GM 2 adds a gm_get function (remote DMA read, or RDMA read) to go along with the gm_put function (RDMA write). These changes in the GM API required small changes in the middleware layers MPICH-GM, VI-GM, Sockets-GM, and PVM-GM; thus, different binary versions of these middleware packages are required for GM 1 and GM 2.
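The difference between an RDMA write (gm_put) and an RDMA read (gm_get) is the direction of data flow relative to the node that issues the call. The toy model below shows only that direction. It is NOT the GM API: the real gm_put and gm_get are C functions with their own arguments, completion callbacks, and memory-registration rules, none of which are modeled here. `Node`, `put`, and `get` are hypothetical names.

```python
# Toy in-process model of RDMA write ("put") versus RDMA read ("get").
# Hypothetical names; this does not reproduce the GM API or its semantics
# beyond which node's memory is the source and which is the destination.


class Node:
    def __init__(self, node_id, size):
        self.node_id = node_id
        self.memory = bytearray(size)  # stands in for memory the interface can DMA


def put(local, local_off, remote, remote_off, length):
    """RDMA write: the caller pushes bytes from its memory into the remote node's memory."""
    remote.memory[remote_off:remote_off + length] = local.memory[local_off:local_off + length]


def get(local, local_off, remote, remote_off, length):
    """RDMA read: the caller pulls bytes from the remote node's memory into its own."""
    local.memory[local_off:local_off + length] = remote.memory[remote_off:remote_off + length]


if __name__ == "__main__":
    a, b = Node(1, 16), Node(2, 16)
    a.memory[0:5] = b"hello"
    put(a, 0, b, 8, 5)        # node a writes into node b
    print(bytes(b.memory[8:13]))
    b.memory[0:3] = b"abc"
    get(a, 10, b, 0, 3)       # node a reads from node b
    print(bytes(a.memory[10:13]))
```

In both calls node `a` is the initiator; only the direction of the copy changes. Adding the read direction is why middleware written against the GM 1 API needed updating.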

GM 2 is not compatible with GM 1. Some of the improvements in GM 2 came at the price of changes in packet formats, including the formats of mapping and ethernet-emulation packets. Thus, the hosts on a given Myrinet should either all run GM 1 or all run GM 2. As noted above, GM 2 is available both for the PCIX series of interfaces and for the PCI64 series, allowing clusters with both types of interfaces. (GM 2 and MX use the same formats for mapping and ethernet-emulation packets, allowing clusters with both GM-2 and MX hosts, and ethernet-emulation communication between GM-2 and MX hosts.)
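The interoperability rules stated in this section, together with the software-support column of the interface table, can be expressed as a small consistency check. This is a sketch under stated assumptions: each host is described by a hypothetical (interface model, software) label, and only rules given in this announcement are encoded (MX is listed only for the PCIX series). `check_cluster` is not a Myricom tool.

```python
# Sketch of a cluster-consistency check based on the rules in this announcement.
# Hypothetical helper; host labels are (interface model, software) pairs.

# Software listed for each interface in the announcement's table
SUPPORTED = {
    "M3F-PCIXD-2": {"GM2", "MX"},
    "M3F-PCI64C-2": {"GM1", "GM2"},
    "M3F-PCI64B-2": {"GM1", "GM2"},
}

# GM 2 and MX share mapping and ethernet-emulation packet formats; GM 1 does not
PACKET_FORMAT = {"GM1": "GM-1 formats", "GM2": "GM-2/MX formats", "MX": "GM-2/MX formats"}


def check_cluster(hosts):
    """hosts maps host name -> (interface model, software). Returns a list of problems."""
    problems = []
    for name, (card, sw) in hosts.items():
        if sw not in SUPPORTED[card]:
            problems.append(f"{name}: {sw} is not listed for {card}")
    formats = {PACKET_FORMAT[sw] for _, sw in hosts.values()}
    if len(formats) > 1:
        problems.append("GM-1 hosts mixed with GM-2/MX hosts on one Myrinet")
    return problems


if __name__ == "__main__":
    mixed_ok = {"n1": ("M3F-PCIXD-2", "MX"), "n2": ("M3F-PCI64C-2", "GM2")}
    print(check_cluster(mixed_ok))  # []
    bad = {"n1": ("M3F-PCIXD-2", "GM1"), "n2": ("M3F-PCI64C-2", "GM2")}
    for problem in check_cluster(bad):
        print(problem)
```

The check treats GM-2 and MX hosts as compatible because they share packet formats; per the text, that compatibility covers mapping and ethernet emulation between them.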

Performance of the PCIX-series PCI-bus implementation

The theoretical limit of a 64-bit (8-byte), 133.3MHz PCI-X bus is 1067 MB/s, whether reading or writing. The M3F-PCIXD-2 interface achieves this data rate within 4KB bursts (the maximum DMA-transfer size for PCI-X), and performs all PCI-X bus protocols in a minimum of bus clock cycles. The PCI-X slots in host computers transfer data to and from system memory, and thus can only approach the theoretical limits. The following table lists the PCI-DMA performance, as measured by the GM-2 "gm_debug" utility, for a sample of today's best cluster hosts.

Host / OS | Bus read (send) | Bus write (recv)
AMD "Melody" dual 1.6GHz Opteron server (AMD 8131 chip set) / SuSE 8 Linux | 919 MB/s | 780 MB/s
HP "Marvel" (Alpha EV-7, es47) quad-Alpha server / either Linux or Tru64 | 908 MB/s | 1038 MB/s
HP rx2000 dual 900MHz Itanium2 (HP chip set) / Linux | 784 MB/s | 1044 MB/s
Intel quad 900MHz Itanium2 (Intel 870 chip set) / Linux | 819 MB/s | 947 MB/s
Intel dual 2.4GHz Xeon whitebox (Serverworks GC chip set, 400MHz FSB) / Linux | 856 MB/s | 1024 MB/s
Intel dual 1.8GHz Xeon whitebox (Intel E7500 chip set, 400MHz FSB) / Linux | 816 MB/s | 853 MB/s
Newisys dual 1.4GHz Opteron server (AMD 8131 chip set) / SuSE 8 Linux | 929 MB/s | 1032 MB/s
Supermicro X5DL8-GG dual 2.4GHz Xeon (Serverworks GC-LE chip set, 533MHz FSB) / Linux | 932 MB/s | 1044 MB/s
Supermicro X5DPE-G2 dual 2.4GHz Xeon (Intel E7501 chip set, 533MHz FSB) / Linux | 826 MB/s | 853 MB/s
Tyan Trinity single 3.06GHz Pentium-4 (Serverworks GC-SL chip set, 533MHz FSB) / Linux, 133MHz PCI-X slot | 859 MB/s | 1040 MB/s
Tyan Trinity single 3.06GHz Pentium-4 (Serverworks GC-SL chip set, 533MHz FSB) / Linux, 100MHz PCI-X slot | 708 MB/s | 782 MB/s

Note: Small differences are not significant. With the M3F-PCIXD-2 interfaces, 500 MB/s PCI-DMA performance is sufficient to achieve maximal summed-bidirectional performance on the Myrinet port. All of the hosts listed above have PCI-DMA performance to spare.
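The "performance to spare" claim can be checked directly against the table. The sketch below copies the measured rates (host names abbreviated) and, as a conservative assumption, takes each host's PCI-DMA figure to be the lower of its read and write rates. It also confirms that no measurement exceeds the theoretical 64-bit, 133.33MHz PCI-X limit.

```python
# Sanity checks on the PCI-DMA table, using only the numbers printed in it.
PCI_X_LIMIT_MB_S = 8 * 133.33  # 64-bit (8-byte) bus at 133.33MHz: about 1067 MB/s
LINK_SUMMED_MB_S = 500         # Myrinet link: 250+250 MB/s summed-bidirectional

# (host, bus read / send MB/s, bus write / recv MB/s), copied from the table
MEASUREMENTS = [
    ("AMD Melody, dual 1.6GHz Opteron", 919, 780),
    ("HP Marvel, quad Alpha EV-7", 908, 1038),
    ("HP rx2000, dual 900MHz Itanium2", 784, 1044),
    ("Intel quad 900MHz Itanium2", 819, 947),
    ("Dual 2.4GHz Xeon, Serverworks GC", 856, 1024),
    ("Dual 1.8GHz Xeon, Intel E7500", 816, 853),
    ("Newisys dual 1.4GHz Opteron", 929, 1032),
    ("Supermicro X5DL8-GG", 932, 1044),
    ("Supermicro X5DPE-G2", 826, 853),
    ("Tyan Trinity P4, 133MHz slot", 859, 1040),
    ("Tyan Trinity P4, 100MHz slot", 708, 782),
]


def lower_rate(row):
    """Conservative PCI-DMA figure for a host: the lower of its read and write rates."""
    return min(row[1], row[2])


def min_headroom(rows, needed=LINK_SUMMED_MB_S):
    """Return (host, margin in MB/s) for the host with the least margin over `needed`."""
    worst = min(rows, key=lower_rate)
    return worst[0], lower_rate(worst) - needed


if __name__ == "__main__":
    for row in MEASUREMENTS:
        assert max(row[1], row[2]) <= PCI_X_LIMIT_MB_S  # no host exceeds the bus limit
        print(f"{row[0]:34s} margin {lower_rate(row) - LINK_SUMMED_MB_S} MB/s")
    print(min_headroom(MEASUREMENTS))  # ('Tyan Trinity P4, 100MHz slot', 208)
```

Even the weakest entry, a 100MHz slot, clears the 500 MB/s requirement by roughly 200 MB/s.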

Documentation & Availability

Specifications for the M3F-PCIXD-2 interface are available from Myricom. These new interfaces are in production. Orders will be filled in the order received.

The GM-2 alpha snapshots on the Myricom Software and Customer Support page are part of a project in which about sixty prototype M3F-PCIXD-2 interfaces have been sent to customers for testing. A regular GM 2.0 release for Linux with MPICH-GM middleware is in final testing at Myricom, and will be available on the web by 15 May. Releases for other operating systems, and updating of other middleware to run with GM 2, will become available over the subsequent month.

Documentation for the MX software is expected to be available on the web in June, with release scheduled for 3Q03.

