************************************************************************
* Myricom GM networking software and documentation                     *
* Copyright (c) 2005 by Myricom, Inc.                                  *
* All rights reserved. See the file `COPYING' for copyright notice.    *
************************************************************************

README-linux for gm-2.0 and gm-2.1

README for the Linux distribution

Supported OS/processors:
    Linux 2.6 for IA32, IA64, AMD64, EM64T, PowerPC, PowerPC64, Power4,
    and Power5.
    Linux 2.4 for IA32, IA64, AMD64, EM64T, Alpha, PowerPC, and PowerPC64.

    - For Alphas, if you have 2 GB or more of memory, we recommend kernel
      version 2.4.18 or later to install GM. You must use kernel version
      2.4.14 or later (2.4.9 also works).

Supported NICs:
    PCI64, PCI64A, PCI64B, PCI64C, PCIXD, PCIXE, PCIXF

    If you have PCIXD or PCIXF NICs, you should use GM-2.0.x.
    If you have PCIXE NICs, you must use GM-2.1.x.

Note: gm-2.0 does not interoperate with gm-2.1.x or gm-1.x. Hosts running
gm-1.x, gm-2.0, and gm-2.1 cannot be mixed: a host running one of these
versions cannot talk to a host running another.

If you have PCI32{A,B,C} NICs, you will need to upgrade your NIC, or use
a previous version of GM (gm-1.2.3 for PCI32A and gm-1.5.2.1 for PCI32B
and PCI32C). GM-2 does not support PCI32-based NICs. For installation
instructions for an earlier GM version, please refer to the respective
README and README- files.

WARNING: When building/linking GM applications, you must do so on a Linux
box that matches the OS version of the machine on which you will be
running them. You cannot compile on a 2.4.x machine and run the
executable on a 2.6.x machine.

Table of Contents:
-----------------
I.   GM Installation
     a. Configuring and compiling GM
     b. Installing the GM driver
     c. Enabling IP over Myrinet (Ethernet emulation)
     d. Testing the GM installation
II.  Verifying the GM performance
III. Fork() Support
IV.  Sample Scripts to automatically load GM and start the Mapper
V.   Caveats
     a. Additional installation instructions for PowerPC64
     b. Additional installation instructions for AMD64 and EM64T
     c. Using Compaq Compilers for Alpha Linux (ccc cxx)
     d. PCI Chipset Tweaks
     e. APIC IRQ conflict on Tyan and AMD motherboards
     f. AGP (nVidia and ATI) conflicts
     g. SuSE 8.1 system compiler
VI.  Miscellaneous
     a. Uninstallation of the GM driver

************************************************************************
If difficulties are encountered, please consult the FAQ
    http://www.myri.com/scs/FAQ/
All technical support questions should be directed to help@myri.com.
************************************************************************

===================
I. GM Installation
===================

GM installation is performed in the following four steps. (Note: These
instructions assume that you have PCIXD or PCIXF NICs. If you have PCIXE
NICs, you must use gm-2.1.9_Linux.tar.gz.)

1. Configuring and compiling GM:
---------------------------------------------

    gunzip -c gm-2.0.19_Linux.tar.gz | tar xvf -
    cd {GM_HOME}
    ./configure
    make

By default, we assume that the header files required to build external
modules for your Linux installation are located under
"/lib/modules/`uname -r`/build". If your Linux kernel files are not
there, you must configure with the following option:

    ./configure --with-linux=

where specifies the directory for the linux kernel source.

The kernel header files MUST match the running kernel exactly: not only
should they be from the same version, but they should also contain the
same kernel configuration options.

Note: GM-2.0 does not interoperate with GM-2.1.x or GM-1.x.

For a complete listing of all options to configure, type:

    ./configure --help

Note: Do not use the configure flag --enable-directcopy. This flag is not
a valid option for GM 2.0.x or GM 2.1.x. It may be re-enabled in the
future for specific Linux distribution releases.

2. Installing the GM driver:
---------------------------------------------

Select an installation directory path .
It is usually best for to be the path to an NFS directory available on
all machines that are to share this GM installation. The directory must
be accessible using on all machines that are to share the installation.
must be an absolute path; it must start with "/". However, may contain
symbolic links.

    cd binary
    ./GM_INSTALL

If you omit , the driver will be installed in the default directory,
/opt/gm/.

Next, you must run

    su root
    /sbin/gm_install_drivers
    /etc/init.d/gm start
    echo /lib >> /etc/ld.so.conf && /sbin/ldconfig

on each machine.

The "ldconfig" line is optional; it adds the GM library directory to the
system library search path. If you do not do this, individual users will
have to either manage their LD_LIBRARY_PATH environment variable or link
their programs with an "-rpath" option so that the dynamic linker can
locate the GM shared library.

The GM_INSTALL script copies the GM binaries to the specified binary
installation directory .

The gm_install_drivers script performs the following operations:

    * Copies gm.o into /lib/modules//gm/gm.o
    * Removes the previous installation by executing
      /sbin/gm_uninstall_drivers (rmmod)
    * Copies other files from the binary installation directory to an
      architecture-specific directory (/etc/init.d/).
    * Creates the devices (/dev/gm* and /dev/gmp*), one device per NIC
    * Creates the mapper's per-host configuration directory
      (/etc/gm_mapper) and possibly stores configuration files there.

The gm "start" script performs the following operations:

    * Loads the GM module (insmod)
    * Starts a mapper daemon called "gm_mapper" for each Myrinet NIC
      in the machine. The PIDs of the running gm_mapper daemons are
      stored in /var/run/gm_mapper/pid.{board_id}.

The gm "stop" script performs the following operations:

    * Shuts down the gm_mapper daemon
    * Brings down the myri* Ethernet devices (ifconfig down)
    * Unloads the GM module (rmmod)

If you are installing GM-2 on a diskless cluster, contact help@myri.com
for assistance.
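Because the start script records each mapper's PID in
/var/run/gm_mapper/pid.{board_id}, a site monitoring tool could check
that the mapper for a given board is still alive. The following is only a
minimal sketch, not part of GM: it assumes (this README does not say so)
that each pid file holds a single decimal PID, and the function names are
hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: read the PID stored in a pid file.
 * Assumption: the file contains one decimal PID.
 * Returns the PID, or -1 if the file is missing or unparsable. */
pid_t read_pid_file(const char *path)
{
    FILE *f = fopen(path, "r");
    long pid = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &pid) != 1 || pid <= 0)
        pid = -1;
    fclose(f);
    return (pid_t)pid;
}

/* kill() with signal 0 delivers nothing; it only checks that the process
 * exists. EPERM means it exists but belongs to another user (e.g. root). */
int process_is_running(pid_t pid)
{
    if (pid <= 0)
        return 0;
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM;
}

/* Hypothetical helper: is the gm_mapper for board_id still running?
 * The path follows /var/run/gm_mapper/pid.{board_id}. */
int gm_mapper_is_running(int board_id)
{
    char path[64];

    snprintf(path, sizeof path, "/var/run/gm_mapper/pid.%d", board_id);
    return process_is_running(read_pid_file(path));
}
```

For example, gm_mapper_is_running(0) would return 1 while the mapper for
the first board is alive. Checking with kill(pid, 0) rather than parsing
/proc keeps the sketch portable across kernel versions.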
Important note: Stopping the mapper while GM is running is not supported.
The gm_mapper should be left running at all times; it will not interfere
with the performance of jobs running over Myrinet.

Important note: The installation scripts do not configure the IP device.
If you wish to run IP over GM/Myrinet (Ethernet emulation), you must
configure the device yourself. Refer to step 3.

If you wish for the driver to auto-load at boot, you can create
appropriate links in the /etc/rcN directories to the /etc/init.d/gm and
/etc/init.d/myri scripts, or, for example, use the following command
(for Debian Linux):

    update-rc.d gm defaults

or (for RedHat Linux):

    chkconfig --add gm

Alternatively, you may start and stop the drivers manually using

    su root
    /etc/init.d/gm start
    /etc/init.d/gm stop

or

    su root
    /etc/init.d/gm restart

to start, stop, or restart the driver, respectively.

For directions on how to uninstall the GM driver, refer to the
"Miscellaneous" section.

Note: If the host is rebooted, you must reload the GM driver (unless you
have configured it to auto-load at boot, as described above).

3. Enabling IP over Myrinet (Ethernet Emulation) (OPTIONAL)
-----------------------------------------------------------

If you wish to run IP over Myrinet (Ethernet emulation), the Linux
command to enable IP over GM is:

    /sbin/ifconfig myri0 up

If you have more than one Myrinet NIC per host, replace myri0 with the
appropriate name (myri1, myri2, etc.).

Consult the "Running IP" section of the FAQ (http://www.myri.com/scs/FAQ/)
for other related questions.

4. Testing the GM Installation
------------------------------

Once the GM software has been properly installed on all of the hosts in
your cluster, you are ready to validate your Myrinet installation by
performing the following sequence of tests:
    * Check the LEDs on each switch port and NIC port
    * Run gm_board_info on one host
    * Run gm_debug to test the PCI bandwidth
    * Run gm_allsize to test the links in the network
    * Run gm_stress to test the network
    * Run Mute on the cluster to test for bad links
    * Run watch_switches, wirelist, and link_test to help diagnose
      connectivity problems within a Myrinet fabric

Each of these steps is detailed in the Troubleshooting section of the FAQ:

    http://www.myri.com/scs/FAQ/

The test scripts (gm_board_info, gm_debug, gm_allsize, gm_stress) are
available in /bin in your GM installation. A README describing each of
these tests can be found in /bin/README.

The diagnostic tools Mute and link_test are not included in the GM
distribution, but can be downloaded from http://www.myri.com/scs/

================================
II. Verifying the GM Performance
================================

We recommend the following test to verify your GM performance:

    cd /bin/
    gm_debug -L

This gm_debug test displays the results of a hardware benchmark of the
PCI bus using the DMA engine of the Myrinet NIC. The output indicates the
maximum sustained bandwidth that can be obtained from the PCI bus, and
thus provides an upper bound on GM performance. A detailed description of
this benchmark can be found in the FAQ entry "Can you describe in detail
the "hardware benchmark of the PCI bus" that is returned by gm_debug?"

The output of this command also tells you whether the Myrinet NIC was
correctly detected as, for example, 64-bit/133 MHz or 64-bit/66 MHz. If
the NIC was not correctly detected by the BIOS, you should suspect a
riser card problem or a PCI slot problem.

Performance graphs for GM are available at
http://www.myri.com/myrinet/performance. The performance measurements
were obtained by running gm_allsize tests for latency and bandwidth, as
described in the FAQ entry "What are the run-time options to gm_allsize?".
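As a rough reference point for the gm_debug result, the nominal peak of a
parallel PCI or PCI-X bus is the bus width in bytes times the clock rate.
The sketch below is a hypothetical helper (not a GM tool) that computes
this nominal figure in MB/s (10^6 bytes/s); it ignores protocol overhead,
so the sustained bandwidth reported by gm_debug -L will always be lower.

```c
/* Nominal peak bandwidth of a parallel PCI/PCI-X bus in MB/s (10^6 B/s):
 * (bus width in bytes) x (clock in MHz, i.e. millions of transfers/s).
 * Protocol overhead is ignored, so real DMA throughput is always lower. */
unsigned int pci_peak_mb_per_s(unsigned int bus_width_bits,
                               unsigned int clock_mhz)
{
    return (bus_width_bits / 8u) * clock_mhz;
}
```

For the two detection results mentioned above, this gives 1064 MB/s for
64-bit/133 MHz and 528 MB/s for 64-bit/66 MHz, so a NIC detected at the
wrong clock rate has only half the nominal ceiling.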
Refer to the section entitled "GM Performance" in the /README for
complete details on expected GM performance.

====================
III. Fork() Support
====================

As of gm-1.5.2, GM has full support for fork() under Linux, on all
processor families. There are no restrictions; a GM application can call
fork() with or without a GM port open. However, if you have a choice
between vfork() and fork(), vfork() will give better performance, since
the time to fork a process with vfork() is much shorter.

================================================================
IV. Sample Scripts to automatically load GM and start the Mapper
================================================================

The directory {GM_HOME}/drivers/linux/scripts contains some sample
initialization scripts, contributed by customers, that can be customized
for your system to automatically load the GM driver and start the GM
Mapper.

===========
V. Caveats
===========

------------------------------------------------------
a. Additional installation instructions for PowerPC64
------------------------------------------------------

Refer to the FAQ entry "How do I build GM-2 on PowerPC64?"
(http://www.myri.com/cgi-bin/fom?file=260) for additional installation
instructions required for PowerPC64 processors.

If you are using SuSE 9.0 or later, you may also want to refer to the
Myrinet FAQ entry "I'm using SuSE Linux but gm_install_drivers complains
of a running/source kernel mismatch. What's wrong?"
(http://www.myri.com/cgi-bin/fom?file=272).

------------------------------------------------------------
b. Additional installation instructions for AMD64 and EM64T
------------------------------------------------------------

Refer to the FAQ entry "How do I build GM-2 on AMD64?"
(http://www.myri.com/cgi-bin/fom?file=252) for additional installation
instructions required for AMD64 and EM64T processors.
If you are using SuSE 9.0 or later, you may also want to refer to the
Myrinet FAQ entry "I'm using SuSE Linux but gm_install_drivers complains
of a running/source kernel mismatch. What's wrong?"
(http://www.myri.com/cgi-bin/fom?file=272).

---------------------------------------------------
c. Using Compaq Compilers for Alpha Linux (ccc cxx)
---------------------------------------------------

Under the C shell:

    setenv CC ccc
    setenv CXX cxx
    setenv CXXFLAGS \
      "-g -O2 -inline speed -x cxx -noexceptions -nocxxstd -using_std -w2"
    setenv CFLAGS -gcc_messages
    setenv KCC gcc
    rm -f config.cache
    ./configure

or under a Bourne shell or Bash:

    CC=ccc ; export CC
    CXX=cxx ; export CXX
    CXXFLAGS="-g -O2 -inline speed -x cxx -noexceptions -nocxxstd"
    CXXFLAGS="$CXXFLAGS -using_std -w2" ; export CXXFLAGS
    CFLAGS=-gcc_messages ; export CFLAGS
    KCC=gcc ; export KCC
    rm -f config.cache
    ./configure

----------------------
d. PCI Chipset Tweaks
----------------------

In the file:

    {GM_HOME}/drivers/linux/gm/gm_arch.c

if you have an i840 chipset, modify the flag to be

    #define GM_INTEL_840 1

There are similar defines for:

    #define GM_INTEL_860 1
    #define GM_21154 1
    #define GM_INTEL_450NX 1
    #define GM_KT266A 1

Also from this file, please read this warning:

/****************** PCI CHIPSET TWEAKS: WARNING *************************
 *
 * The patches below were supplied by customers who reported that
 * their PCI performance was improved when using these patches
 * on a particular chipset.
 * These patches tweak certain bits in the chipset and have not been
 * verified or reviewed by Myricom and may have other, possibly
 * negative, side-effects. Before applying one of these patches,
 * you may wish to check for a newer BIOS for your machine.
 * Also, a newer linux kernel may provide better PCI performance,
 * and might be a safer course of action than applying one of
 * these patches.
 *
 * Use these patches at your own risk.
 *
 ***********************************************************************/

--------------------------------------------------
e. APIC IRQ conflict on Tyan and AMD motherboards
--------------------------------------------------

We have encountered APIC IRQ conflicts on several Tyan and AMD
motherboards. The installation of GM will fail with an error message
similar to the following:

    GM: LANai rate set to 198 MHz (max=2-2MHz)
    GM: Board 0 page hash cache has 32768
    GM: Allocated IRQ 11
    GM: NOTICE: GM: board interrupt (configured on IRQ 11) is not working
    GM: NOTICE: GM: Failed to initialize Myrinet Card
    GM: gm: driver unloading
    GM: WARNING: GM: No Board Initialized
    #############################
    Error Installing GM driver module
    #############################

or

    GM: Version 1.5.2.1_Linux build 1.5.2.1_Linux xxxh@xxx.xx.xx Fri Jul 19 14:03:17 EDT 2002
    GM: NOTICE: GM: Module not compiled from a real kernel build source tree
    GM: This build might not be supported.
    GM: Highmem memory configuration:
    GM: PAGE_ZERO=0x0, HIGH_MEM=0x3ff80000, KERNEL_HIGH_MEM=0x38000000
    GM: Memory available for registration: 224748 pages (877 MBytes)
    GM: MCP for unit 0: L9 4K (new features)
    GM: LANai rate set to 133 MHz (max = 134 MHz)
    GM: Board 0 page hash cache has 32768 bins.
    GM: Allocated IRQ5
    GM: NOTICE: GM: Board interrupt (configured on IRQ 5) is not working.
    GM: NOTICE: GM: Failed to initialize Myrinet Card
    GM: gm: driver unloading

The IRQ error message means that the driver asked the Myrinet NIC to
raise the interrupt assigned to it by the BIOS, to check that the
interrupt is working, and did not receive it within the expected timeout.
The driver therefore cannot use the Myrinet board and aborts
initialization.

The most frequent cause of this problem is:

    * The interrupt lines are managed by an APIC (Advanced Programmable
      Interrupt Controller) chipset that is not supported correctly by
      the BIOS and/or by the current Linux kernel.

Possible solutions:

    1. Try a different PCI slot.
    2. Upgrade the BIOS.
    3. Upgrade the Linux kernel version if available.
    4. Boot the Linux kernel without APIC support by passing the noapic
       flag to the booting kernel (for example, via the LILO boot
       prompt). In this case, the kernel will use a safer compatibility
       mode. It is important to note that if this error occurs on any
       node in the cluster, all nodes in the cluster should be booted
       with noapic.

Refer to the Myrinet FAQ entry "GM Installation fails. What does this
error message mean?" (http://www.myri.com/cgi-bin/fom?file=46) for
further details.

---------------------------------
f. AGP (nVidia and ATI) conflicts
---------------------------------

Two types of problems were reported.

1. If the GM module is loaded first and the nVidia or ATI module second,
   both work. But if the nVidia or ATI module is loaded first, GM will
   not load. Or, neither the GM module nor the nVidia module will load
   if both cards are installed in the host.

   This problem is due to a shortage of kernel virtual address space
   (used for IO-mapping PCI memory) in the Linux kernel. On
   configurations with a lot of physical memory, Linux reserves only
   128 MB of its address space for dynamically allocated virtual memory.
   Unfortunately, the nVidia card seems to consume as much of this space
   as it can (it occupies at least 128 MB of PCI memory space), so if
   you load its driver before the gm module on such a configuration, you
   will get the reported error.

   SuSE and possibly other distributions have added a "vm_reserve="
   kernel command-line option that lets you tune, at boot time, the
   amount of free virtual address space the kernel should leave to
   modules. For those kernels, adding "vm_reserve=256m" to the
   grub/lilo/... bootloader configuration will solve the problem (in
   some configurations, e.g., with more than one nVidia card, more than
   256m might be needed). Refer to /Documentation/kernel-parameters.txt
   to see if the vm_reserve= option is available.
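   The arithmetic behind these limits can be checked with a short worked
   example. The setup.c patch in this section defines MAXMEM as
   -PAGE_OFFSET - VMALLOC_RESERVE: on a 32-bit i386 kernel, the kernel's
   address window is shared between permanently mapped RAM and the
   vmalloc/ioremap area. The sketch below assumes the common i386
   PAGE_OFFSET of 0xC0000000 (a 3 GB/1 GB user/kernel split); the
   function name is hypothetical.

```c
#include <stdint.h>

/* Assumption: a standard 32-bit i386 kernel with the default 3 GB/1 GB
 * user/kernel split, i.e. PAGE_OFFSET = 0xC0000000. */
#define I386_PAGE_OFFSET 0xC0000000u

/* Mirrors MAXMEM = (unsigned long)(-PAGE_OFFSET - VMALLOC_RESERVE)
 * using 32-bit unsigned arithmetic, where -PAGE_OFFSET wraps to 1 GB.
 * Returns the largest amount of RAM (in MB) that stays permanently
 * mapped ("low memory") for a given vmalloc reservation in MB. */
uint32_t i386_lowmem_limit_mb(uint32_t vmalloc_reserve_mb)
{
    uint32_t kernel_space = (uint32_t)0u - I386_PAGE_OFFSET; /* 0x40000000 */
    uint32_t reserve = vmalloc_reserve_mb << 20;              /* MB -> bytes */

    return (kernel_space - reserve) >> 20;                    /* bytes -> MB */
}
```

   With the default 128 MB reservation this gives 896 MB of low memory;
   reserving 256 MB lowers it to 768 MB, and 384 MB lowers it to 640 MB.
   This is consistent with the 768 MB figures in this section: with at
   most 768 MB of RAM permanently mapped, roughly 256 MB of the kernel
   window is left for vmalloc and ioremap.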
   This option can also be added to other kernels with the patch given in
   http://www.ussg.iu.edu/hypermail/linux/kernel/0409.1/2524.html.

   Alternatively, if this official patch does not apply to your Linux
   kernel, we offer the following patches for people using more than
   768 MB of memory and an nVidia or ATI card. If you are using an early
   Linux 2.4 kernel, you will need this patch:

--- arch/i386/kernel/setup.c Thu Aug 2 17:00:46 2001
+++ arch/i386/kernel/setup.c.2 Thu Oct 11 09:00:59 2001
@@ -815,7 +815,7 @@
 /*
  * 128MB for vmalloc and initrd
  */
-#define VMALLOC_RESERVE (unsigned long)(128 << 20)
+#define VMALLOC_RESERVE (unsigned long)(256 << 20)
 #define MAXMEM (unsigned long)(-PAGE_OFFSET-VMALLOC_RESERVE)
 #define MAXMEM_PFN PFN_DOWN(MAXMEM)
 #define MAX_NONPAE_PFN (1 << 20)

   For Linux 2.6 kernels or recent Linux 2.4 kernels, the constant has
   changed location: it is now __VMALLOC_RESERVE in page.h. Use the patch
   below to reserve 256 MB of virtual address space for drivers instead
   of the default 128 MB:

--- linux-2.6/include/asm-i386/page.h 2004-09-16 16:25:45.000000000 -0400
+++ linux-2.6/include/asm-i386/page.h.new 2004-10-18 13:27:23.000000000 -0400
@@ -98,7 +98,7 @@
  * This much address space is reserved for vmalloc() and iomap()
  * as well as fixmap mappings.
  */
-#define __VMALLOC_RESERVE (128 << 20)
+#define __VMALLOC_RESERVE (256 << 20)
 
 #ifndef __ASSEMBLY__

   If you have 2 nVidia cards, you might even need to increase the value
   from 256 to 384, depending on how much virtual address space the
   nVidia driver requires. Also be sure that the HIGHMEM option is
   enabled when configuring the kernel.

   If you do not mind losing memory, or just want to run a test, you can
   boot your current kernel with mem=768m to see if the problem
   disappears.

   Refer to the Myrinet FAQ entry "GM_INSTALL or gm_install_drivers
   fails. What does this error message mean?" for further details.

2. Overlapping of prefetch memory for the AGP and PCI bridges.
   This problem was seen on an SGI Visual Workstation 550 machine with
   the following graphics cards: nVidia Quadro, ATI Mach64 PCI, and ATI
   Rage AGP. On this machine, the prefetchable memory ranges assigned by
   the BIOS to the AGP and PCI bridges overlap. This looks like a BIOS
   problem, and we asked the customer to look into upgrading the BIOS,
   or to experiment with the BIOS settings to get the BIOS to do the
   right thing (things to try: toggling the plug-n-play OS setting,
   changing the size of the AGP graphics aperture, reinitializing or
   re-detecting the PCI configuration space, etc.).

   Specifically, it was seen that the memory for the Myrinet card is
   mapped at exactly the same address with the ATI Mach64 PCI graphics
   card as with the ATI Rage AGP graphics card:

   03:01.0 Non-VGA unclassified device: MYRICOM Inc.: Unknown device 8043 (rev 03)
           Region 0: Memory at 82000000 (64-bit, prefetchable) [size=16M]

   However, now look at the bridges leading to bus 3 (PCI, where the
   Myrinet card is) and bus 1 (AGP) in the ATI Rage AGP configuration:

   00:01.0 PCI bridge: Intel Corporation 82840 840 (Carmel) Chipset AGP Bridge (rev 01) (prog-if 00 [Normal decode])
           Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
           Prefetchable memory behind bridge: 82300000-850fffff

   00:02.0 PCI bridge: Intel Corporation 82840 840 (Carmel) Chipset PCI Bridge (Hub B) (rev 01) (prog-if 00 [Normal decode])
           Bus: primary=00, secondary=02, subordinate=03, sec-latency=0
           Prefetchable memory behind bridge: 81600000-831fffff

   Note how the two prefetchable memory regions overlap. More
   importantly, the AGP bridge's prefetchable memory region
   (82300000-850fffff) overlaps that of the Myrinet card
   (82000000-82ffffff).
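   The overlap can be checked mechanically by treating each lspci region
   as an inclusive address range, where "Memory at X ... [size=S]" covers
   X through X+S-1. This is only a sketch for reading lspci output, not
   something GM provides; the type and function names are hypothetical.

```c
#include <stdint.h>

/* Inclusive address range, as printed by lspci ("82300000-850fffff"). */
struct addr_range {
    uint64_t start;
    uint64_t end; /* inclusive */
};

/* Build a range from lspci's "Memory at BASE ... [size=SIZE]" form.
 * Assumes size > 0. */
struct addr_range range_from_base_size(uint64_t base, uint64_t size)
{
    struct addr_range r = { base, base + size - 1 };
    return r;
}

/* Two inclusive ranges overlap iff each starts no later than the other ends. */
int ranges_overlap(struct addr_range a, struct addr_range b)
{
    return a.start <= b.end && b.start <= a.end;
}

/* True if inner lies entirely within outer (e.g. a device inside the
 * window of the bridge it sits behind). */
int range_contains(struct addr_range outer, struct addr_range inner)
{
    return outer.start <= inner.start && inner.end <= outer.end;
}
```

   With the values above, the Myrinet region 0x82000000-0x82ffffff lies
   inside the window of the PCI bridge it sits behind
   (0x81600000-0x831fffff), as it should, but it also overlaps the AGP
   bridge window (0x82300000-0x850fffff). The Rage card's own region
   (0x84000000-0x84ffffff) does not overlap the Myrinet region.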
   Note that the only prefetchable memory on the AGP bus belongs to the
   Rage card, and that this memory is a small subset of the region the
   bridge is claiming:

   01:00.0 VGA compatible controller: ATI Technologies Inc 3D Rage IIC AGP (rev 7a) (prog-if 00 [VGA])
           Region 0: Memory at 84000000 (32-bit, prefetchable) [size=16M]

   This issue has since been resolved: download and install BIOS version
   A9.

---------------------------------
g. SuSE 8.1 system compiler
---------------------------------

During internal testing, we noticed a problem with the SuSE 8.1 system
compiler (gcc version 3.2, ia32) which resulted in a corrupt DMA address
being passed to the firmware. We believe we have worked around this
issue; however, we still suggest using a different compiler (such as
gcc 2.95) to build the GM kernel module.

Under the C shell:

    setenv KCC gcc295
    rm -f config.cache
    ./configure

or under a Bourne shell or Bash:

    KCC=gcc295 ; export KCC
    rm -f config.cache
    ./configure

=================
VI. Miscellaneous
=================

-----------------------------------
a. Uninstallation of the GM driver
-----------------------------------

The gm_install_drivers script generates the script
/sbin/gm_uninstall_drivers, which can be used to uninstall the drivers.

The GM_INSTALL script generates the script /sbin/GM_UNINSTALL, which can
be used to uninstall GM.