and TFT Color Display of the Myrinet-2000 Switches for Large Clusters |
The following maintenance procedures assume a basic knowledge of the features of the Web Interface and the TFT Color Display for the Myrinet-2000 Switches for Large Clusters.
How do I configure the monitoring line card (M3-MONITOR)?
Detailed instructions can be found in the following FAQ entry "How do I configure the monitoring line card (M3-MONITOR) in the M3-CLOS-ENCL/M3-CLOS-ENCL-B and/or M3-SPINE-ENCL/M3-SPINE-ENCL-B switch(es)?".
Can I assign a static IP address to the monitoring line card (M3-MONITOR)?
Yes. With switch firmware v0.9.8.8 or later, you can assign a static IP address to the monitoring line card. First follow the instructions in Upgrade Monitoring Card Firmware, and then assign the static IP address as instructed in the FAQ entry "How do I configure the monitoring line card (M3-MONITOR) in the M3-CLOS-ENCL/M3-CLOS-ENCL-B and/or M3-SPINE-ENCL/M3-SPINE-ENCL-B switch(es)?".
How do I upgrade the firmware on the monitoring line card (M3-MONITOR)?
Detailed instructions can be found in the section entitled Upgrade Monitoring Card Firmware.
Also refer to How do we tell if the switch firmware is up-to-date?
How do we tell if the switch firmware is up-to-date?
The following procedure will determine if the switch firmware on the monitoring line card and the switch line cards is up-to-date.
firmware: v0.9.9.3 Jan 23 2007 13:34:49
Check the latest version of shipfs.img on the ftp site
If the switch firmware on the monitoring line card is not up-to-date, upgrade the monitoring line card firmware as detailed in the instructions in Upgrade Monitoring Card Firmware.
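The comparison between the installed version and the latest available version can be scripted. The following is a minimal sketch, assuming the firmware line has the same layout as the sample shown above and that versions are plain dotted integers with a leading "v". The function names `firmware_version` and `version_lt` are hypothetical helpers, not part of any Myricom tool.

```shell
#!/bin/sh
# Hypothetical helper: extract the version token (e.g. "v0.9.9.3") from a
# line such as "firmware: v0.9.9.3 Jan 23 2007 13:34:49".
firmware_version() {
    printf '%s\n' "$1" |
        sed -n 's/^firmware:[[:space:]]*\(v[0-9][0-9.]*\).*/\1/p'
}

# Hypothetical helper: succeed (exit status 0) if version $1 is older
# than version $2. Assumes dotted integers with an optional leading "v".
version_lt() {
    awk -v a="${1#v}" -v b="${2#v}" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0   # missing fields count as 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi < yi) exit 0
            if (xi > yi) exit 1
        }
        exit 1                              # equal versions are not "older"
    }'
}

# Usage (LATEST is a placeholder for the version you find on the ftp site):
#   installed=$(firmware_version 'firmware: v0.9.9.3 Jan 23 2007 13:34:49')
#   if version_lt "$installed" "$LATEST"; then echo "upgrade needed"; fi
```

Comparing field by field as integers avoids the trap of a plain string comparison, which would rank a field of 10 below a field of 9.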
Another way to determine the versions of the line card firmware on all switch line cards is to grep the web output.
infusion% lynx -source http://10.0.0.2/cgi/web.cgi\?all | grep firmware | grep Compiled
firmware version: M3 Switch Firmware v3.0 Type: LCMONITOR Compiled on: Oct 7 2004 18:15:34
firmware version: M3 Switch Firmware v3.0 Type: LC16FX Compiled on: Nov 2 2004 13:49:55
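If the switch has many line cards, the grep output can be condensed to one short line per card. The sketch below assumes each matching line carries the "Type:" and "Compiled on:" fields in the order shown in the sample output above. The function name `summarize_firmware` is a hypothetical helper, and the real web output may contain additional markup that this pattern does not handle.

```shell
#!/bin/sh
# Hypothetical helper: read firmware lines on stdin and print
# "<card type> <compile date>" for each line that has both fields.
summarize_firmware() {
    sed -n 's/.*Type:[[:space:]]*\([^[:space:]]*\)[[:space:]]*Compiled on:[[:space:]]*\(.*\)$/\1 \2/p'
}

# Usage, with the switch address taken from the example above:
#   lynx -source http://10.0.0.2/cgi/web.cgi\?all \
#       | grep firmware | grep Compiled | summarize_firmware
```

Line cards of the same type should normally report the same compile date, so a mismatch in this summary is a quick hint that one card was missed during an upgrade.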
Can I store/save the switch control settings on reboot?
Yes. With switch firmware v0.9.8.8 or later, it is possible to store (save) the switch control settings so that they persist across a reboot. Refer to the FAQ entry How do I store / save switch control settings on M3-CLOS-ENCL/M3-CLOS-ENCL-B or M3-SPINE-ENCL/M3-SPINE-ENCL-B switches?.
How do I cable the inter-switch connections in a 512-node configuration with the Clos256 Switches?
Detailed instructions can be found in the FAQ entry How do I cable the inter-switch connections in a 512-node configuration with the Clos256 Switches?.
Can you provide examples of how to use the SNMP interface to the Clos256 Switches?
Yes. Detailed examples can be found in the FAQ entry Does the monitoring line card in the M3-CLOS-ENCL/M3-CLOS-ENCL-B and M3-SPINE-ENCL/M3-SPINE-ENCL-B switches support SNMP?.
One of the connected switch ports is not illuminated in green/yellow on the TFT color display.
Is the disconnected switch port (i.e., its virtual port is not illuminated on the TFT display) an sff port or a quad port?
If the disconnected switch port is an sff port follow these diagnostic procedures.
Make sure that the host software (GM-2 or MX) has been installed properly on the host connected to that switch port. When the host software module has been loaded, the green LED on the host interface should be illuminated, and the TFT box associated with this switch port should be green. Check the kernel log (/var/log/messages) on that host for any errors that may have been encountered during the GM/MX installation process.
If the host software has been installed properly and the switch port TFT box is still not illuminated in green, try disconnecting and reconnecting the fiber cable (at both ends). Does the switch port TFT box illuminate green?
If the TFT box does not illuminate green, try a different fiber cable to see if the failure could be due to a damaged fiber cable. Does the TFT box illuminate green if you use a different fiber cable?
If changing the cable does not result in the switch port TFT box illuminating as green, then try connecting the cable to another switch port on a switch line card. Does the green LED illuminate if the cable is connected to a different switch port? If yes, then it sounds like a hardware problem with the original sff port. Examine the switch counters data for that switch port for further information. Let's say that the disconnected sff port Y is located on the switch line card in slot X. Go to the web interface (Status -> Slot X -> sff port Y), and look for non-zero switch counters (e.g., signal lost or port down).
If you have a fiber loopback plug M3F-L (a photo is linked from this FAQ entry), then you can use this plug to rule out any problems at the switch port. After plugging the connector into the sff port, you should see the value of signal lost go to 0 on the TFT display and/or the web output for that port, and the sff port should be displayed as green on the TFT display. If this does not occur, then there is a hardware problem with this switch port.
If you have determined that the sff port is at fault, try to re-sync the xbar port. Go to the web interface and reboot slot X (Status -> Slot X -> shutdown -> check the box and click Apply).
You could also check if this disconnected port is listed as unsynchronized on the Sync Ports page in the web interface, and if so, click on Sync. This action will resynchronize all ports that are disconnected.
If the port is still down after five attempts at shutting down the switch line card or clicking Sync, please notify help@myri.com.
If the switch port TFT box is still not illuminated in green, then it sounds like the problem is with the host interface. If you run mx_info or gm_board_info on this host, does this host see the other hosts in the network? I.e., does the routing table information contain routes to all of the other hosts in the network? If not, refer to this MX FAQ entry or GM FAQ entry for details. Refer to this FAQ entry for details on running the mx_pingpong hardware loopback test or gm_allsize hardware loopback test.
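For the first step of the sff procedure (checking the kernel log on the host), a small filter can save scrolling through /var/log/messages. This is only a sketch: the exact wording of GM and MX driver messages is not given here, so the patterns below (a standalone "gm" or "mx" token together with "error", "fail", or "warn") are assumptions, and `scan_driver_log` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that mention gm or mx as a
# standalone token and also contain error, fail, or warn (any case).
# The log path defaults to /var/log/messages, as named in the text.
scan_driver_log() {
    grep -iE '(^|[^[:alnum:]])(gm|mx)([^[:alnum:]]|$)' "${1:-/var/log/messages}" |
        grep -iE 'error|fail|warn'
}

# Usage:
#   scan_driver_log                     # scan /var/log/messages
#   scan_driver_log /path/to/other.log  # scan a different file
```

An empty result does not prove that the installation succeeded; it only means that no line matched these guessed patterns.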
If the disconnected switch port is a quad port follow these diagnostic procedures.
Is the switch fully populated with hosts? If the switch is not fully populated, it is possible that the non-illuminated quad port is normal, since the quad port is not connected to any hosts. Examine the web interface output for this quad port to determine whether it is connected to an xbar through the backplane. The (xbar port) field will be a hyperlink to a specific xbar port if it is connected to an xbar (and hosts). The (xbar port) field will not be a hyperlink if it is not connected to an xbar (and hosts).
If the switch port at one end of the cable shows that it is connected (because its virtual port is illuminated on the TFT display), and the switch port on the other end of the cable does not show that it is connected, please contact help@myri.com, as this should never happen.
Sometimes just re-seating the cable will bring the link up. Try this on both ends of the connection.
Note that the proper procedures for disconnecting/reconnecting a quad port cable are detailed in the following FAQ entry.
The next step is to determine if the problem is the fiber cable itself, or the switch port at either end of the cable, which we will refer to as A and B.
We will start with switch A. Let's call the quad port to which the fiber cable is connected, PA1. Find a nearby quad port which is in the connected state (we'll call this one PA2). Swap the fiber cable in PA1 with the fiber cable in PA2.
If PA1 shows disconnected and PA2 shows connected, then we know there is a problem with PA1, and the fiber cable and the other end of the connection are fine. If PA1 shows connected and PA2 shows disconnected, then PA1 is fine and the problem is either with the fiber cable or the remote quad port. If both PA1 and PA2 show connected, the fiber cable probably just needed reseating. If neither shows connected, try reseating, and if that does not work, there may be a problem on both ends.
Swap PA1 and PA2 back to their original cabling connection.
If the problem was not shown to be with PA1, move on to switch B and repeat the process of port swapping with PB1 and PB2.
If the problem moves with the cable again, then replace the cable and you should be done.
If the problem stays with PB1, then we need to diagnose the port.
Swap PB1 and PB2 back to their original cabling connection.
If you have determined that one end of the cable or the other is at fault, try to re-sync the xbar port. For example, if the disconnected port is on a quad xbar card in slot X, go to the web interface and reboot slot X (Status -> Slot X -> shutdown -> check the box and click Apply).
You could also verify that this disconnected port is listed on the Sync Ports page in the web interface, and click on Sync. This action will resync all ports that are disconnected.
If the port is still down after five attempts at shutting down the switch line card or clicking Sync, please notify help@myri.com.
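The four outcomes of the cable swap between PA1 and PA2 described in the quad port procedure can be written down as a lookup. The sketch below only encodes that reasoning; `diagnose_swap` is a hypothetical helper, and the two arguments are the states you observe on the TFT display after the swap.

```shell
#!/bin/sh
# Hypothetical helper: interpret the result of swapping the cables in
# PA1 and PA2. Each argument is "connected" or "disconnected".
# Usage: diagnose_swap PA1_STATE PA2_STATE
diagnose_swap() {
    case "$1/$2" in
        disconnected/connected)
            echo "PA1 is at fault; the cable and the remote end are fine" ;;
        connected/disconnected)
            echo "PA1 is fine; suspect the cable or the remote quad port" ;;
        connected/connected)
            echo "the cable probably just needed reseating" ;;
        disconnected/disconnected)
            echo "reseat again; if still down, both ends may have a problem" ;;
        *)
            echo "usage: diagnose_swap connected|disconnected connected|disconnected" >&2
            return 2 ;;
    esac
}

# Usage:
#   diagnose_swap connected disconnected
```

The same table applies to switch B by reading PB1 and PB2 in place of PA1 and PA2.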
If none of the connected switch ports on the same virtual switch line card (same column) are illuminated in green (or yellow) on the TFT display, then it sounds like there is a switch firmware configuration issue. Are you sure that the M3-4SW32-16Q and M3-THRU-16Q are connected in the proper slots in the switch chassis? For further details, refer to the Specifications Index for Myrinet-2000 Switches for Large Clusters, as well as this FAQ entry.
If there is a configuration error, you should see a non-zero value for firmware faults listed in the web output (Status -> Slot X) for that switch line card.
The image on the TFT display is white/blank. How do I fix this?
There are two possible solutions for this problem.
Reseat the monitoring line card. This will power-cycle/reboot the monitoring card.
If reseating the monitoring line card does not resolve the problem, power-cycle the switch enclosure.
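How should the cables between a M3-SPINE-ENCL/M3-SPINE-ENCL-B enclosure and M3-CLOS-ENCL/M3-CLOS-ENCL-B enclosures be connected?
Refer to the FAQ entry How should the cables between a M3-SPINE-ENCL/M3-SPINE-ENCL-B enclosure and M3-CLOS-ENCL/M3-CLOS-ENCL-B enclosures be connected?.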
Why doesn't the mapper show the nodes across the M3-SPINE-ENCL/M3-SPINE-ENCL-B in my cluster?
Refer to the FAQ entry Why doesn't the mapper show the nodes across the M3-SPINE-ENCL/M3-SPINE-ENCL-B in my cluster?.
How do I cable a 768-node cluster?
Refer to the FAQ entry How do I cable a 768-node cluster?.
What are the guidelines for inter-switch cabling of a 1024-node cluster?
Refer to the FAQ entry What are the guidelines for inter-switch cabling of a 1024-node cluster?.
Last updated: 27 August 2007