
Fabric Management System for Myri-10G and Myrinet-2000 Myrinet Networks
The Fabric Management System (FMS) is a collection of tools and processes used to manage a Myri-10G or Myrinet-2000 Myrinet network. This system relies on a database formed by a collection of files which describe how the network is connected. Since there is a description of how the network is supposed to look, it is possible to report discrepancies between the observed network state and the desired network state. For example, without this description, a failed switch-to-switch link could be routed around, but could not easily be reported as missing, since there would be no way of knowing the link is supposed to be there.
The Fabric Management System is an important diagnostic tool for verifying the health of the Myrinet hardware. This system also supersedes Mute.
The two primary processes in the Fabric Management System are the fm_server (Fabric Management Server) and the fma (Fabric Management Agent).
fm_server runs on one machine and serves as a focal point for all management activity on the fabric. All errors are reported to fm_server, and are available for viewing through a variety of means.
There is one fma process on each Myrinet node. This process replaces the previous mapper (mx_mapper or gm_mapper) and expands its functionality. Any errors noticed by the fma process are reported to the fms and are then made available to system operators.
There are two ways to use FMS:
FMS relies on a fabric description database, consisting of a collection of text files as described below.
FMS Database
The FMS database consists of several files in a directory, all of which are easily human-readable and editable. These files are read and written by the fm_server process. The current file list is:
These files are located in the directory $FMS_RUN/database. The fm_server process must have write access to this directory, whereas the fma agents only require read access. By default, the environment variable FMS_RUN is set to /var/run/fms/, but it can be overridden by the customer.
When error conditions are detected in the fabric, alerts are generated by the fm_server process. These can be queried on a regular basis, or the fm_server can be configured to proactively report alerts through a user-defined mechanism.
Error Reporting by FMS
Errors detected by or reported to the FMS process are reported to system administrators through "alerts", which are discussed further in Appendix B below. Alerts can be queried remotely via the command fm_show_alerts, via web CGI scripts, via log file monitoring, or the FMS can be configured to run a user-specified command whenever an alert occurs.
The FMS can detect exceptional conditions, either errors or warnings, by monitoring the switch enclosures and through communication with the fma processes running on each node.
Errors Detected Directly By FMS
The FMS process periodically polls the switch enclosures to monitor link status and environmental conditions. The most common problems reported by the switches are down links, noisy links, and overly high operating temperatures.
The FMS monitors the connection status of all fma processes in the system. The absence of an fma connection from a host which is expected to be present generates an alert, as does the loss of connectivity to any fma.
Switch enclosures report the up/down status of each link. The FMS compares this with expected link status and generates an alert if appropriate. If a link transitions from up to down too many times in a given time period, the link is marked as "flaky" and an alert is generated.
The badcrc counters on each link are monitored, and too many badcrcs within a given time period generates an alert.
If the temperature reported by a linecard exceeds a set threshold, an alert is generated. This threshold defaults to a value that is below the shutdown temperature of the linecard, but is higher than should be seen in practice.
The thresholds for all of these alerts can be controlled via the "fm_settings" command.
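The flaky-link and badcrc checks described above are both "too many events within a time period" tests. As an illustration only (this is not FMS code; the function name, the one-timestamp-per-line input, and the numbers are assumptions made for the example), such a test can be sketched in shell:

```shell
# Illustrative only: decide whether the number of events inside a time
# window exceeds a threshold. Input: one event timestamp (in seconds) per
# line on stdin. Arguments: now, window length in seconds, threshold.
# Prints "alert" when more than <threshold> events fall inside the window.
too_many_events() {
    awk -v now="$1" -v win="$2" -v max="$3" '
        $1 > now - win && $1 <= now { n++ }
        END { if (n + 0 > max + 0) print "alert"; else print "ok" }
    '
}

# Example: four link-down events, three of them within the last 60 seconds,
# checked against a threshold of 2.
printf '%s\n' 100 950 970 990 | too_many_events 1000 60 2
```

With the made-up numbers above, three events fall inside the window, so the sketch prints "alert". The real thresholds and sample periods are the ones managed through fm_settings.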
Errors Detected By FMA
The fma continuously monitors the NICs in a host for several conditions. A CRC error rate which exceeds a set threshold will generate an alert, as will SRAM parity errors in the NIC.
The fabric is continually verified by the fma processes, and any change in fabric topology is reported to the FMS. Depending on the source of the change, this may result in an alert being generated.
The Fabric Management System (FMS) for Myrinet networks may be used with either the MX or GM firmware.
In order to install FMS, the following requirements must be met:
NOTE: FMS is not yet supported on 10G-SW32HSSM-16QP switches.
The FMS distribution is available in the MX distribution (MX-1.1 and later). If you would like to use FMS with GM-2 or GM-1, FMS is also available as a separate FMS tarball. We will soon integrate FMS into the GM-2 distribution.
As of MX-1.1 and later, the FMS is integrated into the MX install package. To build and use FMS with MX, you need to do the following:
Select a host to be the fm_server. The host needs IP connectivity to all of the Myrinet compute nodes in the cluster and to the switch(es); it does not need to be a Myrinet node. This host also needs access to the MX installation directory, <install_path>. Do not choose one of the Myrinet compute nodes to be the fm_server. Ideally, the fm_server process will run on the head node of the cluster.
Install and load MX (specifying the --with-fms-server=<fm_server> option at configure time) on the compute nodes, as instructed on the MX-2G Download page or MX-10G Download page.
On the host chosen to be the fm_server, create a writeable directory for maintaining the FMS log and fabric database. The default directory location is /var/run/fms/.
$ mkdir -p /var/run/fms/
Define the Myricom switch enclosures with fm_switch.
$ <install_path>/bin/fm_switch -a <switch_name>
Repeat for each enclosure until all switches have been added.
Start the program fm_server with the flag -d to force the process to run in the background as a daemon.
$ <install_path>/bin/fm_server -d
The fm_server process will print the name of its log file for confirmation.
Note: If the host chosen to be the fm_server does not have read access to the MX installation directory, <install_path>, you will also need to configure and compile (but not load) MX on the FMS server in order to create the fms executable.
The default directory for the FMS database is /var/run/fms/database/.
The default location for the FMS log is /var/run/fms/fms.log.
The default location for the FMA log (on each of the compute nodes) is in /var/run/fms/fma.log.
The default location for the FMS tools is <install_path>/bin/.
$ tar zxf fms.tar.gz
$ cd fms
$ autoconf
$ ./configure
$ make
$ make install
Note: A patch for FMS is required for interoperability with GM-1. This patch has only been tested with GM-1.6.5. Once the patch is applied, the code will only work on GM-1 (not GM-2 or MX). You should delete your old FMS database before starting up the new fms and fmas.
By default, FMS assumes that MX is the low-level firmware and that MX is installed in the directory /opt/mx. (If the GM firmware is used, the default installation directory for GM is /opt/gm.) The FMS package is installed in the default directory, /opt/fms.
If you would like to specify a different FMS installation directory, you need to pass the --prefix option to configure. E.g.,
$ ./configure --prefix=<other_dir>
where <other_dir> specifies the alternate directory for installation.
To use GM instead of MX, or to specify an alternate install directory for GM or MX, pass the following option(s) to configure:
$ ./configure --with-myri-api=gm --with-myri-install-dir=<myri_install_dir>
where <myri_install_dir> specifies the binary/library installation directory for MX or GM.
If FMS is installed in the default location, add /opt/fms/bin to your $PATH. E.g.,
$ export PATH=/opt/fms/bin:$PATH
Otherwise, add <other_dir>/bin to your PATH.
Note: If you do not use the default directory (/opt/fms/) for the FMS installation, you will need to set and export the environment variable FMS_INSTALL. This will allow all of the FMS tools to find the directory automatically.
Create a writeable directory for maintaining the FMS fabric database. The default directory location is /var/run/fms.
$ mkdir -p /var/run/fms
Note: If you do not use the default directory (/var/run/fms) for the FMS fabric database, you will need to set and export the environment variable FMS_RUN, specifying the location for this FMS fabric description directory.
Define the Myricom switch enclosures by using the fm_switch command:
$ fm_switch -a <switch_name>
where <switch_name> is the DNS name or IP address for the monitoring line card within each switch enclosure in the Myrinet fabric. Repeat for each enclosure until all are added. To view a list of the enclosures currently defined, type:
$ fm_switch
If you need to remove a switch from the list, use the -d option:
$ fm_switch -d <switch_name>
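For fabrics with many enclosures, the repeated "fm_switch -a" step can be driven from a list. The following is a minimal sketch, assuming a plain-text list with one enclosure name per line; the FM_SWITCH_CMD variable is a hypothetical override introduced here so the loop can be dry-run, and is not part of FMS.

```shell
# Sketch: add every enclosure named on stdin (one name per line; blank
# lines and lines starting with '#' are skipped). FM_SWITCH_CMD is a
# hypothetical override used so the loop can be dry-run with "echo".
add_switches() {
    while IFS= read -r name; do
        case "$name" in
            ''|'#'*) continue ;;
        esac
        ${FM_SWITCH_CMD:-fm_switch} -a "$name" || return 1
    done
}

# Dry run: print the commands that would be executed instead of running them.
printf '%s\n' '# leaf switches' clos0 clos1 '' spine0 |
    FM_SWITCH_CMD="echo fm_switch" add_switches
```

Without the override, "add_switches < switches.txt" would call fm_switch once per listed enclosure (switches.txt is a made-up file name). Running fm_switch with no arguments afterwards shows the resulting list.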
Start the fm_server process with the flag -d on the machine to which you installed FMS. This -d flag causes it to run in the background as a daemon. E.g.,
$ fm_server -d
The fm_server should have read access to the installation directory FMS_INSTALL and the directory to which MX or GM is installed, and also needs write access to the run directory FMS_RUN.
Note: The machine to which the fm_server process is installed does not need to contain a Myrinet NIC, but it must have IP access to the Myrinet switches and all compute nodes in the Myrinet fabric. Since FMS has a socket connection to each fma and also to each switch enclosure, make sure the system running fms will allow enough open file descriptors for every node in your fabric, plus every enclosure, plus another fifty or more to be safe. (Sometimes a disconnected fma may reconnect before the OS realizes the previously used socket is now closed.)
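The descriptor sizing rule in the note above can be checked before starting fm_server. This is a minimal sketch: the function names are made up for the example, the margin of 50 is the minimum the note suggests, and the node and enclosure counts are placeholders.

```shell
# Sketch of the sizing rule above: one descriptor per node, one per
# enclosure, plus a safety margin (50 by default, as the note suggests).
fms_fds_needed() {
    nodes=$1 enclosures=$2 margin=${3:-50}
    echo $(( nodes + enclosures + margin ))
}

# Compare the requirement against the current soft limit of this shell.
check_fd_limit() {
    need=$(fms_fds_needed "$1" "$2")
    limit=$(ulimit -n)
    if [ "$limit" = "unlimited" ] || [ "$limit" -ge "$need" ]; then
        echo "ok: limit $limit, need $need"
    else
        echo "too low: limit $limit, need $need"
    fi
}

# Example: 512 nodes and 9 enclosures (made-up numbers).
check_fd_limit 512 9
```

The check reflects the limit of the shell it runs in, so it is only meaningful when run as the user, and in the environment, that will start fm_server.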
Stop the existing MX or GM mapper process on all Myrinet nodes in the fabric.
$ <install_path>/sbin/mx_stop_mapper
or
$ killall gm_mapper
where <install_path> is the MX install directory.
Start the FMA agent process on each Myrinet node in the fabric.
$ fma -d -s <fms-server>
where <fms-server> is the hostname of the host running the FMS server process. Each compute node must have read access to the FMS_INSTALL directory and the directory where MX or GM is installed, but does not need write access to the FMS_RUN directory.
Note: You can restart the fm_server or any or all of the fma processes independently.
You should have mapping routes within about 10 seconds after starting the fms and fma processes.
Use the fm_status command to see the current status of the FMS.
$ fm_status
Note: If you are using Myrinet-2000 M3-CLOS-ENCL or M3-SPINE-ENCL switches, it should not take longer than 30 seconds to map the Myrinet fabric.
If you are using Myrinet-2000 M3-E* switches, it may take up to five minutes to map the Myrinet fabric.
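Because mapping time varies with the switch type, a script that depends on routes being present may want to wait for mapping to finish. The sketch below polls a status command until its output contains the "Mapping is complete" line that fm_status prints; the function name and the one-second polling interval are choices made for this example.

```shell
# Sketch: poll a status command until its output contains
# "Mapping is complete", or give up after <timeout> seconds.
# Usage: wait_for_mapping <timeout_seconds> <command> [args...]
wait_for_mapping() {
    timeout=$1; shift
    elapsed=0
    while :; do
        if "$@" 2>/dev/null | grep -q 'Mapping is complete'; then
            echo "mapped after ~${elapsed}s"
            return 0
        fi
        [ "$elapsed" -ge "$timeout" ] && break
        sleep 1
        elapsed=$(( elapsed + 1 ))
    done
    echo "still mapping after ${timeout}s" >&2
    return 1
}
```

For example, "wait_for_mapping 300 fm_status" would allow the five minutes mentioned above for M3-E* switches, and a smaller timeout would suit M3-CLOS-ENCL or M3-SPINE-ENCL fabrics.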
After the FMS system has been installed and the FMS database created, the operator (system administrator) should periodically check the "health" of the Myrinet fabric using the fm_show_alerts command. For example, there could be a screen on the operator's console that runs:
while true; do clear; fm_show_alerts; sleep 5; done
We also provide a web-based version of fm_show_alerts so that the health of the Myrinet fabric can be monitored remotely. If you need further information to explain a specific alert text message, refer to Appendix B: Alerts and the libfma/alerts.def file in the FMS distribution.
For a detailed discussion of the command-line arguments to all of the FMS tools, refer to Appendix A: Program Usage.
For a detailed listing of troubleshooting procedures to verify the health of a Myrinet installation, refer to the following FAQ entry.
In the following examples, MX has been loaded on all nodes in the Myrinet fabric, the fm_server server process and fma agent processes are running, and the FMS database has been created.
Example #1:
As a simple example to demonstrate an FMS alert and how to acknowledge and remove the alert, do the following:
The actual output from the FMS tools would look like:
$ ssh fog12 sudo killall fma
$ fm_status
FMS Fabric status
33 hosts known
31 FMAs found
1 un-ACKed alerts
Mapping is complete, last map generated by fog20
Database is complete
$ fm_show_alerts
34 Tue Oct 11 14:09:47 2005 Lost FMA contact from fog12
$ ssh fog12 sudo mx_start_stop start-mapper
fma: no process killed
$ fm_status
FMS Fabric status
33 hosts known
32 FMAs found
1 un-ACKed alerts
Mapping is complete, last map generated by fog20
Database is complete
$ fm_show_alerts
$ fm_show_alerts -r
34 Tue Oct 11 14:09:47 2005 [R] Lost FMA contact from fog12
$ fm_ack_alert -i 34
$ fm_status
FMS Fabric status
33 hosts known
32 FMAs found
0 un-ACKed alerts
Mapping is complete, last map generated by fog20
Database is complete
Example #2:
To simulate a node losing link connectivity to the Myrinet fabric, perform the following experiment:
Note: If this had been a real-world situation and a link had been detected as down, you would probably see other alerts such as badcrcs for this connection.
The following is a list of programs that work within the FMS environment.
All tools will search for these files by default in the directory /var/run/fms/database ($FMS_RUN/$FMS_DB_NAME). By default, the environment variable FMS_RUN is set to /var/run/fms, and FMS_DB_NAME is set to database. The location and name of the database directory can be overridden by the environment variables, FMS_RUN and FMS_DB_NAME, respectively, or command line arguments to the tools.
In order to easily support future Myricom hardware products, the description of all the hardware products is table driven. These tables are included as part of the fabric management installation, and their location can also be changed through environment variables or command line arguments. The default directory for the fabric management system is /opt/fms.
All FMS tools are located in $FMS_INSTALL/bin, and have the following in common:
The FMS image directory defaults to /opt/fms. The environment variable FMS_INSTALL overrides this default.
The Fabric Management System home directory defaults to /var/run/fms. The environment variable FMS_RUN overrides this default. The command line option -R overrides both of these settings.
The fabric description is stored in the $FMS_RUN/<database_name> directory. The <database_name> defaults to database. The environment variable FMS_DB_NAME overrides this default. The command line option -N overrides both of these settings.
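The lookup order described above (command-line option, then environment variable, then built-in default) can be mirrored in a wrapper script. This is a minimal sketch, not the tools' own code; the function name is made up, and its two optional arguments stand in for the values that would be given with -R and -N.

```shell
# Sketch of the lookup order described above: a command-line value (if
# given) wins over the environment variable, which wins over the default.
# Usage: fms_db_dir [run_dir_from_-R] [db_name_from_-N]
fms_db_dir() {
    run=${1:-${FMS_RUN:-/var/run/fms}}
    name=${2:-${FMS_DB_NAME:-database}}
    # Strip one trailing slash so the result has a single separator.
    echo "${run%/}/$name"
}

# Prints the database directory for the current environment.
fms_db_dir
```

With neither variable set and no arguments, the sketch prints /var/run/fms/database, which matches the default stated above.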
Server/Agent processes
fm_server
The fabric server, fm_server, is run on a server which need not be part of the Myrinet fabric, but must have IP connectivity to all nodes in the fabric and to the monitoring linecards in all of the switches. The fms process must have filesystem access to the database files.
fm_server
[ -d ] - run in background as daemon
[ -D ] - enable debug output
[ -R <fms_run> ] [ -N <fms_db_name> ] - run dir and DB name
[ -h ] - print usage message
[ -V ] - print FMS version
fma
The fma process runs as a persistent agent on each Myrinet node. The fma node must have IP connectivity to the node on which the fms is running, but need not have access to the database files.
The name of the fm_server is obtained from, in order of increasing precedence:
--with-fms= from configure command
FMS_SERVER= from "make install" command
The environment variable FMS_SERVER
The command line argument "-s"
fma
[ -d ] - run in background as daemon
[ -s <fm_server> ] - name of the node running the fms server
[ -x ] - fabric consists of ID-less xbars only
[ -i ] - fabric consists of xbars with IDs only
[ -D ] - enable debug output
[ -R <fms_run> ] - run dir
[ -h ] - print usage message
[ -V ] - print FMS version
Note: ID-less xbars (XBar16s) are used in Myrinet-2000 M3-E* switches, and xbars with IDs (XBar32s) are used in Myrinet-2000 M3-CLOS-ENCL and M3-SPINE-ENCL switches.
Database commands -- All of these commands are run on a node which has filesystem access to the database files.
fm_switch
fm_switch is used to view and manage the list of enclosures in the fabric. This command should not be used to modify the list of enclosures while fms is running - it is intended as a setup command to be used before the fms is started.
fm_switch
[ -a <switch_name> ] - add <switch_name> to the database of switches
[ -d <switch_name> ] - remove <switch_name> from the
database of switches
[ -R <fms_run> ] [ -N <fms_db_name> ] - run dir and DB name
[ -h ] - print usage message
[ -V ] - print FMS version
fm_db2wirelist
fm_db2wirelist reads the database of connections and prints a list of the contents of each switch's slots and where everything is connected. Running and reviewing this as a first step after creating the database is a good way to notice links that are out. This is run on a node which has filesystem access to the database files. Since this program does not modify the database, it can be run at any time.
fm_db2wirelist
[ -R <fms_run> ] [ -N <fms_db_name> ] - run dir and DB name
[ -h ] - print usage message
[ -V ] - print FMS version
fm_settings
fm_settings is used to view and change settings for the FMS. This should be run to change parameters while the FMS is not running, and the FMS will pick up the new values when it restarts. These parameters are saved in the database table fms_settings.
fm_settings
[ -p <parameter> <value> ] - set <parameter> to <value>
[ -l ] - list all parameters and their values
[ -R <fms_run> ] [ -N <fms_db_name> ] - run dir and DB name
[ -h ] - print usage message
[ -V ] - print FMS version
fm_walkroute
fm_walkroute is used to see the exact path a packet takes through the network given a starting host and a sequence of route bytes. This is run on a node which has filesystem access to the database files. Since this program does not modify the database, it can be run at any time.
fm_walkroute
[ -f <hostname> ] - the host from which to start
[ -n <nic_id> ] - which NIC to use
[ -p <port> ] - which port on the NIC to use
[ -g ] - give extra-gory details about internal links
[ -R <fms_run> ] [ -N <fms_db_name> ] - run dir and DB name
[ -h ] - print usage message
[ -V ] - print FMS version
Example usage:
As an example, let's consider that you would like to determine all of the links/paths traversed within the Myrinet fabric when sending a message from host1 to host2. First, run mx_info or gm_board_info on host1, and locate the line of text in the routing table output corresponding to host2. E.g.,
7 00:60:dd:49:7d:e1 host2 [0] bd bf 90 91
and then pass this route bd bf 90 91 to fm_walkroute.
$ fm_walkroute -f host1 -- bd bf 90 91
Walking route: -3 -1 16 17 from host host1 nic 0
host host1 nic 0, rail 0 - switch1, slot 15, port 9
switch1, slot 14, port 6 - switch2, slot 9, port 28
switch2, slot 6, port 22 - host host2, nic 0, rail 0
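The "Walking route" line shows each route byte as a signed number. Judging only from this example (bd becomes -3, bf becomes -1, 90 becomes 16, 91 becomes 17), the number appears to be the low six bits of the byte read as a two's-complement value. That reading is an inference from the sample output, not a statement of the Myrinet route encoding, and the function name below is made up for the sketch.

```shell
# Sketch: convert hex route bytes to signed values, assuming each value is
# the low 6 bits of the byte in two's complement (inferred from the
# example above, not taken from FMS documentation).
decode_route() {
    out=
    for byte in "$@"; do
        delta=$(( 0x$byte & 0x3f ))
        [ "$delta" -ge 32 ] && delta=$(( delta - 64 ))
        out="${out:+$out }$delta"
    done
    echo "$out"
}

decode_route bd bf 90 91    # prints: -3 -1 16 17
```

This can help when comparing a routing table entry from mx_info or gm_board_info against the first line of fm_walkroute output.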
fm_fixup_db
fm_fixup_db may need to be run after a software upgrade which changes the database format. The database is read and re-written if any changes are needed. If no changes are needed, the database is left untouched.
fm_fixup_db
[ -R <fms_run> ] [ -N <fms_db_name> ] - run dir and DB name
[ -h ] - print usage message
[ -V ] - print FMS version
FMS Client Commands -- These programs make IP queries to the fms server and need only be run on nodes which have IP access to the fms. The fms must be running for these commands to work.
fm_status
fm_status prints a summary of fms status.
fm_status
[ -s <fm_server> ] - address of node with fms process
[ -h ] - print usage message
[ -V ] - print FMS version
Example:
$ fm_status
FMS Fabric status
32 hosts known
31 FMAs found
1 un-ACKed alerts
Mapping is complete, last map generated by fog20
Database is complete
where "hosts known" is the count of all hosts in the database, "FMAs found" is the number of FMAs currently in contact with fms; "un-ACKed alerts" is a count of alerts not yet ACKed (see fm_show_alerts); "Mapping is complete" or "Mapping is in progress" tells whether mapping activity is occurring at this moment; and, "Database is/is not complete" tells whether the resolution of xbars found by mapping into specific linecards is complete yet or not.
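For scripted monitoring, the un-ACKed alert count can be extracted from this output. The sketch below relies only on the line layout shown in the example; the function name is made up.

```shell
# Sketch: pull the un-ACKed alert count out of fm_status output on stdin.
# Relies on the line layout shown in the example above. Exits non-zero
# when no such line is present.
unacked_alerts() {
    awk '/un-ACKed alerts/ { print $1; found = 1 } END { exit !found }'
}

# Sample taken from the example output above.
printf '%s\n' 'FMS Fabric status' '32 hosts known' '31 FMAs found' \
    '1 un-ACKed alerts' \
    'Mapping is complete, last map generated by fog20' \
    'Database is complete' | unacked_alerts
```

In practice the input would come from the tool itself, as in "fm_status | unacked_alerts", and a non-zero count could trigger a closer look with fm_show_alerts.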
fm_show_alerts
fm_show_alerts prints a list of active alerts. By default, this prints only alerts which have not been ACKed and are not relics. (See Appendix B: Alerts below.)
Each alert has a unique index which can be passed to fm_ack_alert to acknowledge the alert.
fm_show_alerts
[ -a ] - show ACKed alerts also (marked with [A])
[ -r ] - show relic alerts also (marked with [R])
[ -s <fm_server> ] - address of node with fms process
[ -h ] - print usage message
[ -V ] - print FMS version
For information on how to ack all alerts at once, refer to this FAQ entry for details.
Example output:
fma:fma-1 (vm) $A/tools/fm_show_alerts
4 Tue Sep 27 22:43:16 2005 17 badcrcs in 30 seconds on link between agonyswitch, slot 4, port 2 and purpleswitch, slot 5, port 7
2 Tue Sep 27 22:42:47 2005 Enclosure agonyswitch, slot 2 has experienced an overtemp shutdown
1 Tue Sep 27 22:41:46 2005 Enclosure 172.31.2.3, slot 1 is running hot
$ fm_show_alerts -s localhost
1 Tue Oct 11 01:40:37 2005 agony1, NIC 0 (serial_no=246987) got an SRAM Parity Error
fm_ack_alert
fm_ack_alert acknowledges an alert. This marks an alert as ACKed, possibly causing its deletion. (See Appendix B: Alerts below.)
fm_ack_alert
[ -i <id> ] - ACK alert with ID <id>
[ -s <fm_server> ] - address of node with fms process
[ -h ] - print usage message
[ -V ] - print FMS version
For information on how to ack all alerts at once, refer to this FAQ entry for details.
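One possible way to ACK several alerts in a single step is to feed the indices printed by fm_show_alerts to fm_ack_alert. The sketch below is not necessarily the method the FAQ describes; it assumes every alert line begins with its numeric index, as in the examples, and FM_ACK_CMD is a hypothetical override introduced here to allow a dry run.

```shell
# Sketch: read fm_show_alerts output on stdin and print the alert indices,
# assuming each alert line begins with its numeric index.
alert_ids() {
    awk '$1 ~ /^[0-9]+$/ { print $1 }'
}

# ACK every index read from stdin. FM_ACK_CMD is a hypothetical override
# that allows a dry run; by default fm_ack_alert itself is invoked.
ack_ids() {
    while IFS= read -r id; do
        ${FM_ACK_CMD:-fm_ack_alert} -i "$id" || return 1
    done
}

# Dry run on sample output: prints the commands instead of running them.
printf '%s\n' \
    '4 Tue Sep 27 22:43:16 2005 17 badcrcs in 30 seconds on link' \
    '2 Tue Sep 27 22:42:47 2005 Enclosure agonyswitch, slot 2 has experienced an overtemp shutdown' |
    alert_ids | FM_ACK_CMD="echo fm_ack_alert" ack_ids
```

Against a live server this would be run as "fm_show_alerts | alert_ids | ack_ids". If a long alert wraps onto a continuation line that happens to start with a number, that line would be misread as an index, so review the list before ACKing in bulk.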
fm_maint
fm_maint places a line card into or out of maintenance mode. While in maintenance mode, the xbars on a line card will be treated as though they do not exist.
fm_maint
-e <enclosure> - switch enclosure name
-l <slot_no> - slot number of linecard to maintain
[ -p <port_no> ] - port number to maintain
-m { up | down } - set the state to up or down
[ -s <fm_server> ] - address of node with fms process
[ -h ] - print usage message
[ -V ] - print FMS version
Alerts are created when certain exceptional events occur and are reported to the fms. Alerts persist within the fms until they are cleared. Clearing usually requires the alert to be acknowledged (ACKed) and for the condition which caused the alert to have cleared.
Once the alert has been acknowledged, it is marked as "ACKed". Once the condition that caused the alert has cleared, we mark it as a "relic". Most alerts are deleted only after they have been both relic-ed and ACKed.
The following is a list of all alerts and their meanings. The Flags line for each alert type may contain NEED_ACK or ACK_ONLY or both. If NEED_ACK is present, once the alert becomes a relic, it still needs an ACK before it is deleted entirely. If NEED_ACK is not present, the alert is deleted as soon as it becomes a relic. If ACK_ONLY is specified, the event is deleted as soon as it is ACKed. Without this flag, the alert will persist until becoming a relic, even after it has been ACKed.
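The deletion rules above can be restated as a small decision function. This is a sketch of the rules as worded in this appendix, not code from the fms; the function name and argument convention are made up for the example.

```shell
# Sketch of the deletion rules above. Arguments:
#   $1  flags string, e.g. "NEED_ACK ACK_ONLY" (may be empty)
#   $2  1 if the alert has been ACKed, else 0
#   $3  1 if the alert has become a relic, else 0
# Prints "deleted" or "kept".
alert_fate() {
    flags=" $1 " acked=$2 relic=$3
    need_ack=0; ack_only=0
    case "$flags" in *" NEED_ACK "*) need_ack=1 ;; esac
    case "$flags" in *" ACK_ONLY "*) ack_only=1 ;; esac

    # ACK_ONLY: an ACK alone deletes the alert.
    if [ "$ack_only" = 1 ] && [ "$acked" = 1 ]; then
        echo deleted; return
    fi
    # Otherwise the alert must be a relic, and also ACKed if NEED_ACK is set.
    if [ "$relic" = 1 ] && { [ "$need_ack" = 0 ] || [ "$acked" = 1 ]; }; then
        echo deleted; return
    fi
    echo kept
}

alert_fate "NEED_ACK" 0 1           # HOST_LOST_FMA after the FMA reconnects: kept
alert_fate "NEED_ACK" 1 1           # the same alert once ACKed: deleted
alert_fate "NEED_ACK ACK_ONLY" 1 0  # HOST_SRAM_PARITY_ERROR once ACKed: deleted
alert_fate "" 0 1                   # HOST_NO_INITIAL_FMA once the FMA attaches: deleted
```

The first two calls correspond to the HOST_LOST_FMA walk-through in Example #1, where the alert remains listed as a relic after the fma is restarted and disappears only after fm_ack_alert.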
Note: This list can also be found in the file libfma/alert.def in the FMS distribution.
HOST_NO_INITIAL_FMA
Description: No FMA connection has been established since FMS was started
Initiated by: timeout waiting for FMA connection
Cancelled_by: attachment to FMA
Flags:
struct {
lf_string_t hostname;
}
Format: "Have never gotten FMA contact from %s"
Args: hostname
HOST_LOST_FMA
Description: Connectivity was lost to the FMA on a host
Initiated by: A connection to a running FMA was lost
Cancelled_by: re-attachment to FMA
Flags: NEED_ACK
struct {
lf_string_t hostname;
}
Format: "Lost FMA contact from %s"
Args: hostname
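Because each alert is printed using its Format string, alert text can be matched mechanically. As a minimal sketch, the function below (a made-up name) lists the hosts named in HOST_LOST_FMA alerts, reading fm_show_alerts output on stdin. The first sample line is from Example #1; the other two are made up for illustration.

```shell
# Sketch: list hosts named in "Lost FMA contact from <host>" alerts, reading
# fm_show_alerts output on stdin. Matches the HOST_LOST_FMA format string.
lost_fma_hosts() {
    sed -n 's/.*Lost FMA contact from \([^ ]*\).*/\1/p' | sort -u
}

# Sample input: one real line from Example #1 plus two made-up lines.
printf '%s\n' \
    '34 Tue Oct 11 14:09:47 2005 Lost FMA contact from fog12' \
    '35 Tue Oct 11 14:10:02 2005 Lost FMA contact from fog07' \
    '36 Tue Oct 11 14:10:05 2005 Enclosure clos0, slot 1 is running hot' |
    lost_fma_hosts
```

The resulting host list could then be used to restart the fma on just those nodes.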
HOST_LINK_DOWN
Description: A myrinet link between a host and switch is disconnected
Initiated by: inability to pass traffic through a link
Cancelled_by: resumption of traffic through the link or removal of the link
Flags: NEED_ACK
struct {
lf_string_t hostname;
uint16_t nic;
uint16_t nic_interface;
lf_string_t enclosure;
uint16_t slot;
uint16_t port;
uint16_t subport;
}
Format: "Link from %s, nic %d:p%d to %s, slot %d, port %d:%d is down"
Args: hostname nic nic_interface enclosure slot port subport
HOST_PORT_DOWN
Description: A myrinet port on a host is down, other end unknown
Initiated by: inability to pass traffic through a link
Cancelled_by: resumption of traffic through the link
Flags: NEED_ACK
struct {
lf_string_t hostname;
uint16_t nic;
uint16_t nic_port;
}
Format: "Link from %s, nic %d, p%d is disconnected"
Args: hostname nic nic_port
HOST_SRAM_PARITY_ERROR
Description: A NIC on a host has experienced an SRAM parity error
Initiated by: SRAM parity error reported by NIC
Cancelled_by: ACK
Flags: ACK_ONLY NEED_ACK
struct {
lf_string_t hostname;
uint32_t nic_id;
lf_string_t serial_no;
}
Format: "%s, NIC %d (serial_no=%s) got an SRAM Parity Error"
Args: hostname nic_id serial_no
HOST_FIRMWARE_DIED
Description: A NIC on a host has stopped responding
Initiated by: NIC error reported by Myri interface
Cancelled_by: ACK
Flags: ACK_ONLY NEED_ACK
struct {
lf_string_t hostname;
uint32_t nic_id;
lf_string_t serial_no;
}
Format: "%s, NIC %d (serial_no=%s), is not responding"
Args: hostname nic_id serial_no
HOST_SWITCH_LINK_BADCRC_COUNT
Description: A NIC port has a badcrc count that is too high.
Initiated by: A NIC port accumulates too many badcrcs over the
sample period.
Cancelled_by: ACKed by user
Flags: NEED_ACK ACK_ONLY
struct {
lf_string_t hostname;
uint16_t nic;
uint16_t nic_interface;
lf_string_t enclosure;
uint16_t slot;
uint16_t port;
uint16_t subport;
uint32_t badcrc_count;
uint32_t seconds;
}
Format: "Link from %s, nic %d:p%d to %s, slot %d, port %d:%d: %d Bad CRC packets in %d seconds"
Args: hostname nic nic_interface enclosure slot port subport badcrc_count seconds
HOST_UNRECOGNIZED_NIC_TYPE
Description: A NIC on a host has an unrecognized product ID.
Initiated by: Inspection of NIC product ID reported by fma
Cancelled_by: ACK
Flags: ACK_ONLY NEED_ACK
struct {
lf_string_t hostname;
lf_string_t product_id;
uint32_t nic_id;
}
Format: "%s, NIC %d, unrecognized product ID \"%s\""
Args: hostname nic_id product_id
SWITCH_XBARPORT_DISABLED
Description: An xbar port on an enclosure has been manually disabled
Initiated by: xbar port is seen to be disabled
Cancelled_by: xbar port is no longer disabled
Flags:
struct {
lf_string_t enclosure;
uint32_t slot;
uint32_t xbar;
uint32_t port;
}
Format: "Enclosure %s, slot %d, xbar %d, port %d disabled"
Args: enclosure slot xbar port
SWITCH_EXT_LINK_DOWN
Description: An external myrinet link between two switches is disconnected
Initiated_by: inability to pass traffic through a link
Cancelled_by: resumption of traffic through the link or removal of the link
Flags: NEED_ACK
struct {
lf_string_t enclosure1;
uint16_t slot1;
uint16_t port1;
uint16_t subport1;
lf_string_t enclosure2;
uint16_t slot2;
uint16_t port2;
uint16_t subport2;
}
Format: "Link from %s, slot %d, port %d:%d to %s, slot %d, port %d:%d is down"
Args: enclosure1 slot1 port1 subport1 enclosure2 slot2 port2 subport2
SWITCH_INT_LINK_DOWN
Description: An internal myrinet link between two xbars is disconnected
Initiated_by: inability to pass traffic through a link
Cancelled_by: resumption of traffic through the link or removal of the link
Flags: NEED_ACK
struct {
lf_string_t enclosure;
uint16_t slot1;
uint16_t xbar1;
uint16_t port1;
uint16_t slot2;
uint16_t xbar2;
uint16_t port2;
}
Format: "Internal link from %s, slot %d, xbar %d, port %d to slot %d, xbar %d, port %d is down"
Args: enclosure slot1 xbar1 port1 slot2 xbar2 port2
SWITCH_XBARPORT_DOWN
Description: An xbar port on an enclosure is down
Initiated_by: xbar port is seen to be down
Cancelled_by: xbar port is no longer down
Flags: NEED_ACK
struct {
lf_string_t enclosure;
uint32_t slot;
uint32_t xbar;
uint32_t port;
}
Format: "Enclosure %s, slot %d, xbar %d, port %d is down"
Args: enclosure slot xbar port
SWITCH_XBARPORT_UPDOWN_COUNT
Description: An xbar port has toggled state too frequently
Initiated_by: xbar port changes to "down" too many times w/in sample period
Cancelled_by: ACKed by user
Flags: NEED_ACK
struct {
lf_string_t enclosure;
uint32_t slot;
uint32_t xbar;
uint32_t port;
uint32_t updown_count;
uint32_t seconds;
}
Format: "Enclosure %s, slot %d, xbar %d, port %d: %d state changes in %d seconds, port disabled"
Args: enclosure slot xbar port updown_count seconds
SWITCH_XBARPORT_BADCRC_COUNT
Description: An xbar port has a badcrc count that is too high.
Initiated_by: An xbar accumulates too many badcrcs over the sample period.
Cancelled_by: ACKed by user
Flags: NEED_ACK ACK_ONLY
struct {
lf_string_t enclosure;
uint32_t slot;
uint32_t xbar;
uint32_t port;
uint32_t badcrc_count;
uint32_t seconds;
}
Format: "Enclosure %s, slot %d, xbar %d, port %d: %d Bad CRC packets in %d seconds."
Args: enclosure slot xbar port badcrc_count seconds
SWITCH_EXT_LINK_BADCRC_COUNT
Description: A link has a badcrc count that is too high.
Initiated_by: An xbar accumulates too many badcrcs over the sample period.
Cancelled_by: ACKed by user
Flags: NEED_ACK ACK_ONLY
struct {
lf_string_t enclosure1;
uint16_t slot1;
uint16_t port1;
uint16_t subport1;
lf_string_t enclosure2;
uint16_t slot2;
uint16_t port2;
uint16_t subport2;
uint32_t badcrc_count;
uint32_t seconds;
lf_string_t extra_text;
}
Format: "Link from %s, slot %d, port %d:%d to %s, slot %d, port %d:%d: %d Bad CRC packets in %d seconds%s"
Args: enclosure1 slot1 port1 subport1 enclosure2 slot2 port2 subport2 badcrc_count seconds extra_text
SWITCH_INT_LINK_BADCRC_COUNT
Description: An internal link has a badcrc count that is too high.
Initiated_by: An xbar port accumulates too many badcrcs over the sample period.
Cancelled_by: ACKed by user
Flags: NEED_ACK ACK_ONLY
struct {
lf_string_t enclosure;
uint16_t slot1;
uint16_t xbar1;
uint16_t port1;
uint16_t slot2;
uint16_t xbar2;
uint16_t port2;
uint32_t badcrc_count;
uint32_t seconds;
lf_string_t extra_text;
}
Format: "Internal link from %s, slot %d, xbar %d, port %d to slot %d, xbar %d, port %d: %d Bad CRC packets in %d seconds%s"
Args: enclosure slot1 xbar1 port1 slot2 xbar2 port2 badcrc_count seconds extra_text
SWITCH_HOST_LINK_BADCRC_COUNT
Description: A host link has a badcrc count that is too high.
Initiated_by: An xbar port accumulates too many badcrcs over the sample period.
Cancelled_by: ACKed by user
Flags: NEED_ACK ACK_ONLY
struct {
lf_string_t hostname;
uint16_t nic;
uint16_t nic_interface;
lf_string_t enclosure;
uint16_t slot;
uint16_t port;
uint16_t subport;
uint32_t badcrc_count;
uint32_t seconds;
}
Format: "Link from %s, nic %d:p%d to %s, slot %d, port %d:%d: %d Bad CRC packets in %d seconds"
Args: hostname nic nic_interface enclosure slot port subport badcrc_count seconds
SWITCH_XCVR_DISABLED
Description: A transceiver port on an enclosure has been manually disabled
Initiated_by: transceiver port is seen to be disabled
Cancelled_by: transceiver port is no longer disabled
Flags:
struct {
lf_string_t enclosure;
uint32_t slot;
uint32_t port;
}
Format: "Enclosure %s, slot %d, port %d disabled"
Args: enclosure slot port
SWITCH_XCVR_SIGNAL_LOST
Description: A transceiver port on an enclosure has lost signal
Initiated_by: transceiver signal_lost noted
Cancelled_by: transceiver signal_lost condition clears
Flags: NEED_ACK
struct {
lf_string_t enclosure;
uint32_t slot;
uint32_t port;
}
Format: "Enclosure %s, slot %d, transceiver port %d lost signal"
Args: enclosure slot port
SWITCH_LINECARD_HOT
Description: A linecard is too hot
Initiated_by: observed temp is over threshold
Cancelled_by: all temps are less than threshold - hysteresis value
Flags: NEED_ACK
struct {
lf_string_t enclosure;
uint32_t slot;
}
Format: "Enclosure %s, slot %d is running hot"
Args: enclosure slot
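The "threshold - hysteresis" cancellation rule keeps the alert from toggling when a temperature hovers near the threshold. The following sketch illustrates that behavior for a single reading; it is not FMS code, the function name and the numbers are made up, and a real linecard may report several temperatures, all of which would need to fall below the lower bound.

```shell
# Sketch of threshold-with-hysteresis logic (illustrative, not FMS code).
# Arguments: previous state ("hot" or "ok"), current temperature,
# threshold, hysteresis. Integer degrees are assumed.
linecard_state() {
    prev=$1 temp=$2 threshold=$3 hyst=$4
    if [ "$temp" -gt "$threshold" ]; then
        echo hot                      # over threshold: raise (or keep) alert
    elif [ "$prev" = hot ] && [ "$temp" -ge $(( threshold - hyst )) ]; then
        echo hot                      # cooled, but not by the full margin yet
    else
        echo ok
    fi
}

# Made-up readings with threshold 60 and hysteresis 5.
state=ok
for t in 58 61 57 56 54; do
    state=$(linecard_state "$state" "$t" 60 5)
    echo "$t $state"
done
```

In this run the state becomes hot at 61 and returns to ok only at 54, the first reading below 60 - 5.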
SWITCH_LINECARD_OVERTEMP
Description: A linecard is so hot it shut down
Initiated_by: overtemp count increased
Cancelled_by: ACK only
Flags: NEED_ACK ACK_ONLY
struct {
lf_string_t enclosure;
uint32_t slot;
}
Format: "Enclosure %s, slot %d has experienced an overtemp shutdown"
Args: enclosure slot
SWITCH_CANNOT_READ
Description: Cannot read data from the monitoring line card
Initiated_by: inability to contact switch
Cancelled_by: contact restored to switch
Flags: NEED_ACK
struct {
lf_string_t enclosure;
}
Format: "Cannot contact monitoring linecard on %s"
Args: enclosure
SWITCH_MAINTENANCE_MODE
Description: A line card is in maintenance mode
Initiated_by: User brings down a line card for maintenance
Cancelled_by: User takes line card out of maintenance mode
Flags: NEED_ACK
struct {
lf_string_t enclosure;
uint32_t slot;
}
Format: "Slot %d on enclosure %s is down for maintenance."
Args: slot enclosure
Every database file starts with 2 rows of column headers. The first row defines the data type of each column, and the second row defines the name of each column.
string,string
name,product_id
clos0,M3-E128
clos1,M3-E128
spine0,M3-E128

string,integer,string,string
enclosure_name,enclosure_slot,product_id,serial_no
clos0,1,M3-SW16-8F,4936
clos0,9,M3-SW16-8F,26848
clos1,1,M3-SW16-8F,4937
spine0,1,M3-SPINE-8F,22781

string,string
hostname,sw_version
host0000,GM
host0001,GM
host0002,GM
host0003,GM

string,integer,MAC,integer,integer,string,string
hostname,host_nic_id,mac_addr,ports,subports,serial_no,product_id
host0000,0,00:60:dd:49:97:01,1,1,26848,M3F-PCIXD-2
host0001,0,00:60:dd:49:97:02,1,1,31875,M3F-PCIXD-2
host0002,0,00:60:dd:49:97:03,1,1,6878,M3F-PCIXD-2

string,integer,string,integer,integer,integer,string,integer,integer,integer
The following parameters may be set using fm_settings to control the behavior of the Fabric Management System.
Note: This list of parameters and instructions for modifying them can be found in the file libfma/lf_fms_settings_def.h.
low_freq_monitor_interval 120 seconds [default]
This specifies the "low frequency" interval for summing certain counts and comparing them to thresholds. For example, if too many badcrc counts are seen on a switch during this period, an alert will be raised.
lf_badcrc_threshold 5 badcrcs [default]
Maximum number of badcrcs allowed on a link during the low-frequency interval before an alert is generated.
lf_fatal_badcrc_threshold   100 badcrcs [default]
If more than this many badcrcs are seen on a link during the low-frequency interval, an alert is raised and the link may be disabled to traffic.
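The two badcrc thresholds can be read as a three-way classification of the count summed over one low-frequency interval. The sketch below uses the default values listed above. The function name and the returned labels are assumptions made for illustration, and the comparisons follow the wording "allowed" and "more than this many".

```python
LF_BADCRC_THRESHOLD = 5          # default from the settings list
LF_FATAL_BADCRC_THRESHOLD = 100  # default from the settings list


def classify_badcrc(count,
                    threshold=LF_BADCRC_THRESHOLD,
                    fatal_threshold=LF_FATAL_BADCRC_THRESHOLD):
    """Classify the badcrc count seen on one link during one interval."""
    if count > fatal_threshold:
        return "fatal"   # alert raised; link may be disabled to traffic
    if count > threshold:
        return "alert"   # alert raised only
    return "ok"


for count in (5, 6, 100, 101):
    print(count, classify_badcrc(count))
```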
very_low_freq_monitor_interval   1800 seconds [default]
This specifies the "very low frequency" interval for summing certain counts and comparing them to thresholds.
vlf_portflip_threshold   10 transitions [default]
If a port goes up and down more than this many times during the very-low-frequency interval, an alert is raised.
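As an illustration of the port-flip check, the sketch below counts transitions inside a sliding window of very_low_freq_monitor_interval seconds. The class is hypothetical, and the sliding window is an assumption: FMS may instead sum transitions over fixed, consecutive intervals.

```python
from collections import deque


class PortFlipMonitor:
    """Track up/down transitions of one port over a sliding time window."""

    def __init__(self, interval=1800, threshold=10):
        self.interval = interval      # very_low_freq_monitor_interval, seconds
        self.threshold = threshold    # vlf_portflip_threshold, transitions
        self.events = deque()         # timestamps of recent transitions

    def record_transition(self, now):
        """Record a transition at time `now`; return True if an alert is due."""
        self.events.append(now)
        # Drop transitions that have aged out of the window.
        while self.events and self.events[0] <= now - self.interval:
            self.events.popleft()
        return len(self.events) > self.threshold


monitor = PortFlipMonitor()
flags = [monitor.record_transition(t) for t in range(11)]
print(flags[-1])
```

With the defaults, the first ten transitions stay within the threshold and the eleventh one inside the same window triggers the alert.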
switch_query_interval   30 seconds [default]
Interval between querying the monitoring linecards on the switch enclosures.
link_verify_interval   30 seconds [default]
Interval for the FMAs to verify each link in the fabric.
link_verify_timeout   100 ms [default]
Time allowed for a response to a link verification packet to return before trying again or marking the link down.
link_verify_retries   3 retries [default]
Number of times to retry probing a link before it is marked down.
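The interaction between link_verify_timeout and link_verify_retries can be sketched as a simple retry loop. This is an illustration only: the probe callable stands in for sending a verification packet and is hypothetical, and the sketch assumes the retries are made in addition to the initial attempt.

```python
def verify_link(probe, timeout_ms=100, retries=3):
    """Return True if the link answers within any attempt, else False (mark down).

    `probe(timeout_ms)` is a hypothetical callable that sends one verification
    packet and returns True if a response arrives before the timeout.
    """
    for _ in range(1 + retries):          # initial attempt plus retries
        if probe(timeout_ms):
            return True
    return False


def worst_case_ms(timeout_ms=100, retries=3):
    """Longest time spent on one link before it is marked down."""
    return (1 + retries) * timeout_ms


print(worst_case_ms())
```

Under these assumptions and the default values, a dead link is marked down after at most four timeouts of 100 ms each.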
nic_scout_timeout   250 ms [default]
Amount of time to wait for a NIC to reply to a scout packet.
nic_scout_retries   3 retries [default]
Number of retries before giving up on scouting a NIC.
nic_query_interval   60 seconds [default]
Frequency with which NICs should verify each other's presence on the fabric.
map_request_timeout   90 seconds [default]
When the FMS requests a map from an fma, this is the maximum amount of time it should wait before re-requesting from another fma.
resolve_packet_send_count   500 packets [default]
resolve_packet_min   240 packets [default]
resolve_packet_max   1200 packets [default]
resolve_retries   5 retries [default]
These settings control fabric resolution when using anonymous 2G 16-port xbars. They should only be adjusted with guidance from Myricom support.
alert_exec_cmd   <nil> [default]
The path to a command which will be executed every time fm_server generates an alert. The text of the alert is passed to the command on stdin. The command should return as soon as possible rather than lingering (waiting for user input, for example), because fm_server will wait() for it to finish.
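A command suitable for alert_exec_cmd only needs to read the alert text from its standard input, act on it, and exit promptly. The sketch below shows one possible handler that appends each alert to a log file. The function name, log format, and file location are assumptions made for illustration; the demonstration feeds the handler from an in-memory stream rather than real stdin.

```python
import io
import os
import tempfile
import time


def record_alert(stream, log_path):
    """Read one alert from `stream`, append it to `log_path`, and return it."""
    text = stream.read().strip()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{stamp} {text}\n")
    return text                          # no prompts, no waiting: return at once


# Demonstration with an in-memory stream and a temporary log file.
demo_log = os.path.join(tempfile.mkdtemp(), "fms_alerts.log")
alert = record_alert(io.StringIO("Enclosure clos0, slot 9 is running hot\n"),
                     demo_log)
print(alert)
```

In a real script the handler would be called with sys.stdin and a fixed log path, for example record_alert(sys.stdin, "/path/to/fms_alerts.log"), where the path is a placeholder.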
preferred_mapper   <nil> [default]
A comma-separated list of nodes that the FMS should prefer when choosing an fma to create fabric topology maps. If empty, the FMS is free to choose any fma to generate a map.
The FMS legacy tools are:
fm_create_db
This tool takes a map file generated by gm_mapper and a list of switch names and generates the database files for use by other tools. This is much the same as the existing wirelist tool.
The resulting fabric database will contain only the hosts and links included in the specified map file. Any links or hosts not present in the map file can easily be added manually afterwards.
fm_create_db -s <switch_list> -m <map_file>
[ -H <fms_home> ] [ -N <fms_db_name> ]
switch_list is a file with one switch name or IP address per line. This should be the complete list of switches providing connectivity for the hosts in the map file, and each must have a monitoring line card installed.
map_file is a map file generated by gm_mapper using the --map-file-0=<map_file> option (or --map-file=<map_file> on much older mappers). This map file should not come from Mute. For best results, it should be as complete as possible, with no links missing.
fm_watch_switches
fm_watch_switches periodically reads the information from all switch enclosures specified in the fabric database and reports information which may require attention.
It monitors certain counts for each port, and reports any whose delta since the last iteration exceeds a specified threshold. Variables monitored for each crossbar port are:
fm_watch_switches can also be used to report which ports, either transceiver or crossbar ports, have been manually disabled.
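The delta-based reporting described above can be sketched as a comparison of two successive samples of a per-port counter. The function and the sample data are hypothetical; the default threshold of 5 matches the -t option. Counter resets and wraparound are not handled in this sketch.

```python
def ports_over_threshold(previous, current, threshold=5):
    """Return {port: delta} for ports whose count grew by more than threshold.

    `previous` and `current` map a port key, here a tuple of
    (enclosure, slot, port), to the counter value read in that iteration.
    """
    report = {}
    for port, count in current.items():
        # A port seen for the first time has no baseline, so its delta is 0.
        delta = count - previous.get(port, count)
        if delta > threshold:
            report[port] = delta
    return report


# Hypothetical badcrc readings from two consecutive polling iterations.
last_poll = {("clos0", 9, 8): 100, ("clos0", 9, 9): 40}
this_poll = {("clos0", 9, 8): 103, ("clos0", 9, 9): 52}
flagged = ports_over_threshold(last_poll, this_poll)
print(flagged)
```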
fm_watch_switches
[ -t <threshold> ] - reporting threshold, default 5
[ -i <interval> ] - polling interval in seconds, default 30
[ -g ] - watch goodcrc counts
[ -d ] - report disabled links
[ -H <fms_home> ] [ -N <fms_db_name> ] - home and DB name
Example Usage:
fm_watch_switches is a good diagnostic to run when looking for links with high badcrc counts, especially while jobs are running, since the load will be high. fm_watch_switches does not interact with the fabric at all, so it is completely non-intrusive. If any links start showing excessively high badcrc rates, diagnostics should be performed (replacing the cable, for example).
Refer to this FAQ entry for details.
fm_linktest
fm_linktest is used to look for failed or marginal interswitch links. It takes as input the fabric description and tests every interswitch link in the fabric individually. This includes links between crossbars inside a switch.
Marginal or failed links are identified and reported.
fm_linktest
[ -B <board_no> ] - which NIC to use
[ -i <host_ifc> ] - which NIC interface to use, default all
[ -l <kbytes> ] - length of test packets in KB, default 4
[ -H <fms_home> ] [ -N <fms_db_name> ] - home and DB name
Note that the -i option specifies which port/interface to use on multi-port/interface NICs like the PCIXE NICs. This option is not normally used since all ports/interfaces are tested by default.
Example Usage:
fm_linktest is intended to be run periodically as a health check of the fabric. Because the mapper routes around dead links, they can otherwise go unnoticed; fm_linktest tests every interswitch link described in the database.
Example Output:
The link from clos1 / slot 2 / port 5 to spine3 / slot 6 / port 13 seems completely down, no traffic passes.
Or
The link from clos1 / slot 2 / port 5 to spine3 / slot 6 / port 13 seems marginal, 50 / 80 packets were successfully transmitted.
Or
The internal link from clos1 / slot 9 / xbar 3 / port 17 to clos1 / slot 0 / xbar 0 / slot 1 is XXX.
The last message indicates that there is definitely a card or enclosure problem. The first two messages indicate that cable diagnostics need to be run.
fm_linktest can be run at any time to look for dead links, but when run on an active network, false positives may appear on marginal links. The option -l 4 (just a few packets) is good for looking for dead links and can be used at any time. The option -l 4000 is primarily used for finding marginal links and should be run on an unloaded (idle) network.
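The example messages above suggest a simple reading of a test result: no packets received means the link is down, and fewer packets received than sent means it is marginal. The sketch below encodes that reading. It is an assumption drawn from the example output, not the actual decision logic of fm_linktest, and the function name is hypothetical.

```python
def classify_link(sent, received):
    """Classify one tested link from its packet counts."""
    if received == 0:
        return "down"        # no traffic passes
    if received < sent:
        return "marginal"    # some packets were lost
    return "ok"


# The marginal example above: 50 of 80 packets were successfully transmitted.
print(classify_link(80, 50))
```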
Last updated: 13 May 2010