Mute
A Graphical, Diagnostic, Monitoring Tool
for Myrinet-2000 M3-E* Networks

 


Introduction

Mute is a graphical diagnostic monitoring tool for Myrinet-2000 M3-E* networks (switches, cables, and hosts). It exercises the monitoring capabilities of the Myrinet-2000 M3-E* switches (M3 Switch Tools, m3-dist.tar.gz) as well as the Mapper Tools (located in the mt subdirectory of the GM distribution). Mute builds an image/picture of the Myrinet-2000 network and can be used to monitor the network non-intrusively in real time, analyze network traffic, and diagnose/validate the integrity of the hardware components. Mute works only with Myrinet-2000 M3-E* switches equipped with a monitoring line card. The graphical interface uses GNOME/GTK, which is currently available only for Linux.

Note: The notation M3 denotes third generation Myrinet, a.k.a. Myrinet-2000.

Some of the features of Mute are:

Commonly asked questions about Mute can be found in the FAQ.

Screenshots of Mute can be found at: http://www.myri.com/staff/finucane/mute.


Installation

Requirements

In order to install Mute, the following requirements must be met:


Installing Mute with GM-1

Instructions (Gnome/Gtk is available on a Myrinet node in the cluster)

The following instructions assume that you have Gnome/Gtk installed (or are willing to install it) on a Myrinet node in the cluster. If this is not the case, follow the instructions for the case where Gnome/Gtk is not available on a Myrinet node in the cluster.

  1. Download and untar the Myrinet-2000 (M3) M3-E* Switch Tools.
       http://www.myri.com/ftp/pub/m3-dist.tar.gz
       gunzip -c m3-dist.tar.gz | tar xvf -
    

    If you're using m3-dist version 1.0.14 or later, you need to add -DFOR_GM1 to m3-dist/makefile, as described in the corresponding FAQ entry.

  2. Download and untar the Mute distribution.
       http://www.myri.com/ftp/pub/mute-1.9.6.tar.gz
       gunzip -c mute-1.9.6.tar.gz | tar xvf -
    

    If you're using Mute 1.9.6 or later with GM-1, you also need to add -DFOR_GM1 to mute-1.9.6/makefile, as described in the corresponding FAQ entry.

  3. Install the GNOME/GTK package if it is not already available.

  4. Compile the Mapper Tools.

       cd {GM_SRC_HOME}/binary
       mkdir lib
       ln .gm_uninstalled_libs/lib/.libs/libgm.a lib/libgm.a
       cd ../mt
       make all gm
    

    where GM_SRC_HOME specifies the directory where GM was compiled.

  5. Compile the Myrinet-2000 M3-E* Switch Tools.

       cd $HOME/m3-dist
       make gmdir={GM_SRC_HOME} mtdir={GM_SRC_HOME}/mt host-no-snmp
    

    where GM_SRC_HOME specifies the directory where GM was compiled.

  6. Compile Mute.

       cd $HOME/mute
       make gmdir={GM_SRC_HOME} gminstalldir=<install_path> m3dir=$HOME/m3-dist
    

    where GM_SRC_HOME specifies the directory where GM was compiled, and <install_path> is the directory where the GM binaries and libraries are installed. If you're using gm-1.5.2.1 or earlier, there is no need to specify the gminstalldir= path.

Refer to {GM_SRC_HOME}/mt/README, $HOME/m3-dist/README, and $HOME/mute/README for detailed instructions on how to build mt, m3-dist, and mute.

You can now proceed to Building the Cluster's Image with Mute.


Instructions (Gnome/Gtk is not available on a Myrinet node in the cluster)

If Gnome/Gtk is not available on a compute node in the Myrinet cluster, you can install Mute on another machine in your local network that has Gnome/Gtk and IP access to the monitoring line card in each switch. That is, this machine must be able to ping each of the Myrinet switches in the cluster.
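To confirm this requirement before building anything, a small shell loop can ping every switch address. The sketch below is a hypothetical helper (check_switches is not part of Mute or GM); it assumes the addresses are listed one per line, as in a mute.switches file, and uses the Linux iputils ping options -c (count) and -W (timeout in seconds).

```shell
#!/bin/sh
# check_switches FILE: hypothetical helper, not part of Mute.
# Pings each address listed one per line in FILE (mute.switches format)
# and returns non-zero if any switch does not answer.
check_switches() {
  status=0
  while IFS= read -r ip || [ -n "$ip" ]; do
    [ -z "$ip" ] && continue          # ignore blank lines
    # -c 1: send one echo request; -W 2: wait up to 2 s (Linux iputils ping)
    if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
      echo "reachable:   $ip"
    else
      echo "UNREACHABLE: $ip"
      status=1
    fi
  done < "$1"
  return "$status"
}

# Example: check_switches $HOME/mute/mute.switches
```

Any UNREACHABLE line points to a routing or firewall problem to fix before Mute can talk to that switch's monitoring card.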

  1. On this machine in the local network, compile GM, the Mapper Tools, the Switch Tools, and Mute, as detailed in the previous section.

  2. Compile the Mapper Tools and the Switch Tools on one of the compute nodes in the cluster, and then generate the needed Mute files (mute.map, mute.hosts, mute.switches, mute.xbars).

  3. Transfer these four Mute files to the Mute directory on the machine in the local network. Then invoke Mute on the Gnome/Gtk-enabled machine with the -w option, specifying the directory that contains these files; Mute builds the image of the cluster from them.

        mute -w $HOME/mute
    

    You can now proceed to Building the Cluster's Image with Mute.

Note: If the topology of the cluster changes, you will need to regenerate the mute.map and mute.xbars files on a node in the cluster, transfer them to your Mute directory, and rebuild the image of the cluster.

This mode is useful for real-time non-intrusive monitoring of traffic on the Myrinet network, as well as analysis of the static image of the cluster.
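Before invoking mute -w on the Gnome/Gtk machine, it can help to confirm that the transferred files actually arrived. The helper below is hypothetical (check_workdir is not part of Mute); the files it checks follow step 2 above, with mute.hosts passed as an extra argument on GM-1 clusters.

```shell
#!/bin/sh
# check_workdir DIR [EXTRA_FILE...]: hypothetical helper, not part of Mute.
# Verifies that the configuration files Mute reads from DIR exist and are
# non-empty. mute.map, mute.switches and mute.xbars are always checked;
# pass mute.hosts as an extra argument on GM-1 clusters.
check_workdir() {
  dir=$1
  shift
  missing=0
  for f in mute.map mute.switches mute.xbars "$@"; do
    if [ ! -s "$dir/$f" ]; then          # -s: exists and has size > 0
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (GM-1): check_workdir $HOME/mute mute.hosts && mute -w $HOME/mute
```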


Installing Mute with GM-2

Instructions (Gnome/Gtk is available on a Myrinet node in the cluster)

The following instructions assume that you have Gnome/Gtk installed (or are willing to install it) on a Myrinet node in the cluster. If this is not the case, follow the instructions for the case where Gnome/Gtk is not available on a Myrinet node in the cluster.

Important Note: Mute is not yet supported on GM-2.1.1 with M3F2-PCIXE-2 interfaces.

  1. Download and untar the Myrinet-2000 (M3) M3-E* Switch Tools (release 1.0.12 required for use with GM-2).
       http://www.myri.com/ftp/pub/m3-dist.tar.gz
       gunzip -c m3-dist.tar.gz | tar xvf -
    
  2. Download and untar the Mute distribution.
       http://www.myri.com/ftp/pub/mute-1.9.6.tar.gz
       gunzip -c mute-1.9.6.tar.gz | tar xvf -
    
  3. Install the GNOME/GTK package if it is not already available.

  4. Compile the Mapper Tools.

       cd {GM_SRC_HOME}/binary
       mkdir lib
       ln .gm_uninstalled_libs/lib/.libs/libgm.a lib/libgm.a
       cd ../mt
       make all gm
    

    where GM_SRC_HOME specifies the directory where GM was compiled.

  5. Compile the Myrinet-2000 M3-E* Switch Tools.

       cd $HOME/m3-dist
       make gmdir={GM_SRC_HOME} mtdir={GM_SRC_HOME}/mt host-no-snmp
    

    where GM_SRC_HOME specifies the directory where GM was compiled.

  6. Compile Mute.

       cd $HOME/mute
       make gmdir={GM_SRC_HOME} gminstalldir=<install_path> m3dir=$HOME/m3-dist
    

    where GM_SRC_HOME specifies the directory where GM was compiled, and <install_path> is the directory where the GM binaries and libraries are installed.

  7. Kill the mapper on the machine on which you will run Mute.
       killall gm_mapper
    
  8. Run the mapper with the "pause" option to quiet all the other mappers, and to build a map file for Mute:
       cd <install_path>/sbin/
       ./gm_mapper -v --level=10 --pause --map-file=$HOME/mute/mute.map
    

    Press Control-C to stop the mapper by hand once it finishes mapping and enters verify mode:

    ...
    14 3,-7,1,7
    h6 3,-7,2,7
    h31 -5,-12,6               <---computing routes
    h23 -5,-10,6
    h15 3,-6,-7,7
    h7 3,-6,-6,7
    h39 3,-6,-5,7
    map version is now 1929343636
    h0 checking hosts               <------- entering verify
    checking host h1                         mode
    checking host h2
    checking disconnected link -15 on x0
    checking for new hosts on x0
    map version 0:60:dd:7f:3b:bf 1929343636
    43 hosts and 17 xbars
    I am h0
    verifying again
    h0 checking hosts
    checking host h1
    checking host h2           <-------- control-C anywhere
                                         around here's fine
    
  9. OPTIONAL: If you are using GM-2.0.6 or later and you would like to see the hostnames in your Mute display, you will need to perform the following step. (This extra step is necessary because the GM-2 mapper does not know about hostnames.)

    Once you have a map file (mute.map), the next step is to convert its logical hostnames (h0, h1, h2, etc.) into real hostnames (e.g., falbala1, falbala2, etc.).

    This conversion tool, board_names (<GM_SRC_HOME>/mt/tools/board_names.c), is available in GM-2.0.6 and later. It uses the MAC-address-to-hostname routing table printed by gm_board_info, for example:

       1 00:60:dd:7f:7b:a5                       falbala1 (this node) (mapper)
       2 00:60:dd:7f:7b:76                       falbala2 81
       3 00:60:dd:7f:7b:82                       falbala3 ba
    

    It then searches the map file (mute.map) for each host by its MAC address, for example:

    h - h0
    1
    0 s - x0 14
    address 00:60:dd:7f:7b:a5  <------- mac address
    hostType 1 
    

    and replaces the logical hostname with the real one:

    h - "falbala1"   <------------------- replaced
    1
    0 s - x0 14
    address 00:60:dd:7f:7b:a5
    hostType 1
    

    By default, the new map file goes to stdout. This output should be saved and used as your mute.map file in order to see hostnames on the Mute display.

    For example (assuming mute.map is the original map file in $HOME/mute):

    1. Make a board info file
         cd <install_path>/bin/
         ./gm_board_info > $HOME/mute/board_info.out
      
    2. Add hostnames to the map file as map.with.hostnames.map
         cd <GM_SRC_HOME>/mt/tools/intel_linux/
         ./board_names $HOME/mute/mute.map $HOME/mute/board_info.out > $HOME/mute/map.with.hostnames.map
      
    3. Use this as your mute.map file
         cd $HOME/mute
         mv map.with.hostnames.map mute.map
      
  10. Create the mute.switches file. This file contains the IP address of each Myrinet switch in the cluster, one IP address per line.

  11. Run Mute, specifying the working directory (where you put mute.map):
       mute -w $HOME/mute
    

    In the Build window check Find Xbars, Find Loopbacks, and Write Routes only. Uncheck everything related to mapping and build. When Find Xbars terminates, the GM mapper port will close and you can run mappers again without killing Mute.

    You can now proceed to Building the Cluster's Image with Mute.

  12. Rerun the mapper without --pause, to unpause everyone:

       cd <install_path>/sbin/
       ./gm_mapper -v --level=10 --map-once
    

    Note: The --level=10 option is important for both runs of the mapper (pausing and unpausing), because this mapper must have the highest priority. GM mappers default to level 1, so 10 is sufficient; 2 would also have worked.

    With --map-once, the mapper terminates on its own. All the other mappers are now awake again.

  13. Restart the mapper on the Mute host (/etc/init.d/gm restart).

Refer to {GM_SRC_HOME}/mt/README, $HOME/m3-dist/README, and $HOME/mute/README for detailed instructions on how to build mt, m3-dist, and mute.
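The substitution that board_names performs in step 9 can be approximated with awk, which may help when checking its output. This is a hypothetical re-implementation (add_hostnames is not a GM tool) that assumes exactly the formats shown in step 9: routing-table lines of the form "<n> <MAC> <hostname> ..." from gm_board_info, and host records in the map file that start with "h - hN" and contain an "address <MAC>" line. It also assumes neither input file is empty, because awk's file counter only advances on a file's first line.

```shell
#!/bin/sh
# add_hostnames MAPFILE BOARDINFO: hypothetical approximation of the
# substitution board_names performs (NOT the GM tool itself). Prints MAPFILE
# to stdout with "h - hN" lines replaced by h - "hostname" whenever the
# host's MAC address appears in the gm_board_info routing table.
add_hostnames() {
  # awk reads BOARDINFO once and MAPFILE twice; "pass" counts the files.
  awk '
    FNR == 1 { pass++ }

    # Pass 1: routing-table lines "<n> <MAC> <hostname> ..."
    pass == 1 && length($2) == 17 && $2 ~ /^[0-9a-fA-F:]+$/ {
      name[$2] = $3
      next
    }

    # Pass 2: remember the MAC address that belongs to each logical host
    pass == 2 && $1 == "h" && $2 == "-" { cur = $3; next }
    pass == 2 && $1 == "address" && cur != "" { mac[cur] = $2; cur = ""; next }

    # Pass 3: copy the map, rewriting host header lines that can be resolved
    pass == 3 {
      if ($1 == "h" && $2 == "-" && ($3 in mac) && (mac[$3] in name))
        print "h - \"" name[mac[$3]] "\""
      else
        print
    }
  ' "$2" "$1" "$1"
}

# Example: add_hostnames $HOME/mute/mute.map board_info.out > map.with.hostnames.map
```

Hosts whose MAC address is missing from the board info output keep their logical names, so the result is still a usable map file.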


Instructions (Gnome/Gtk is not available on a Myrinet node in the cluster)

If Gnome/Gtk is not available on a compute node in the Myrinet cluster, you can install Mute on another machine in your local network that has Gnome/Gtk and IP access to the monitoring line card in each switch. That is, this machine must be able to ping each of the Myrinet switches in the cluster.

  1. On this machine in the local network, compile GM, the Mapper Tools, the Switch Tools, and Mute, as detailed in the previous section.

  2. Compile the Mapper Tools and the Switch Tools on one of the compute nodes in the cluster, and then generate the needed Mute files (mute.map, mute.switches, mute.xbars).

  3. Transfer these three Mute files to the Mute directory on the machine in the local network. Then invoke Mute on the Gnome/Gtk-enabled machine with the -w option, specifying the directory that contains these files; Mute builds the image of the cluster from them.

        mute -w $HOME/mute
    

    You can now proceed to Building the Cluster's Image with Mute.

Note: If the topology of the cluster changes, you will need to regenerate the mute.map and mute.xbars files on a node in the cluster, transfer them to your Mute directory, and rebuild the image of the cluster.

This mode is useful for real-time non-intrusive monitoring of traffic on the Myrinet network, as well as analysis of the static image of the cluster.


Runtime Options for Mute

Usage:

 
  mute [options]

  -w, --working-directory=WORKING_DIRECTORY     path to working directory
  -f, --firmware-directory=FIRMWARE_DIRECTORY   path to firmware directory
  -t, --max-threads=MAX_THREADS                 max thread count
  -h, --event-history=EVENT_HISTORY             event history length

Building the Cluster's Image with Mute

The first step in using Mute is to build an image/picture of the cluster.

Before you invoke Mute, create a mute.switches file in the working directory. This file is a list of monitoring card IP addresses, one address per line. It used to be created automatically by the Find Switches feature, but that feature failed at most customer sites, so it has been permanently disabled.
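Because mute.switches must now be written by hand, a quick syntax check can catch typos before Mute tries to contact the switches. The helper below is hypothetical (not part of Mute) and assumes the file should contain only dotted-quad IPv4 addresses, one per line, as described above; it reports every other line, including blank ones.

```shell
#!/bin/sh
# validate_switches FILE: hypothetical helper, not part of Mute.
# Reports every line of FILE that is not a dotted-quad IPv4 address
# (blank lines included) and exits non-zero if any were found.
validate_switches() {
  awk '
    {
      n = split($0, octet, ".")
      ok = (n == 4)
      for (i = 1; ok && i <= 4; i++)
        ok = (octet[i] ~ /^[0-9]+$/) && (octet[i] + 0 <= 255)
      if (!ok) {
        printf "%s:%d: not an IPv4 address: \"%s\"\n", FILENAME, FNR, $0
        bad = 1
      }
    }
    END { exit(bad ? 1 : 0) }
  ' "$1"
}

# Example:
#   printf '%s\n' 206.117.208.81 > $HOME/mute/mute.switches
#   validate_switches $HOME/mute/mute.switches && echo "mute.switches looks OK"
```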

To run Mute, type

  su root
  cd $HOME/mute
  ./mute

This command opens two windows: a Build Network Graph window and an empty Myricom Mute 1.8 window.

Because this is the first time Mute is run, it does not let you uncheck the boxes labeled Run Mapper, Find Loopbacks, Write Routes, or Find Xbars in the Build Network Graph window. These operations must be performed the first time you run Mute in order to generate the initial image/picture of the cluster.

Once you have checked/un-checked the desired boxes in this Build Network Graph window, click on Build to generate the image/picture of the cluster. The following output will appear in the Build Network Graph window.

If the above output does not appear, then refer to the Troubleshooting Section for guidance.

Upon successful completion of the Build process, the image/picture of the cluster appears in the Myricom Mute 1.8 window (as depicted below), and the configuration files mute.map, mute.hosts, mute.switches, and mute.xbars are created in the working directory, along with mute.routes (if the Write Routes box is checked) and, optionally, mute.state.

If the cluster's image does not appear in the Myricom Mute 1.8 window, or is only partially drawn, then refer to the Troubleshooting Section for guidance.

On subsequent runs, Mute allows you to uncheck boxes in the Build Network Graph window (so that the corresponding operation is not performed) if you wish to reuse existing Mute configuration files. To do this, either run Mute with the -w runtime option and specify the directory containing the configuration files, or run Mute from the working directory where those files are located.

In this case, the output in the Build Network Graph would look like the following:


Troubleshooting the Build process

Current cautions and common problems:

Scenario 1:

If errors appear in the Build Network Graph window, does the output look like the following?

If yes, then there are two possible explanations:

  1. Mute cannot open a TCP connection to one of the Myrinet-2000 M3-E* switches in your mute.switches file.
  2. A mapper is running while you are trying to build the cluster image with Mute.

Scenario 2:

Do you see Permission denied messages in the Build Network Graph window?


Details of the Build process in Mute

To build an image of your cluster, Mute executes three steps.

  1. Run Mapper runs the mapper to get the network topology and generates the file called mute.map.
  2. Find Switches runs find_switches to get a list of IP addresses for the Myrinet-2000 M3-E* switch monitoring cards and generates the file called mute.switches. (This feature has since been permanently disabled; create mute.switches by hand as described in Building the Cluster's Image with Mute.)
  3. Find Xbars runs find_xbars to make the correspondence between individual xbars (switches in the map file) and Myrinet-2000 xbars, and generates the file called mute.xbars.

As each of these steps completes, diagnostic information scrolls in the Build Network Graph window, and Mute creates the following six files in the working directory where it builds its image of the cluster.

You can edit the Mute configuration files by hand if you like; they are human-readable ASCII files.


Usage Scenarios

Scenario 1:

How do I isolate the cause of a high badcrc count?

Isolating the cause of a high badcrc count is an iterative process.

The procedure detailed below can also be performed by hand using the steps outlined in the Troubleshooting section of the FAQ.

After building the cluster's image, the following steps should be performed:

  1. Reset the switch counters (View-->Reset Switches).
  2. Run gm_stress for 10-20 minutes on the cluster. Refer to "How do I run gm_stress.c to validate my GM installation?" for details.
  3. Select View-->Show Bad Packets

  4. Select Windows-->Legend

If you left-click the magnifying glass icon and then left-click the switch, Mute enlarges the image so that you can more clearly see the names of the hosts connected to specific ports on the switch.

You can then use File-->Save Positions to save the resized layout for the next invocation of Mute.

From this output information, we can see that the cable connecting the host brutus06 and port 8 on the lowermost line card has the highest badcrc count.

Note:

Also refer to the FAQ entry "Are the counters (badcrcs) reported by gm_counters on each machine the same as I see in Mute?". Mute only reports badcrc information from the switch counters. It is important that you also check for badcrcs in the host counters (gm_counters output).


Scenario 2:

How do I check the temperature of the switches?

After building the cluster's image, select View-->Show Temperatures

and the switch will appear in color as follows:

To determine the meanings of these colors, you can then select Windows-->Legend

and the following Legend of colors will appear.

All temperatures reported in the Legend are in degrees Celsius. The highest temperature is shown in red and the coolest in blue. Because the colors are relative, it is important to check the Legend to see which temperature range corresponds to red. A component shown in red is not necessarily cause for alarm.

Typical operating temperatures for the components of the switch are in the range of 30-39 degrees Celsius. If you see temperatures in the range of 50-55 degrees Celsius, the switch is too hot and shutdown may occur. (Shutdown occurs at 55 degrees Celsius and operation resumes at 50.) Check that the switch has proper ventilation.
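The shutdown rule above is a hysteresis: a switch stops at 55 degrees Celsius but does not resume until it has cooled to 50, so a reading of 52 can correspond to either state depending on history. The sketch below models that rule and the typical 30-39 range; the function names and band labels are hypothetical, and readings are assumed to be whole degrees Celsius as read from the Legend.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Mute) modelling the rules stated above.
# Temperatures are whole degrees Celsius.

# temp_band T: label a reading relative to the ranges given in the text.
temp_band() {
  if [ "$1" -ge 50 ]; then
    echo too-hot            # 50-55 C: shutdown may occur
  elif [ "$1" -ge 30 ] && [ "$1" -le 39 ]; then
    echo typical            # 30-39 C: typical operating range
  else
    echo outside-typical    # below 30 C or 40-49 C
  fi
}

# next_state STATE T: STATE is "running" or "shutdown". A running switch
# shuts down at 55 C; a shut-down switch resumes only once it cools to 50 C.
next_state() {
  if [ "$1" = running ] && [ "$2" -ge 55 ]; then
    echo shutdown
  elif [ "$1" = shutdown ] && [ "$2" -le 50 ]; then
    echo running
  else
    echo "$1"               # otherwise keep the current state (51-54 C can be either)
  fi
}

# Example trace: readings 48 53 55 52 50 give running running shutdown shutdown running
#   state=running; for t in 48 53 55 52 50; do state=$(next_state "$state" "$t"); echo "$t C -> $state"; done
```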


Scenario 3:

How do I upgrade the firmware on the Myrinet-2000 M3-E* switch(es)?

You can use Mute to upgrade the firmware on the Myrinet-2000 M3-E* switch(es) in your network.

If your Myrinet-2000 M3-E* switch(es) are not running release-0.9.9.2 or later, you must first use Mute to upgrade the firmware before you can build the image of the cluster. See also the FAQ entry "How do I upgrade the firmware on the monitoring line card?" for details of this process.

Upgrading the firmware on the Myrinet-2000 M3-E* switch(es) is performed in the following four steps. If you have multiple switches, Mute will upgrade the firmware on all switches simultaneously.

  1. Invoke Mute with
    mute -w <working dir> -f <firmware dir>
    
    where <working dir> is the directory containing the mute.switches file (a list of IP addresses of all Myrinet-2000 M3-E* switches, one per line), and <firmware dir> is the directory containing the vxWorks_var.hex file (m3-dist directory).
  2. Select Windows-->Firmware in the Myricom Mute 1.8 window, and the following window will appear.

  3. Press Read Switches, and wait for the versions of the firmware to be printed in the uppermost window.

  4. If the firmware versions are less than release-0.9.9.2 or if you want to upgrade for some other reason, then click Update Firmware and the following text will appear in the Update Firmware window.

    This will program every Myrinet-2000 M3-E* switch with vxWorks_var.hex if the hex file is newer than the firmware in the switch. Reprogrammed switches will have their monitoring cards automatically rebooted.

Note: Alternatively, you can invoke mute with no options, close the Build window, and select Preferences. Set the firmware directory to the directory containing vxWorks_var.hex (for instance, the m3-dist directory), and set the working directory to the directory containing your mute.switches file. Then follow steps 2-4 above to upgrade the firmware on the switch(es).
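Deciding whether a switch needs the upgrade amounts to comparing its version against release-0.9.9.2. A plain string comparison would order release-0.9.10 before release-0.9.9.2, so the hypothetical helper below compares the numeric fields with GNU sort -V instead. It assumes the versions printed by Read Switches use the release-X.Y.Z form shown above.

```shell
#!/bin/sh
# needs_upgrade VERSION [MINIMUM]: hypothetical helper, not part of Mute.
# Succeeds (exit 0) if VERSION, e.g. "release-0.9.8", is older than MINIMUM
# (default release-0.9.9.2). Requires GNU sort for -V (version sort).
needs_upgrade() {
  cur=${1#release-}
  min=${2:-release-0.9.9.2}
  min=${min#release-}
  [ "$cur" = "$min" ] && return 1
  # sort -V orders dotted numbers field by field, so 0.9.9.2 < 0.9.10
  oldest=$(printf '%s\n%s\n' "$cur" "$min" | sort -V | head -n 1)
  [ "$oldest" = "$cur" ]
}

# Example: needs_upgrade release-0.9.8 && echo "click Update Firmware"
```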


Scenario 4:

How do I run Mute in offline mode?

You can run Mute in offline mode on any host that has Gnome/Gtk installed.

  1. Copy the following files to the Mute-capable host:
    *.html                 lynx -source m3-switch-name/all
    mute.hosts             mapper's mapper.hosts file
    mute.map               mapper's mapper.map file
    mute.switches          find_switches (or typically by hand)
    mute.xbars             find_xbars
    

    The *.html file for each Myrinet-2000 M3-E* switch is named with an IP address.
    (e.g., lynx -source 206.117.208.81/all > 206.117.208.81.html)

  2. Invoke mute
    mute -w <working dir>
    
    where <working dir> is the directory where these files are located.
  3. Click on Build in the Build Network Graph window, and the following text will appear:

You can then proceed to exploring and investigating the cluster's image remotely.
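Collecting the *.html snapshots from step 1 by hand gets tedious with several switches. The loop below is a hypothetical helper that runs the same lynx -source command for every address in mute.switches and names each file after its IP address, as in the 206.117.208.81.html example. It must run on a host with IP access to the switches; fetch_page is split out only so the download command can be swapped.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Mute. fetch_page is kept separate so the
# download command can be replaced; by default it uses lynx -source as in step 1.
fetch_page() {
  lynx -source "$1"
}

# fetch_switch_pages DIR: for each address in DIR/mute.switches, save the
# switch's "all" page as DIR/<address>.html.
fetch_switch_pages() {
  dir=$1
  status=0
  while IFS= read -r ip || [ -n "$ip" ]; do
    [ -z "$ip" ] && continue
    if ! fetch_page "$ip/all" > "$dir/$ip.html"; then
      echo "failed to fetch $ip/all" >&2
      status=1
    fi
  done < "$dir/mute.switches"
  return "$status"
}

# Example (on a host with IP access to the switches):
#   fetch_switch_pages $HOME/mute
```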


Scenario 5:

Can I use Mute to interactively monitor the traffic in the Myrinet network?

Yes. You can use Mute for "real-time" non-intrusive monitoring of traffic on the Myrinet network.

Select Windows-->Counters and a small window will appear where you can specify the monitoring of Bad Crcs or Good Crcs.


Scenario 6:

How do I interactively monitor the switch traps using Mute?

Using Mute, you can interactively view the switch traps that are being generated on each switch in the Myrinet network.

Select Windows-->Events and the Events window will appear. To start listening to the traps on the switches, click on the Listen toggle at the bottom of the Events window, and the events will start scrolling by.

If Log Events is enabled (File->Preferences dialog box), events will also be written to a file called mute.events in the working directory. The file is reopened and appended to every time the events Listen toggle is activated. The file is closed when the listening is stopped.

The Events window contains several columns of information. The columns are labeled Time, IP address, Event, Count, Part, and Index.

The Time column lists the current local time. The IP address column lists the IP address of the switch reporting the trap. The Event column lists the switch trap being reported; a list of all available switch traps can be found in the corresponding FAQ entry. The Count column lists the number of times this trap has been generated. The Part column specifies which component is generating the trap. Possible values of Part are:

And finally, the Index specifies which Part is generating the trap.

For example, suppose xbarPort 122 is generating many missedBeatTrap events and you would like to know where that xbarPort is located. The easiest way to find out is to open the switch's web interface and select "all":

  http://<switch_name>/all

where <switch_name> is the name or IP address of the switch (listed in the IP address column of the Events window). You can then use your web browser's Find function to search for xbarPort 122. Once you have located the specific Part, scroll up slightly to see which line card and which physical port number it is associated with.


Overview of Mute's Features

The Myricom Mute 1.8 window has four menus (File, Edit, View, and Windows), as well as positioning/resizing tools such as point, zoom in/out, scroll, info, and control. For example, click the magnifying glass to zoom in and out (left/right mouse click), use the hand to slide the image around, and use the arrow to move items. Drag or shift-click to select multiple items.

File Menu Button

The File menu button has the following options:

Edit Menu Button

The Edit menu button has one option, Find, for locating ASCII text strings in the image. For example, use Edit->Find to find switches or hosts by name, IP address, MAC address, etc.

View Menu Button

The View menu button has the following options:

Windows Menu Button

The Windows menu button has the following options:

Warnings:



Last updated: 19 May 2006