fms-1.3.2
  BUG FIXES:
    1) Correctly handle disjoint subfabrics.
    2) efma handles remap requests
  ENHANCEMENTS:
    1) New cards: 10G-SW32LC-16EM
    2) MoE map time changed to 6 seconds
============================================================================
fms-1.3.1
  BUG FIXES:
    1) Correctly notice disabled xcvrs on 10G switches.
    2) Add fm_gw_info to dump route and gateway cache information for
       Myricom ethernet gateways.
    3) Fix some fm_server issues causing links that are down on startup
       to not be recognized as down.
    4) Fix compile error related to syslog() on newer gcc
  ENHANCEMENTS
    1) Handle old 10G switch firmware with 6C2ER ports, use "get version"
       for future switch data APIs.
    2) MoE mapping not enabled until an MoE node announces itself. This
       will speed mapping a lot in a fabric with Myrinet Ethernet Gateways
       but no Myrinet-over-Ethernet nodes.
    3) When a node is seen to have only 10G NICs in it, assume all xbars
       in fabric have IDs.
    4) Add support for more 32-port 10G linecards, including THRU cards
       with Ethernet Gateways
    5) Un-buffer output on fm_watch_switches so it can be more effectively
       redirected to a file.
    6) fm_watch_switches now gives a metric of total load on a switch
    7) Add support for -THRU cards with ethernet ports to fm_server
    8) fma will report linecards which have no serial number programmed.
    9) fma will now give slightly better quality routes, possibly
       improving network performance
    10) Add ethernet gateway diagnostic tools: fm_gw_dump
    11) Change fm_ping_xbar to use proxy interface so it can run without
        needing to stop fma.
    12) Allow different linecard types to have different temperature
        warning thresholds. Remove custom setting for overtemp threshold.
    13) Updated mechanism for computing clos levels, with fallback (-C) to
        old method for use when needed.
    14) New daemon, efma, which will map Myricom ethernet switches through
        the ethernet port, with no Myrinet ports needed.
    15) If no Myrinet API is installed on system, do not fail by default
        and build only the non-Myrinet tools.
    16) Support remap request from gateway ports.
============================================================================
fms-1.3.0
  BUG FIXES:
    1) Correctly compute xbar port number on 10G switches. This bug
       resulted in bad data from fm_watch_switches and incorrect/missing
       alerts from fm_server.
    2) Allow for bad flags from 2Z processors.
  ENHANCEMENTS
    1) Handle mixed-mode MoE and MoM networks, replaces mxoed.
    2) Automatically cycle log files when started in daemon mode for both
       fma and fm_server
    3) Rename fm_server log file from fms.log to fm_server.log.
    4) Add fm_dbdelete for removing individual items from DB.
    5) Add FMA_SHUTDOWN alert to differentiate a clean fma shutdown from
       a crash.
    6) Add "-s" option for fm_server to log to syslog in addition to its
       regular log file.
    7) Add alert_exec_cmd to fm_settings.
    8) Support for 21U 10G switch.
============================================================================
fms-1.2.5a
  BUG FIXES
    1) Fix fm_server crash when reading switch enclosures at startup.
============================================================================
fms-1.2.5
  BUG FIXES
    1) Fix crash that can occur during mapping on tagged xbar systems when
       there is some packet loss.
    2) Some changes to prevent continuous remap on very busy clusters.
    3) Fix crash that could occur when NIC reports badcrcs and we don't
       know where it is connected.
    4) Correctly read badcrc counters for 10G cards.
  ENHANCEMENTS
    1) Add "-d" flag to fms to put self in background, remove need for
       complicated "nohup" startup.
    2) Improve error message a little when connection to fma fails.
       (Usually due to an open file descriptor limit that is too low)
    3) Add 2Z support to FMS.
    4) Add setting "system_id" which is what fms will always place in
       mapper_mac
    5) Add a "unique_id" per NIC which is persistent across remaps for
       2Zs.
    6) Change name of fms process to fm_server.
       fms still exists, but its use is deprecated.
============================================================================
fms-1.2.4
  BUG FIXES
    1) Fix bug preventing multiple NICs from working under GM-2.0
    2) Fix fms crash that can occur when last remaining fma has its socket
       terminated in an unfriendly way.
  ENHANCEMENTS
    1) Add backwards compatibility for older 2Z firmware.
    2) Link aggregation support for MX.
============================================================================
fms-1.2.3
  BUG FIXES
    1) Fix "Bad node type collides with compare" error which could occur
       on certain topologies.
    2) Fix bug which caused remapping loop when a newly added host (from
       the DB's perspective) is disconnected.
    3) Remove some warnings generated by very picky compilers.
    4) Use telnet interface for x32 switches.
  ENHANCEMENTS
    1) Add support for some 10G equipment.
============================================================================
fms-1.2.2
  ENHANCEMENTS
    1) Add support for 10G switches
  BUG FIXES
    1) Fix situation where fms could crash when all links to an xbar are
       down.
    2) Correctly handle quadrant disable information from 10G switches.
    3) When computing multiple routes, make sure all are unique.
    4) Fix core dump when pt2pt is mixed with xbar interconnect.
============================================================================
fms-1.2.1
  BUG FIXES
    1) Map distribution would sometimes fail on large fabrics
       (> 1000 nodes)
============================================================================
fms-1.2.0
  ENHANCEMENTS
    1) Add support for partitions.
    2) Add ability to control NIC scout timeout/retry settings from FMS.
    3) Add explicit reporting of duplicate MAC addresses in standalone
       fma. (fms already had it)
    4) Added a little more fma verbosity as to reason for remaps.
    5) Re-work of NIC and probe assignment in standalone mode.
    6) Add hostnames to most logged messages.
    7) Speeded up map distribution in standalone mode.
    8) Use select()/poll() with MX on Linux and MacOSX to speed up both
       mapping and map distribution.
    9) fms_settings can be used to define a preferred set of hosts for the
       fms to select as mapper.
  BUG FIXES
    1) Fixes condition where FMS would abort if connectivity to FMA was
       lost under certain conditions. Error message is:
       Error while clearing all verify probes
    2) fma crashes when nodes have multiple NICs with one disconnected
    3) Fixed small memory leak in fms.
    4) Keep NIC verifies running, even if a discrepancy is noted.
    5) Fix problem causing constant remapping on tagged xbar fabrics with
       empty linecards.
    6) Fixed bug where certain sequences of start/stopping and changing
       levels of fmas could result in constant remapping of fabric.
    7) Fixed potential crash with GM-2.0 or GM-1 and multiple NICs.
    8) Loss of connection to 1 port of multi-port NIC could cause
       continuous remapping.
    9) Fixed bug where a small number of useful links might not be
       verified.
    10) FMS did not deal properly with physically moving a link from a NIC
        to a different port on the same xbar.
============================================================================
fms-1.1.4
  BUG FIXES
    1) FMS did not give proper error when stacksize is too low to start
       enough switch monitoring threads.
    2) Fix state error resulting in fms abort with message
       "Map just sent and map request pending?"
    3) Fix problem where fma would abort with
       "NIC already connected to this xbar?"
    4) fma will wait longer for a map if actively mapping fma is
       FMS-driven
    5) If a disconnected fma is chosen as mapper, mapping would not
       converge. Now, fma that notices bad map is elected as mapper.
    6) Known disconnected fma would improperly and repeatedly complain to
       FMS about down link.
    7) Fix fma core dump from rare race condition when mapping gets
       cancelled.
    8) Multi-port NICs would have routes out of disconnected port.
    9) Verify direct connects on all ports of multi-port NIC
  ENHANCEMENTS
    1) Add fms setting map_request_timeout which controls how long fms
       will wait for an fma to generate and return a requested map.
    2) Cut down noisy output when mapping with debug enabled.
    3) Increase timeout for NIC verify responses when using FMS.
    4) Create better routes for odd topologies (e.g. hosts attached to
       spine xbars)
    5) Generate more informative alerts about down NIC ports
============================================================================
fms-1.1.3
  BUG FIXES
    1) Couple of fma segfault fixes.
============================================================================
fms-1.1.2
  ENHANCEMENTS
    1) Reduce memory footprint of fma by factor of 5 on large clusters.
    2) Speed up routing significantly.
    3) Do not show RX Timeout counts in fm_watch_switches by default.
       Added "-r" to show RX Timeout counts.
    4) Print other end of INTERNAL link in fm_watch_switches
============================================================================
fms-1.1.1
  ENHANCEMENTS
    1) Add support for ethernet link aggregation.
============================================================================
fms-1.1.0
  ENHANCEMENTS
    1) Speed up fabric resolution by allowing switch queries on demand
    2) Make fabric resolution settings be modifiable via fm_settings
    3) Add timestamps to log output.
    4) Add ability to automatically be a daemon (-d), also log to file
       instead of stdout/stderr.
  BUG FIXES
============================================================================
fms-1.0.3
  ENHANCEMENTS
    1) Support point-to-point connections in standalone mode
    2) fma "map level" can be specified with -l
    3) Lower invalid route count threshold in fm_create_db and make it
       runtime modifiable.
    4) Total rework of mapper portion to make mapping large anonymous
       clusters significantly faster.
    5) Automatically upgrade DB tables when format changes from release to
       release.
    6) Remove several superfluous messages seen while retrying reads of
       M3-Exx switches.
    7) Add fm_fixup_db command to perform DB fixup as needed between
       versions.
    8) Support a mix of fmas with and without fms contact.
    9) Improvements to mapper leader election in standalone mode.
  BUG FIXES
    1) Interval between standalone link verify operations was too small,
       generating more network traffic than necessary.
    2) Fix possible memory corruption when mapping xbars with no IDs.
    3) Bring version file up-to-date for "-V" output.
    4) Ignore SIGPIPE which we may get when a write to a closed socket
       occurs.
    5) Fix problem where fabric resolution takes longer than it should.
    6) Fix slow route generation when some linecards have no hosts.
    7) Add timestamp and "-a" to fm_watch_switches to show absolute values
       used to compute deltas.
    8) Fix segfault that can occur when multi-NIC host comes up while
       another node is in the middle of mapping.
    9) Fix segfault in fm_create_db when FMS_RUN environment variable not
       set.
    10) Fix segfault in fma_dfs_calc_routes() when 1 NIC of multi-NIC host
        is disconnected.
    11) Do not destroy route to a GM node just because its fma goes away.
    12) Fix numerous segv/memory corruption problems in fma/fms.
============================================================================
fms-1.0.2
  BUG FIXES
    1) Fix segmentation fault caused by NIC counter checking code in MX
       shim.
    2) Fix build problems with older GM versions.
    3) Add productinfo for "M3F-PCIXD-4 2 SRAM V 2.0"
    4) Fix segfault when mapping cancelled with compares in flight.
    5) Fix "Double set of inv rt count flag enc"
    6) Fix problem when only one port of E card connected
============================================================================
fms-1.0.1
  ENHANCEMENTS
    1) Introduce versioning and -V