
Before doing anything with the API or other functions, you need to initialize the LANai libraries by calling one of the following initialization functions.
int open_lanai_device(void);
    /* returns the number of LANai boards found */
and
int open_lanai_copy_block(int *length, unsigned short *sts);
    /* length is the length of the allocated DMA copy block */
    /* sts is the maximum burst size allowed */
    /* returns the number of LANai boards found */
open_lanai_device() finds all the Myrinet boards in a system and returns the number of units found.
open_lanai_copy_block() performs the same operations as open_lanai_device() and, in addition, allocates and defines the copy_block.
As a side effect, these functions define and initialize the following global variables, which are used by other parts of the library.
volatile unsigned short *LANAI[UNITS];            /* short ptrs to LANai SRAM */
volatile unsigned int *LANAI3[UNITS];             /* int ptrs to LANai SRAM */
volatile struct LANAI_REG *LANAI_SPECIAL[UNITS];  /* ptrs to LANai special registers */
volatile unsigned short *LANAI_CONTROL[UNITS];    /* ptrs to board control registers */
struct MYRINET_EEPROM *LANAI_EEPROM[UNITS];       /* ptrs to board EEPROM or a copy */
/* dma pointers - Only initialized if you call open_lanai_copy_block() */
volatile unsigned int *UBLOCK[UNITS];  /* a block of user-accessible memory for DMA */
volatile unsigned int *DBLOCK[UNITS];  /* the same memory, but pointer is DMA address */
                                       /* DBLOCK ptrs are used for programming LANai DMA */
volatile unsigned int *DLANAI[UNITS];  /* dma pointer to LANai SRAM */
                                       /* used for board to board transfers */
The program that runs on the LANai boards is called the MCP. MCP stands for Myrinet Control Program. The MCP is not burned into the LANai board. Instead, it is loaded into the LANai by the workstation. The Myrinet IP Driver loads the MCP into the LANai. Users may load the MCP manually with the lload program. Application programs may also load the MCP by calling the LANai Device Library function lanai_load_and_reset().
The MCP's jobs are:
Each queue has a single producer and a single consumer. A particular channel should be used by a single workstation process. Channel 0 is used by the TCP/IP driver. The rest of the channels are used by application programs written with the Myrinet API. Channel 0 interrupts the workstation when it has received or sent a message. This is optional. Use the function myriApiSetInterruptMask() to tell the MCP if you want to be interrupted when a send finishes (SEND_INTERRUPT_MASK) or when a receive finishes (RECEIVE_INTERRUPT_MASK). The other channels do not generate interrupts, and processes using the other channels must poll for received messages. Programs not using channel 0 should not call myriApiSetInterruptMask().
When writing code in a kernel to interact with the API, you will need to manipulate LANai bits for enabling/disabling and clearing interrupts. There are three functions for doing this.
void myriApiInterruptEnable(int unit);
void myriApiInterruptDisable(int unit);
int myriApiInterruptPending(int unit);
The function myriApiInterruptEnable() enables interrupts from the LANai by setting the LANai EIMR=HOST_SIG_BIT. myriApiInterruptDisable() turns off board interrupts: it writes zero into the EIMR and clears the ISR HOST_SIG_BIT. Finally, myriApiInterruptPending() returns true if the LANai has requested an interrupt by setting the HOST_SIG_BIT. Note that the Disable function is also a "clear" function, so you should test for a pending interrupt before you disable the interrupts.
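The ordering rule above (test for a pending interrupt before disabling, because disabling also clears it) can be sketched with a small host-side model. The register variables, the HOST_SIG_BIT value, and the sim_ functions are hypothetical stand-ins that mimic the behaviour described in the text; they are not the library's implementation.

```c
#include <assert.h>

/* Hypothetical stand-ins for the LANai registers; the bit value is assumed. */
#define HOST_SIG_BIT 0x1u
unsigned sim_eimr;   /* models the EIMR */
unsigned sim_isr;    /* models the ISR  */

/* Models myriApiInterruptEnable(): EIMR = HOST_SIG_BIT. */
void sim_interrupt_enable(void)
{
    sim_eimr = HOST_SIG_BIT;
}

/* Models myriApiInterruptDisable(): zero the EIMR and clear the ISR bit. */
void sim_interrupt_disable(void)
{
    sim_eimr = 0;
    sim_isr &= ~HOST_SIG_BIT;
}

/* Models myriApiInterruptPending(): true if HOST_SIG_BIT is set in the ISR. */
int sim_interrupt_pending(void)
{
    return (sim_isr & HOST_SIG_BIT) != 0;
}

/* Handler skeleton: sample "pending" first, because disabling also clears it.
 * Returns 1 if the LANai had requested an interrupt, 0 otherwise. */
int handle_interrupt(void)
{
    if (!sim_interrupt_pending())
        return 0;                 /* not ours; leave the registers untouched */
    sim_interrupt_disable();      /* masks and clears the request */
    /* ... service the channel 0 queues here ... */
    sim_interrupt_enable();       /* re-arm for the next request */
    return 1;
}
```

In a real driver the three sim_ calls would be replaced by myriApiInterruptPending(), myriApiInterruptDisable() and myriApiInterruptEnable() with the unit number.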
The function myriApiGetNumSendsPending() returns the number of messages left on the Send queue.
A channel also keeps a list of multicast addresses. A message received by the MCP with a multicast address as its destination will be given to the workstation only if the multicast address is in a channel's multicast list.
LANai boards have a unique 48-bit address written into their EEPROMs. It is the host's responsibility to give this address to the MCP. The MCP uses 64-bit addresses. There are various LanaiDevice function calls to get the LANai address from the EEPROM.
If the highest bit of a Myrinet address is set, the address is a multicast address. If the highest bit is not set, the address is a regular address. A message's destination is determined by the Myrinet address and the channel in its header. A message sent to a regular address is delivered to the LANai board whose MCP has that address. The message is then given to the process on the workstation that is responsible for the message's destination channel. A message sent to a multicast address is delivered to all LANai boards. Each channel keeps a list of multicast addresses that it will accept. If the multicast address of the message matches one of the multicast addresses for the destination channel, the message is given to the workstation. Otherwise the message is dropped. The workstation adds multicast addresses to a channel's multicast address list with myriApiMulticastListen(). The workstation removes multicast addresses from a channel's multicast address list with myriApiMulticastDrop().
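As an illustration of the multicast test described above, here is a minimal sketch. It assumes that the "highest bit" is the most significant bit of the first byte of the 8-byte address; the text does not state the byte order, so treat that as an assumption. The function name is hypothetical and not part of the API.

```c
#include <assert.h>

/* Returns 1 if the 8-byte Myrinet address is a multicast address, else 0.
 * Assumption: the "highest bit" is the top bit of address[0]. */
int myri_addr_is_multicast(const unsigned char address[8])
{
    assert(address != 0);
    return (address[0] & 0x80u) != 0;
}
```

The parameter is declared unsigned char so that the mask works the same way whether or not plain char is signed on the host compiler.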
Before you can do anything at all with the MCP, the MCP must be loaded into the LANai. Normally, the device driver would have performed the following series of steps:
You should only set the MCP address and DMA burst size after you load the MCP code and before you release the LANai from reset. This is easily accomplished by using the combined function myriApiLoadLanai(). The library has the MCP code compiled directly into it, so the library routines and LANai MCP will always match.
void myriApiLoadLanai(int unit, unsigned short sts,
int mask, unsigned char *myriAddrPtr);
This function takes a unit number, a burst value (sts), an interrupt mask (mask), and a pointer to the 8-byte address that this unit should use (myriAddrPtr). The API library is pre-compiled with the LANai code to load into the LANai, so calling this function loads the default MCP, then sets the sts, interrupt mask, and address.
LANai boards can be reset at any time by the Myrinet, as part of a deadlock-avoidance scheme. A Myrinet-generated reset is called an FRES (Forward Reset). FRESes are examples of Hardware Resets. The workstation can also give the LANai board a Hardware Reset by calling various LANai Device Library functions. A Hardware Reset restarts the execution of the MCP program. All MCP data structures are re-initialized, except for the MCP address and the DMA burst size. Thus the channels are reinitialized, and any messages that were in the channel queues when the reset occurred are lost. You should not set the DMA burst size or MCP address after an FRES.
There is also a Software Reset. The workstation can generate a software reset by calling the API function myriApiReset(). A Software Reset resets only a single MCP channel.
After a hardware or a software reset, the workstation must re-establish communication with the MCP. This re-establishment of communication occurs in a series of steps:
The LANai board transfers data between workstation memory and LANai memory using DMA. All workstation pointers must be valid DMA pointers: they must be aligned on a 4-byte boundary, and they must be in whatever memory space the OS requires them to be in. The memory must be locked down so that it cannot be paged or swapped. All data offsets and lengths must be a multiple of 4 bytes. The STS value is used by Sparc platforms and is determined by the kernel driver. It is an encoding of the various burst sizes that are supported by the SBus controller.
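The alignment part of these rules is easy to check in software. The following sketch tests only the 4-byte alignment of the pointer and the length; it cannot tell whether the memory is locked down or lies in the address space the OS requires. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if pointer is 4-byte aligned and length is a multiple of 4.
 * This checks alignment only, not locking or address-space requirements. */
int dma_args_aligned(const void *pointer, size_t length)
{
    assert(pointer != NULL);
    return ((uintptr_t)pointer % 4 == 0) && (length % 4 == 0);
}
```

A check like this is cheap enough to run before every send or receive call in a debug build, which may help on platforms where an invalid DMA pointer hangs the machine.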
DMA pointers have different requirements on different platforms and operating systems. Here is a list of requirements for some platforms.
| Hardware | OS | Valid DMA address |
|---|---|---|
| Sparcstation-2 (sun4c) | SunOS | Any 32-bit-aligned Kernel Virtual Address |
| Sparcstation-10,20 (sun4m) | SunOS | Any 32-bit-aligned IOMMU-mapped Address |
| Sparcstation | Solaris 2.3, 2.4 (SunOS 5.3, 5.4) | Any 32-bit-aligned ddi_dma_addr_setup() Address |
| HP Skyhawk | HPUX 10.0 | Any 32-bit-aligned wsio_map() Address |
| Pentium | Linux 1.2.x | Any 32-bit-aligned Kernel Virtual Address (same as Physical Address) |
Using an invalid DMA pointer leads to different problems on different machines. A Sparc-1 will hang; a Sparc-2 or Sparc-5 might complete the transaction (with a bus error) without transferring any data. Other machines may panic. The Myricom PCI board version 2.1 will hang the bus and therefore hang the machine.
The send queue is a circular buffer of length NUM_SENDS. The workstation adds Send Items to the Send Queue. A send item represents a single message. The MCP removes send items from the send queue and sends them over the Myrinet. If the send queue is full, the workstation must wait until a slot becomes free. This happens as soon as the MCP removes the next item from the send queue and sends it. The workstation should free its copy of the message represented by a send item only after it has successfully added an additional NUM_SENDS items to the send queue. This is one way to tell whether a message has made its way through the MCP and onto the Myrinet. The other way to tell whether a message has been sent is to call myriApiGetNumSendsPending(), which returns the number of messages left on the Send queue. When myriApiGetNumSendsPending() returns 0, that message, and every other message, has been sent.
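The buffer-reuse rule above can be expressed as simple counter arithmetic. In this sketch the workstation numbers its send items from 0 and keeps a running count of items it has added; the NUM_SENDS value shown is hypothetical (the real one is defined in mcpConstants.h), and both function names are illustrative only.

```c
#include <assert.h>

#define NUM_SENDS 8u   /* hypothetical; the real value is in mcpConstants.h */

/* Slot of the circular send queue used by the item with sequence number seq. */
unsigned send_slot(unsigned long seq)
{
    return (unsigned)(seq % NUM_SENDS);
}

/* Returns 1 once the host copy of item number seq (0-based) may be freed,
 * i.e. once at least NUM_SENDS further items have been added after it.
 * total_added is the number of items added to the queue so far. */
int send_buffer_reusable(unsigned long seq, unsigned long total_added)
{
    assert(seq < total_added);
    return total_added - (seq + 1) >= NUM_SENDS;
}
```

With NUM_SENDS set to 8, item 0 becomes reusable once the ninth item (sequence number 8) has been added, which is also the first item to reuse slot 0.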
To send a message you give the MCP a list of host buffers. The MCP gathers these buffers together, in order, and sends them as one message. The Gather structure specifies the list of buffers to send as one message. A Gather is defined as:
typedef struct Gather
{
    unsigned length;
    void *pointer;
} Gather;
The programmer should use myriApiSend() to add a send item to the send queue. myriApiSend() is defined as:
int myriApiSend (unsigned unit,
                 unsigned channel,
                 char address[8],
                 unsigned destinationChannel,
                 unsigned numGathers,
                 Gather *gathers);
myriApiSend() builds a send item out of its parameters. unit is the LANai board index, in case there is more than one board in the workstation. channel is the MCP channel to send from. address is an array of 8 bytes: the Myrinet address of the MCP to send to, and destinationChannel is that MCP's channel. numGathers is the size of gathers, which is an array of Gather structures. numGathers must not be greater than MAX_GATHERS. The sum of the lengths of the gathers must not be greater than MTU, and each length must be a multiple of 4. Each gather fragment must be a valid DMA pointer. myriApiSend() returns
When the MCP reads the next send item from the send queue, it steps through the gathers array and DMAs each gather's message fragment, assembling a complete message in LANai memory. Next the MCP removes the send item from the send queue. Finally the MCP prepends a header to the message and sends it out onto the Myrinet.
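The limits on the arguments of myriApiSend() described above can be checked on the host before a send item is queued. The sketch below is one way to do that; the MAX_GATHERS and MTU values are hypothetical placeholders for the real constants, gathers_valid() is not an API function, and only the alignment half of "valid DMA pointer" is tested.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values; the real constants come from the API headers. */
#define MAX_GATHERS 4u
#define MTU         8192u

typedef struct Gather {
    unsigned length;
    void *pointer;
} Gather;

/* Returns 1 if the gather list satisfies the documented limits, else 0. */
int gathers_valid(unsigned numGathers, const Gather *gathers)
{
    unsigned total = 0;   /* invariant: total <= MTU */
    unsigned i;

    if (numGathers > MAX_GATHERS)
        return 0;
    assert(numGathers == 0 || gathers != 0);
    for (i = 0; i < numGathers; i++) {
        if (gathers[i].length % 4 != 0)
            return 0;                       /* length not a multiple of 4 */
        if ((uintptr_t)gathers[i].pointer % 4 != 0)
            return 0;                       /* pointer not 4-byte aligned */
        if (gathers[i].length > MTU - total)
            return 0;                       /* sum would exceed the MTU */
        total += gathers[i].length;
    }
    return 1;
}
```

Comparing each length against the remaining budget (MTU - total) avoids any overflow when summing the lengths.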
There is a second send function called myriApiSendWithChecksum():
int myriApiSendWithChecksum(int unit,
int channel,
char address[8],
int destinationChannel,
int numGathers,
Gather*gathers,
int checksumOffset,
int checksumCorrection);
myriApiSendWithChecksum() is myriApiSend() with two additional parameters: checksumOffset, the position in the outgoing message at which a 16-bit Internet Checksum is written, and checksumCorrection, a correction value that is added to the checksum. This feature is for writers of TCP/IP drivers who want to take advantage of the checksum computation hardware built into the LANai board. myriApiSend(...) is equivalent to myriApiSendWithChecksum(...,NO_CHECKSUM,0).
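For readers unfamiliar with the 16-bit Internet Checksum, here is a software sketch of the standard algorithm (the ones-complement sum of 16-bit words, as described in RFC 1071). It only illustrates what the checksum is; how the LANai hardware computes it, and exactly how checksumOffset and checksumCorrection are applied, is not specified in this text. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones-complement sum of big-endian 16-bit words, folded to 16 bits.
 * The 32-bit accumulator is adequate for MTU-sized messages. */
uint16_t ones_complement_sum(const unsigned char *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)                          /* pad a trailing odd byte with 0 */
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)                     /* end-around carry */
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* The Internet checksum is the bitwise complement of that sum. */
uint16_t internet_checksum(const unsigned char *data, size_t len)
{
    return (uint16_t)~ones_complement_sum(data, len);
}
```

For the byte sequence 00 01 f2 03 f4 f5 f6 f7 used as the worked example in RFC 1071, the folded sum is 0xddf2 and the checksum is 0x220d.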
Receiving is more complicated than sending. Receiving takes two queues. The Receive Buffer Queue is a queue of items representing groups of workstation buffers to receive messages into. These groups are called "Scatters". A Scatter has the same definition as a Gather:
typedef struct Gather
{
    unsigned length;
    void *pointer;
} Gather;
When the MCP receives a message from the Myrinet, the MCP removes an item from a channel's receive buffer queue. The MCP DMAs the message into the scatter of that receive item. The MCP DMAs the beginning of the message into the first scatter buffer, copying not more than length bytes. If there is yet more to copy, the MCP copies the next part of the message into the next scatter's buffer, and so on, until the entire message has been copied, or there are not enough buffers--in which case it is an error. There can be at most MYRI_MAX_SCATTERS Scatters per message.
Then the MCP adds an item to the other queue, the Receive Ack Queue. The workstation reads the item from the Receive Ack Queue. This item tells the workstation which group of workstation buffers has just been filled with a newly received message. The workstation uses that message and then gives the buffer that the message was in back to the MCP by putting it on the Receive Buffer Queue. If the Receive Buffer Queue is empty, or if the Receive Ack Queue is full, any new messages received by the MCP will be dropped. The receive buffer queue and the receive ack queue can each hold NUM_RECEIVES items at a time.
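The way a received message is spread across the buffers of a scatter can be modelled on the host with memcpy() standing in for the DMA. This is a sketch of the copying rule only, not of the MCP code; MYRI_MAX_SCATTERS is given a hypothetical value, scatter_message() is an illustrative name, and writing each piece length back into the length field mirrors what the text says myriApiGetScatter() reports.

```c
#include <assert.h>
#include <string.h>

#define MYRI_MAX_SCATTERS 4u   /* hypothetical value */

typedef struct Gather {
    unsigned length;
    void *pointer;
} Gather;

/* Copies msg into the scatter buffers in order, at most length bytes each.
 * On success, stores the number of bytes placed in each buffer back into its
 * length field and returns 0. Returns -1 if the buffers cannot hold msg. */
int scatter_message(const unsigned char *msg, unsigned msgLen,
                    unsigned numScatters, Gather *scatters)
{
    unsigned i, copied = 0, capacity = 0;

    assert(msg != 0 && scatters != 0);
    if (numScatters > MYRI_MAX_SCATTERS)
        return -1;
    for (i = 0; i < numScatters; i++)
        capacity += scatters[i].length;
    if (capacity < msgLen)
        return -1;                        /* not enough buffers: an error */

    for (i = 0; i < numScatters; i++) {
        unsigned left = msgLen - copied;
        unsigned n = (left < scatters[i].length) ? left : scatters[i].length;
        memcpy(scatters[i].pointer, msg + copied, n);
        scatters[i].length = n;           /* report the piece length */
        copied += n;
    }
    return 0;
}
```

A 12-byte message scattered into two 8-byte buffers ends up as an 8-byte piece followed by a 4-byte piece.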
The workstation adds a scatter to a channel's receive buffer queue with myriApiAddScatter(). myriApiAddScatter() is defined as:
int myriApiAddScatter (int unit,
int channel,
int context,
int numScatters,
MyriGather*scatters);
myriApiAddScatter() returns:
unit and channel are the same as in the send functions. context is an index that will be given back to the workstation as part of the Receive Ack item for this scatter, when a message is received into it. numScatters is the size of scatters, which is an array of scatter structures. The pointer field of each scatter points to a buffer and must be a valid DMA pointer.
The workstation retrieves a scatter that has had a message received into it by calling myriApiGetScatter(), which is defined as:
int myriApiGetScatter (int unit,
int channel,
int*context,
int*checksum,
MyriGather*scatters);
unit and channel are the same as in myriApiAddScatter(). The context of the scatter is passed back to the caller in the context parameter. myriApiGetScatter() returns:
In addition, the length of each message piece is returned in the "length" field of each scatter.
With myriApiGetNumReceiveAcksPending() you can peek at a channel's receive ack queue to see if there are any messages to retrieve.
The Myrinet API consists of the following files:
mcpConstants.h
--Defines constants like NUM_SENDS.
myriApi.h
--Has API function prototypes and the definitions of the Gather and MyriCopy.
Programs using the Myrinet API should link to:
libMyriApi.a
and they should include "myriApi.h".
The Myrinet API has no internal concept of threads. As a programmer using this library, you need to prevent multiple threads of control from entering the same function. The general concept is that a process using unit=U and channel=C needs to protect accesses to the various API functions. Different units, and different channels in the same unit, do not need to be protected from each other.
A common practice is to have two semaphores for each unit/channel combination. One semaphore protects the send commands. The other semaphore protects receive and initialization commands.
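One possible shape for this two-lock arrangement is sketched below. It uses POSIX mutexes in place of semaphores (a mutex behaves like a binary semaphore for this purpose), and the UNITS and CHANNELS values, the ChannelLocks type, and the function names are all hypothetical. Only the send side is shown; the receive side would follow the same pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

#define UNITS    2   /* hypothetical board count */
#define CHANNELS 4   /* hypothetical channels per board */

typedef struct ChannelLocks {
    pthread_mutex_t send;     /* guards the send functions */
    pthread_mutex_t receive;  /* guards receive and initialization functions */
} ChannelLocks;

ChannelLocks channel_locks[UNITS][CHANNELS];

/* Initializes every lock; returns 0 on success or the first pthread error. */
int channel_locks_init(void)
{
    int u, c, err;

    for (u = 0; u < UNITS; u++) {
        for (c = 0; c < CHANNELS; c++) {
            err = pthread_mutex_init(&channel_locks[u][c].send, NULL);
            if (err != 0)
                return err;
            err = pthread_mutex_init(&channel_locks[u][c].receive, NULL);
            if (err != 0)
                return err;
        }
    }
    return 0;
}

/* Take the send lock for (unit, channel) before calling a send function. */
int send_lock(int unit, int channel)
{
    assert(unit >= 0 && unit < UNITS && channel >= 0 && channel < CHANNELS);
    return pthread_mutex_lock(&channel_locks[unit][channel].send);
}

/* Release the send lock for (unit, channel) after the call returns. */
int send_unlock(int unit, int channel)
{
    assert(unit >= 0 && unit < UNITS && channel >= 0 && channel < CHANNELS);
    return pthread_mutex_unlock(&channel_locks[unit][channel].send);
}
```

A caller would bracket each myriApiSend() call with send_lock() and send_unlock() for the same unit and channel. Because every unit/channel pair has its own locks, threads working on different channels never wait for each other.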
Here are the API function prototypes, from the header file "myriApi.h".
void myriApiLoadLanai(int unit, unsigned short sts, int mask, unsigned char *myriAddrPtr);
int myriApiSetBurst(int unit, int channel, int burst);
int myriApiSetInterruptMask(int unit, int channel, int mask);
int myriApiSetAddress(int unit, int channel, char address[8]);
void myriApiGetAddress(int unit, char address[8]);
int myriApiReset(int unit, int channel);
int myriApiWasReset(int unit, int channel);
int myriApiHandshake(int unit, int channel);
int myriApiAddScatter(int unit, int channel, int context, int numScatters, MyriGather *scatters);
int myriApiGetScatter(int unit, int channel, int *context, int *checksum, MyriGather *scatters);
int myriApiSend(int unit, int channel, char address[8], int destinationChannel, int numGathers, Gather *gathers);
int myriApiSendWithChecksum(int unit, int channel, char address[8], int destinationChannel, int numGathers, Gather *gathers, int checksumOffset, int checksumCorrection);
int myriApiSendQueueFull(int unit, int channel);
int myriApiGetNumSendsPending(int unit, int channel);
int myriApiMulticastListen(int unit, int channel, char address[8]);
int myriApiMulticastDrop(int unit, int channel, char address[8]);
int myriApiMulticastListening(int unit, int channel, char address[8]);
int myriApiGetNumChannels(int unit, int channel);
int myriApiSetMapLevel(int unit, int channel, unsigned short mapLevel);
void myriApiInterruptEnable(int unit);
void myriApiInterruptDisable(int unit);
int myriApiInterruptPending(int unit);
/* these are obsolete and are included for backwards compatibility */
int myriApiGetReceiveBufferWithChecksum(int unit, int channel, int *context, int *checksum);
int myriApiAddReceiveBuffer(int unit, int channel, int context, void *pointer);
int myriApiGetReceiveBuffer(int unit, int channel, int *context);