The instructions of the LANai 3.0 processor are defined in terms of the pipeline, and the processor does not attempt to preserve the sequential semantics of the source assembly program. No data or control hazards stall the pipeline; only denied memory accesses do.
The unit of execution of the processor is the "time step", which takes
as long as necessary for all four pipeline stages to complete, and, at
least conceptually, all processor-state changes occur instantaneously,
in between time steps. In the absence of memory contention, each pipeline
stage takes one clock cycle (see the LANai 3.0 specification for
memory-arbitration details), except for punt, which takes two
clock cycles.
A program can specify multiple values to be written to a single destination register during a time step. Such conflicts are resolved as if the register updates occurred in the following order, in between time steps:
pc is incremented by 4.
The ps register is updated, if specified by the instruction being executed.
The order of activities within a time step is not defined. In particular, if during a time step two memory accesses are required (instruction and data), they can (and do) take place in any order.
nops
The program:
ld  0xC[%r8],%r7    ! R7 <- Memory( R8 + 0xC )   /* RM */
add %r7,%r9,%r8     ! R8 <- R7 + R9              /* RR */
is not equivalent to the program:
ld  0xC[%r8],%r7    ! R7 <- Memory( R8 + 0xC )   /* RM */
nop                 ! R0 <- R0 + 0               /* a RI no-op */
add %r7,%r9,%r8     ! R8 <- R7 + R9              /* RR */
The meaning of the instructions is defined in terms of the pipeline. For the first program, the execution would be performed as follows:
Time:     0       1       2       3       4
          +-------+-------+
Iaddr:    |      0|      4|
          +-------+-------+-------+
Fetch:            |  RM   |  RR   |
                  +-------+-------+-------+
Compute:                  |  ea   |  RR   |
                          +-------+-------+
Memory:                           |  r/w  |
                                  +-------+
such that R7 is modified by the RM instruction at time 4,
and the R7 referred to in the RR instruction is its
contents before it is modified by the RM instruction. In the
execution of the second program:
Time:     0       1       2       3       4
          +-------+-------+-------+
Iaddr:    |      0|      4|      8|
          +-------+-------+-------+-------+
Fetch:            |  RM   |  nop  |  RR   |
                  +-------+-------+-------+-------+
Compute:                  |  ea   |  nop  |  RR   |
                          +-------+-------+-------+
Memory:                           |  r/w  |
                                  +-------+
the R7 referred to in the RR instruction is its contents
after it is modified by the RM instruction.
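The difference between the two programs can be reproduced with a small Python model. This is only a sketch under simplifying assumptions: the function run and the tuple encoding of instructions are hypothetical, memory contention is ignored, and the only pipeline effect modeled is that the register write of a load lands after the next instruction has read its operands.

```python
def run(program, regs, mem):
    """Toy model of the load delay (not the real LANai pipeline).

    Each loop iteration is one instruction slot. A load's register write
    lands at the end of the *following* slot, after that slot's
    instruction has read its operands.
    """
    regs = dict(regs)
    pending = None                      # write issued by a load in the previous slot
    for op in program:
        landing, pending = pending, None
        result = None
        if op[0] == "ld":               # ("ld", base, offset, dest)
            _, base, offset, dest = op
            pending = (dest, mem[regs[base] + offset])
        elif op[0] == "add":            # ("add", src1, src2, dest)
            _, src1, src2, dest = op
            result = (dest, regs[src1] + regs[src2])
        # ("nop",) reads and writes nothing.
        # Operands were read above; now apply the writes. The load lands
        # first so that a conflicting write by this instruction overrides it.
        for write in (landing, result):
            if write is not None:
                regs[write[0]] = write[1]
    if pending is not None:             # let a trailing load complete
        regs[pending[0]] = pending[1]
    return regs


regs = {"r7": 1, "r8": 0x100, "r9": 10}
mem = {0x10C: 50}
ld = ("ld", "r8", 0xC, "r7")
add = ("add", "r7", "r9", "r8")

without_nop = run([ld, add], regs, mem)
with_nop = run([ld, ("nop",), add], regs, mem)
print(without_nop["r8"], with_nop["r8"])
```

With the initial values chosen here, the first program leaves 11 in r8 (the old R7 plus R9) and the second leaves 60 (the loaded R7 plus R9).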
pc (Program Counter)
Incrementing of the current pc is an implicit operation that
fetches the value of the pc at the beginning of a time step and
modifies the pc at the end of the time step. Programs that
explicitly modify the pc override this implicit operation. In
either case, the value of the pc at the end of a time step
is used to fetch the instruction during the following time step.
For example, the following program executes the shadow and then jumps to address 304:
0:  bt  304         ! pc <- 304                  /* BR */
4:  mov %r25,%r24   ! R24 <- R25                 /* RR */ /* the shadow */
The behavior of the preceding code is depicted in the following diagram:
          pc written here ...               ... and that written value of pc
                              |       |     is used to fetch the NEW
                              v       v     instruction here.
          +-------+-------+-------+-------+
pc:       |X     0|0     4|4   304|304 308| ...
          +-------+-------+-------+-------+
Iaddr:    |      0|      4|    304|    308| ...
          +-------+-------+-------+-------+-------+
Fetch:            |  BR   |  RR   |  NEW  |  ...  |
                  +-------+-------+-------+-------+-------+
Compute:                  |  BR   |  RR   |  ...  |  ...  |
                          +-------+-------+-------+-------+
In another example, the program
0:  ld  [28],%pc    ! /*RM*/ pc <- Memory(28)    /*here Memory(28) = 304*/
4:  mov %pc,%r27    ! /*RR*/ r27 <- pc           /*these instructions...*/
8:  add %r26,3,%r26 ! /*RI*/ R26 <- R26 + 3      /*...will be executed.  */
executes the two shadows and then jumps to address 304:
                pc is written here...               ... and that value of pc
                                      |       |     is used to fetch an
                                      V       V     instruction here.
          +-------+-------+-------+-------+-------+
pc:       |X     0|0     4|4     8|8   304|304 308| ...
          +-------+-------+-------+-------+-------+
Iaddr:    |      0|      4|      8|    304|    308| ...
          +-------+-------+-------+-------+-------+-------+
Fetch:            |  RM   |  RR   |  RI   |  NEW  |  ...  | ...
                  +-------+-------+-------+-------+-------+-------+
Compute:                  |  ea   |  RR   |  RI   |  ...  |  ...  |
                          +-------+-------+-------+-------+-------+
Memory:                           | read  |
                                  +-------+
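Both pc examples follow the same rule: the pc value at the end of a time step selects the next fetch address, and an explicit write replaces the implicit increment of 4. The Python sketch below reproduces the Iaddr rows of the two diagrams. The helper fetch_addresses is hypothetical, and the shadow counts (one for the branch, two for the load into pc) are read off the examples above rather than taken from the LANai 3.0 specification.

```python
def fetch_addresses(pc_writers, count, start=0):
    """Return the first `count` instruction addresses that are fetched.

    pc_writers maps the address of an instruction that writes pc to a
    pair (shadows, target): `shadows` sequential instructions are still
    fetched after it, then fetching continues at `target`.
    """
    fetched = []
    pending = {}                        # fetch index -> explicit pc value
    pc = start
    for i in range(count):
        fetched.append(pc)
        if pc in pc_writers:
            shadows, target = pc_writers[pc]
            pending[i + shadows + 1] = target
        # An explicit pc write overrides the implicit increment by 4.
        pc = pending.pop(i + 1, pc + 4)
    return fetched


# "bt 304" at address 0: one shadow instruction.
branch_trace = fetch_addresses({0: (1, 304)}, 4)
# "ld [28],%pc" at address 0 with Memory(28) = 304: two shadow instructions.
load_trace = fetch_addresses({0: (2, 304)}, 5)
print(branch_trace)
print(load_trace)
```

The two traces are [0, 4, 304, 308] and [0, 4, 8, 304, 308], matching the Iaddr rows of the diagrams.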
Although this processor does not directly support conditional branches
to addresses greater than (2^25)-4, conditional branches to any memory
location can be effected by taking advantage of the property that memory
loads into registers are overridden if they conflict with other writes
to the same register. For example, the following relocatable code
conditionally jumps to the address stored in memory location 304 if the
C flag in the %ps register is clear. It loads the address into the pc
and then conditionally overrides the load with a branch instruction:
0:  ld    [304],%pc ! /*RM*/ pc <- Memory(304)
4:  bcs.r 4         ! /*BR*/ pc <-conditionally- pc + 4
8:  nop             ! /*RI*/ does nothing
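The override can be illustrated with a short Python sketch of the write-conflict rule as it applies to the pc. The names next_pc and far_conditional_jump are hypothetical. The sketch also assumes that the taken branch produces the fall-through address 12 (the instruction after the nop), which is an interpretation of the "pc + 4" comment and not a statement from the specification.

```python
def next_pc(pc, load_value=None, branch_target=None):
    """Resolve conflicting pc writes at the end of one time step.

    Writes are applied in order and the last one wins: the implicit
    increment, then a value arriving from a load, then an explicit
    branch write.
    """
    value = pc + 4                      # implicit increment
    if load_value is not None:
        value = load_value              # load into pc overrides the increment
    if branch_target is not None:
        value = branch_target           # branch overrides the load
    return value


def far_conditional_jump(carry_set, loaded_address, fall_through=12):
    """Model the time step in which both the load and the branch write pc.

    `fall_through` is an assumption: the address the taken branch is
    presumed to produce, i.e. the instruction after the nop at address 8.
    """
    branch_target = fall_through if carry_set else None
    return next_pc(8, load_value=loaded_address, branch_target=branch_target)


FAR = 1 << 26                           # hypothetical target beyond (2^25)-4
print(hex(far_conditional_jump(False, FAR)), far_conditional_jump(True, FAR))
```

When C is clear the branch is not taken, so the loaded address reaches the pc; when C is set the branch write wins and execution continues past the nop.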