


9. The SPARC Instruction Formats

9.1 Goal

To cover the instruction encoding and decoding for the SPARC.

9.2 Objectives

After completing this lab, you will be able to:

9.3 Discussion

In this lab we consider instruction encoding and decoding for the operations that we have introduced in previous labs. In particular, we will consider encodings for instructions that use the data manipulation and branching operations. After we introduce instruction encoding, we consider the translation of synthetic operations. Finally, we conclude this lab by considering instruction decoding on the SPARC.

All SPARC instructions are encoded in a single 32-bit instruction word; there are no extension words.

9.3.1 Encoding load and store instructions

The SPARC machine language uses two different formats for load and store instructions. These formats are shown in Figure 9.1. The first format is used for instructions that use one or two registers in the effective address. The second format is used for instructions that use an integer constant in the effective address.

Figure 9.1: Instruction formats for load and store instructions
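A. Instructions that use one or two registers in the effective address
   bits:   31-30  29-25  24-19  18-14  13  12-5  4-0
   field:  11     rd     op3    rs1    0   asi   rs2

B. Instructions that use an integer constant in the effective address
   bits:   31-30  29-25  24-19  18-14  13  12-0
   field:  11     rd     op3    rs1    1   siconst13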

In the first format the 32-bit instruction is divided into seven fields. The first field (reading from the left) holds the 2-bit value 11, while the fifth field (bit 13) holds the 1-bit value 0. These bits are the same for all load and store instructions that use two source registers. The sixth field (bits 5 through 12) holds the address space identifier, asi. For the present, we will always set the asi field to zero. The remaining fields, rd, op${}_{3}$, rs${}_1$, and rs${}_2$, hold encodings for the destination register, the operation, and the two source registers, respectively.

Registers are encoded using the 5-bit binary representation of the register number. Table 9.1 summarizes the operation encodings for the load and store operations.



Table 9.1: Operation encodings for the load and store operations
Operation op${}_{3}$ Operation op${}_{3}$
ld 000000 st 000100
ldub 000001 stb 000101
lduh 000010 sth 000110
ldd 000011 std 000111
ldsb 001001
ldsh 001010


Example: Hand assemble the instruction:
ldd     [%r4+%r7], %r11

Because this instruction uses two registers in the address specification, it is encoded using the first format shown in Figure 9.1. As such, we must determine the values for the rd, op${}_{3}$, rs${}_1$, and rs${}_2$ fields. The following table summarizes these encodings:

        
Field Symbolic value Encoded value
rd %r11 01011
op${}_{3}$ ldd 000011
rs${}_1$ %r4 00100
rs${}_2$ %r7 00111

These encodings lead to the following machine instruction:

        
op  rd     op3     rs1    bit 13  asi       rs2
11  01011  000011  00100  0       00000000  00111

That is, 1101 0110 0001 1001 0000 0000 0000 0111 in binary, or 0xD6190007.
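The packing step above can be sketched in a few lines of Python. This is only an illustration, not part of isem-as: the function name `encode_mem_rr` and the dictionary `LOAD_STORE_OP3` are hypothetical, and the op3 values are copied from Table 9.1.

```python
# op3 encodings copied from Table 9.1.
LOAD_STORE_OP3 = {
    "ld": 0b000000, "ldub": 0b000001, "lduh": 0b000010, "ldd": 0b000011,
    "st": 0b000100, "stb": 0b000101, "sth": 0b000110, "std": 0b000111,
    "ldsb": 0b001001, "ldsh": 0b001010,
}

def encode_mem_rr(name, rd, rs1, rs2, asi=0):
    """Pack a load/store that uses two registers in its address (hypothetical helper)."""
    word = 0b11 << 30                    # first field: fixed value 11
    word |= rd << 25                     # destination register
    word |= LOAD_STORE_OP3[name] << 19   # operation
    word |= rs1 << 14                    # first source register
    # bit 13 stays 0 in this format
    word |= asi << 5                     # address space field, zero for now
    word |= rs2                          # second source register
    return word

# ldd [%r4+%r7], %r11
print(hex(encode_mem_rr("ldd", 11, 4, 7)))   # 0xd6190007
```

Passing 0 for `rs2` gives the register indirect case, in which %r0 fills the unused source register field.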


If the assembly language instruction uses only a single register in the address specification (e.g., register indirect addressing), the register is encoded in one of the source register fields (i.e., rs${}_1$ or rs${}_2$) while %r0 is encoded in the other. It doesn't matter which field holds the register specified in the assembly language instruction and which field holds the encoding for %r0. However, isem-as encodes %r0 in rs${}_2$.


Example: Hand assemble the instruction:
ldub     [%r23], %r19

This instruction can be encoded using either of the formats shown in Figure 9.1. We will encode it using the first of these formats. As such, we must determine the values for the rd, op${}_{3}$, rs${}_1$, and rs${}_2$ fields. The following table summarizes these encodings:

        
Field Symbolic value Encoded value
rd %r19 10011
op${}_{3}$ ldub 000001
rs${}_1$ %r23 10111
rs${}_2$ %r0 00000

These encodings lead to the following machine instruction:

        
op  rd     op3     rs1    bit 13  asi       rs2
11  10011  000001  10111  0       00000000  00000

That is, 1110 0110 0000 1101 1100 0000 0000 0000 in binary, or 0xE60DC000.


In the second format the 32-bit instruction is divided into six fields. As in the previous format, the first field holds the 2-bit value 11. However, unlike the previous format, the fifth field holds the 1-bit value 1. The remaining fields, rd, op${}_{3}$, rs${}_1$, and siconst${}_{13}$, hold encodings for the destination register, the operation, the source register, and the constant value, respectively. When this format is used, the integer constant is encoded using the 13-bit 2's complement representation and stored in the siconst${}_{13}$ field of the instruction.
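As a rough sketch of the second format, the Python below packs a load that uses a constant in its address. The helper names `to_simm13` and `encode_mem_imm` are hypothetical, and the two instructions encoded are illustrative choices rather than examples from the lab.

```python
def to_simm13(value):
    """Return the 13-bit 2's complement encoding of value (hypothetical helper)."""
    if not -4096 <= value <= 4095:
        raise ValueError("constant does not fit in 13 bits")
    return value & 0x1FFF

def encode_mem_imm(op3, rd, rs1, const):
    """Pack a load/store that uses a register plus a constant in its address."""
    return ((0b11 << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14)
            | (1 << 13)              # bit 13 is 1 in this format
            | to_simm13(const))

# ldub [%r23+0], %r19 (ldub is op3 000001 in Table 9.1)
print(hex(encode_mem_imm(0b000001, 19, 23, 0)))   # 0xe60de000
# ld [%r1-8], %r2 (ld is op3 000000 in Table 9.1)
print(hex(encode_mem_imm(0b000000, 2, 1, -8)))    # 0xc4007ff8
```

The first call encodes the same load as the ldub example above, this time in the second format; the two words differ only in bit 13.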

9.3.2 Encoding sethi instructions

The format used to encode sethi instructions is shown in Figure 9.2. Sethi instructions are encoded in four fields. The first field holds the 2-bit value 00. The next field, rd, holds the 5-bit encoding of the destination register. The third field holds the 3-bit value 100. The final field holds the 22-bit binary encoding of the value specified in the instruction.

Figure 9.2: Instruction format for sethi instructions
bits:   31-30  29-25  24-22  21-0
field:  00     rd     100    const22


Example: Hand assemble the instruction:
sethi   %hi(0x87654321), %r2

This instruction is encoded using the format shown in Figure 9.2. As such, we need to determine the values for the rd and const${}_{22}$ fields. The following table summarizes these encodings:

        
Field Symbolic value Encoded value
rd %r2 00010
const${}_{22}$ %hi(0x87654321) 1000 0111 0110 0101 0100 00

These encodings lead to the following machine instruction:

        
op  rd     op2  const22
00  00010  100  1000 0111 0110 0101 0100 00

That is, 0000 0101 0010 0001 1101 1001 0101 0000 in binary, or 0x0521D950.
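The same calculation can be written as a short Python sketch. The names `hi` and `encode_sethi` are hypothetical; `hi` mimics the %hi() operator by keeping the most significant 22 bits of a 32-bit value.

```python
def hi(value):
    """Most significant 22 bits of a 32-bit value, as %hi() selects them."""
    return (value & 0xFFFFFFFF) >> 10

def encode_sethi(value, rd):
    """Pack sethi %hi(value), rd into a 32-bit word (hypothetical helper)."""
    return (0b00 << 30) | (rd << 25) | (0b100 << 22) | hi(value)

# sethi %hi(0x87654321), %r2
print(f"{encode_sethi(0x87654321, 2):#010x}")   # 0x0521d950
```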


9.3.3 Encoding integer data manipulation instructions

Data manipulation instructions are encoded using two formats: one for instructions that use two source registers and another for instructions that use a source register and a small integer constant. The formats used for integer data manipulation instructions are shown in Figure 9.3.

Figure 9.3: Instruction formats for data manipulation instructions
A. Instructions that use two source registers
   bits:   31-30  29-25  24-19  18-14  13  12-5    4-0
   field:  10     rd     op3    rs1    0   unused  rs2

B. Instructions that use a source register and a constant
   bits:   31-30  29-25  24-19  18-14  13  12-0
   field:  10     rd     op3    rs1    1   siconst13

In the first format the 32-bit instruction is divided into seven fields. The first field (reading from the left) holds the 2-bit value 10, while the fifth field (bit 13) holds the 1-bit value 0. These bits are the same for all data manipulation instructions that use two source registers. The sixth field (bits 5 through 12) is unused; the bits in this field must be zero. The remaining fields, rd, op${}_{3}$, rs${}_{1}$, and rs${}_{2}$, hold encodings for the destination register, the operation, and the two source registers, respectively.

In the second format the 32-bit instruction is divided into six fields. As in the previous format, the first field holds the 2-bit value 10. However, unlike the previous format, the fifth field holds the 1-bit value 1. The remaining fields, rd, op${}_{3}$, rs${}_{1}$, and siconst${}_{13}$, hold encodings for the destination register, the operation, the source register, and the constant value, respectively. When this format is used, the integer constant is encoded using the 13-bit 2's complement representation and stored in the siconst${}_{13}$ field of the instruction.

Recall that a SPARC assembly language instruction begins with the name of the operation, followed by the two source operands, followed by the destination operand. In considering the translation from an assembly language instruction into machine language, there are a few points to keep in mind:

Table 9.2 summarizes the operation encodings for the data manipulation operations that we have covered in the previous labs. When an instruction using one of these operations is encoded, the operator encoding is placed in the op${}_{3}$ field of the machine instruction.



Table 9.2: Operation encodings for the data manipulation operations
Operation op${}_{3}$ Operation op${}_{3}$
add 000000 addcc 010000
and 000001 andcc 010001
andn 000101 andncc 010101
or 000010 orcc 010010
orn 000110 orncc 010110
udiv 001110 udivcc 011110
umul 001010 umulcc 011010
smul 001011 smulcc 011011
sdiv 001111 sdivcc 011111
sub 000100 subcc 010100
xor 000011 xorcc 010011
xnor 000111 xnorcc 010111
sll 100101
srl 100110
sra 100111


Example: Hand assemble the instruction:
sub     %r16, %r26, %r27

Because this instruction uses two source registers, it is encoded using the first format shown in Figure 9.3. As such, we must determine the values for the op${}_{3}$, rd, rs${}_{1}$, and rs${}_{2}$ fields. The following table summarizes these encodings:

        
Field Symbolic value Encoded value
rd %r27 11011
op${}_{3}$ sub 000100
rs${}_{1}$ %r16 10000
rs${}_{2}$ %r26 11010

These encodings lead to the following machine instruction:

        
op  rd     op3     rs1    bit 13  unused    rs2
10  11011  000100  10000  0       00000000  11010

That is, 1011 0110 0010 0100 0000 0000 0001 1010 in binary, or 0xB624001A.



Example: Hand assemble the instruction:
smulcc  %r29, -23, %r19

Because this instruction uses one source register and a signed integer constant, it is encoded using the second format shown in Figure 9.3. As such, we must determine the values for the op${}_{3}$, rd, rs${}_1$, and siconst${}_{13}$ fields. The following table summarizes these encodings:

        
Field Symbolic value Encoded value
rd %r19 10011
op${}_{3}$ smulcc 011011
rs${}_1$ %r29 11101
siconst${}_{13}$ $-$23 1111 1111 0100 1

These encodings lead to the following machine instruction:

        
op  rd     op3     rs1    bit 13  siconst13
10  10011  011011  11101  1       1111 1111 0100 1

That is, 1010 0110 1101 1111 0111 1111 1110 1001 in binary, or 0xA6DF7FE9.
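Both data manipulation formats can be handled by one small routine. The sketch below is illustrative only: `encode_alu` and `OP3` are hypothetical names, and the dictionary holds just the three entries of Table 9.2 needed here.

```python
# A few op3 encodings copied from Table 9.2.
OP3 = {"add": 0b000000, "sub": 0b000100, "smulcc": 0b011011}

def encode_alu(name, rs1, src2, rd, immediate=False):
    """Pack a data manipulation instruction (hypothetical helper).

    src2 is a register number, or a signed constant when immediate is True.
    """
    word = (0b10 << 30) | (rd << 25) | (OP3[name] << 19) | (rs1 << 14)
    if immediate:
        if not -4096 <= src2 <= 4095:
            raise ValueError("constant does not fit in 13 bits")
        word |= (1 << 13) | (src2 & 0x1FFF)   # bit 13 = 1, 2's complement constant
    else:
        word |= src2                          # bit 13 = 0, bits 5-12 zero
    return word

print(hex(encode_alu("sub", 16, 26, 27)))                       # 0xb624001a
print(hex(encode_alu("smulcc", 29, -23, 19, immediate=True)))   # 0xa6df7fe9
```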


9.3.4 Encoding conditional branching instructions

The machine language format for the conditional branching operations on the SPARC is shown in Figure 9.4. This format divides the machine instruction into five fields. The first and fourth fields hold the fixed values 00 and 010. The remaining fields, a, cond, and disp${}_{22}$, hold the encoded values for the annul bit, the branching condition, and program counter displacement.

Figure 9.4: Instruction format for conditional branch instructions
bits:   31-30  29  28-25  24-22  21-0
field:  00     a   cond   010    disp22

The a field of a machine instruction is set (i.e., 1) for instructions that use the annul suffix (",a"). This field is clear (i.e., 0) for conditional branching instructions that do not nullify the results of the next instruction. The cond field of a machine instruction encodes the condition under which the branch is taken. Table 9.3 summarizes the operation encodings for the branching operations supported by the SPARC.



Table 9.3: Operation encodings for the conditional branching operations
Operation cond Operation cond
ba 1000 bn 0000
bne (bnz) 1001 be (bz) 0001
bg 1010 ble 0010
bge 1011 bl 0011
bgu 1100 bleu 0100
bcc (bgeu) 1101 bcs (blu) 0101
bpos 1110 bneg 0110
bvc 1111 bvs 0111

To complete the encoding of an assembly language instruction that uses conditional branching, you need to determine the value of the disp${}_{22}$ field. We address this issue by considering how a processor uses this value. When the processor determines that the branching condition is satisfied, it multiplies the value in the disp${}_{22}$ field by 4 and adds it to the program counter (PC). To be more precise, the processor sign extends the 22-bit value stored in the disp${}_{22}$ field to 30 bits and concatenates two zeros to construct a 32-bit value, which it adds to the PC. In effect, the disp${}_{22}$ field holds the distance from the branch instruction to the target, measured in instructions.
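The address calculation described above can be modeled in Python as follows. The function name `branch_target` and the addresses 0x2004 and 0x2008 are made up for illustration.

```python
def branch_target(pc, disp22):
    """Address reached by a taken branch located at pc (hypothetical helper)."""
    # Sign extend the 22-bit field.
    if disp22 & (1 << 21):
        disp22 -= 1 << 22
    # Appending two zero bits is the same as multiplying by 4.
    return (pc + 4 * disp22) & 0xFFFFFFFF

print(hex(branch_target(0x2004, 0b11)))       # 3 instructions forward: 0x2010
print(hex(branch_target(0x2008, 0x3FFFFE)))   # 2 instructions back: 0x2000
```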


Example: Hand assemble the branch instruction in the following SPARC code fragment.
        cmp     %r2, 8
        bne     l1
        nop
        inc     %r3
l1:

In this case, the target is 3 instructions from the branch instruction, so the disp${}_{22}$ field will be the 22-bit binary encoding of 3.

        
Field Symbolic value Encoded value
a   0
cond bne 1001
disp${}_{22}$ l1 0000 0000 0000 0000 0000 11

These encodings lead to the following machine instruction:

        
op  a  cond  op2  disp22
00  0  1001  010  0000 0000 0000 0000 0000 11

That is, 0001 0010 1000 0000 0000 0000 0000 0011 in binary, or 0x12800003.



Example: Hand assemble the branch instruction in the following SPARC code fragment.
top:    add     %r2, %r3, %r2
        deccc   %r4
        bne     top

In this case, the target is 2 instructions (back) from the branch instruction, so the disp${}_{22}$ field will be the 22-bit binary encoding of $-2$.

        
Field Symbolic value Encoded value
a   0
cond bne 1001
disp${}_{22}$ top 1111 1111 1111 1111 1111 10

These encodings lead to the following machine instruction:

        
op  a  cond  op2  disp22
00  0  1001  010  1111 1111 1111 1111 1111 10

That is, 0001 0010 1011 1111 1111 1111 1111 1110 in binary, or 0x12BFFFFE.
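The two branch examples can be reproduced with a short Python sketch. The name `encode_branch` is hypothetical, the cond values are copied from Table 9.3, and the addresses passed in are arbitrary; only their difference matters.

```python
# cond encodings copied from Table 9.3.
COND = {"bn": 0b0000, "be": 0b0001, "ble": 0b0010, "bl": 0b0011,
        "bleu": 0b0100, "bcs": 0b0101, "bneg": 0b0110, "bvs": 0b0111,
        "ba": 0b1000, "bne": 0b1001, "bg": 0b1010, "bge": 0b1011,
        "bgu": 0b1100, "bcc": 0b1101, "bpos": 0b1110, "bvc": 0b1111}

def encode_branch(name, branch_addr, target_addr, annul=False):
    """Pack a conditional branch (hypothetical helper)."""
    offset = target_addr - branch_addr
    if offset % 4:
        raise ValueError("instructions are word aligned")
    disp = offset // 4                       # distance in instructions
    if not -(1 << 21) <= disp < (1 << 21):
        raise ValueError("target out of range for disp22")
    return ((int(annul) << 29) | (COND[name] << 25)
            | (0b010 << 22) | (disp & 0x3FFFFF))

print(hex(encode_branch("bne", 0x04, 0x10)))   # 3 forward: 0x12800003
print(hex(encode_branch("bne", 0x08, 0x00)))   # 2 back:    0x12bffffe
```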


9.3.5 Synthetic Instructions

In most cases, an assembly language instruction is simply a symbolic representation of a machine language instruction. The SPARC architecture also defines a number of assembly language instructions that do not correspond directly to SPARC machine language instructions. These are called synthetic instructions. The assembler translates synthetic instructions into one or more machine language instructions. Using synthetic instructions can frequently make your programs easier to read. Table 9.4 summarizes the translation provided by the assembler for most of the synthetic instructions on the SPARC.



Table 9.4: The synthetic instructions
Synthetic instruction Implementation
bclr rs, rd andn rd, rs, rd
bclr siconst${}_{13}$, rd andn rd, siconst${}_{13}$, rd
bset rs, rd or rd, rs, rd
bset siconst${}_{13}$, rd or rd, siconst${}_{13}$, rd
btst rs${}_1$, rs${}_2$ andcc rs${}_1$, rs${}_2$, %g0
btst rs, siconst${}_{13}$ andcc rs, siconst${}_{13}$, %g0
btog rs, rd xor rd, rs, rd
btog siconst${}_{13}$, rd xor rd, siconst${}_{13}$, rd
clr rd or %g0, %g0, rd
clrb [address] stb %g0, [address]
clrh [address] sth %g0, [address]
clr [address] st %g0, [address]
cmp rs${}_1$, rs${}_2$ subcc rs${}_1$, rs${}_2$, %g0
cmp rs, siconst${}_{13}$ subcc rs, siconst${}_{13}$, %g0
dec rd sub rd, 1, rd
dec siconst${}_{13}$, rd sub rd, siconst${}_{13}$, rd
deccc rd subcc rd, 1, rd
deccc siconst${}_{13}$, rd subcc rd, siconst${}_{13}$, rd
inc rd add rd, 1, rd
inc siconst${}_{13}$, rd add rd, siconst${}_{13}$, rd
inccc rd addcc rd, 1, rd
inccc siconst${}_{13}$, rd addcc rd, siconst${}_{13}$, rd
mov rs, rd or %g0, rs, rd
mov siconst${}_{13}$, rd or %g0, siconst${}_{13}$, rd
mov statereg, rd rd statereg, rd
mov rs, statereg wr %g0, rs, statereg
mov siconst${}_{13}$, statereg wr %g0, siconst${}_{13}$, statereg
neg rs, rd sub %g0, rs, rd
neg rd sub %g0, rd, rd
not rd xnor rd, %g0, rd
not rs, rd xnor rs, %g0, rd
set iconst, rd or %g0, iconst, rd
    --or-- sethi %hi(iconst), rd
    --or-- sethi %hi(iconst), rd; or rd, %lo(iconst), rd
tst rs orcc %g0, rs, %g0

Most of the translations shown in Table 9.4 are straightforward. However, the implementation of the set instruction merits further discussion. The assembler will always try to use one of the first two translations if it can. That is, if the constant value can be represented in 13 bits, the assembler will select the first translation. If the least significant 10 bits of the constant value are 0, it will use the second translation. Otherwise, the assembler will use the third translation. Note that if the constant value is relocatable, the assembler will always select the third translation.
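The selection rule for set can be sketched as follows. This is a model of the rule as described above, not the code isem-as uses: `expand_set` is a hypothetical name, the constant is assumed to be absolute (not relocatable), and "represented in 13 bits" is taken to mean the signed range -4096 to 4095.

```python
def expand_set(value, rd):
    """Return the instruction(s) that implement set value, rd (hypothetical helper)."""
    if -4096 <= value <= 4095:
        # First translation: the constant fits in siconst13.
        return [f"or %g0, {value}, {rd}"]
    word = value & 0xFFFFFFFF
    if word & 0x3FF == 0:
        # Second translation: the least significant 10 bits are zero.
        return [f"sethi %hi({word:#x}), {rd}"]
    # Third translation: load the upper 22 bits, then or in the lower 10.
    return [f"sethi %hi({word:#x}), {rd}",
            f"or {rd}, %lo({word:#x}), {rd}"]

for constant in (8, 0x12345400, 0x87654321):
    print(expand_set(constant, "%r2"))
```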

9.3.6 The read and write instructions

The %y register, introduced in Lab 4, is one of the SPARC state registers. As shown in Table 9.4, when you use a state register as the destination in a mov instruction, it is translated to a wr (write) instruction. Similarly, when you use a state register as the source register in a mov instruction, it is translated to a rd (read) instruction.

Write instructions are encoded using the formats shown in Figure 9.3. When the destination register is the %y register, the rd field is set to the 5-bit value 00000 and the op${}_3$ field is set to the 6-bit value 110000.

Read instructions are encoded using the second format shown in Figure 9.3. When the source register is the %y register, the op${}_3$ field is set to the 6-bit value 101000, the rs${}_1$ field is set to the 5-bit value 00000, and the siconst${}_{13}$ field is set to 0000000000000.

9.3.7 Relocatable expressions

In this lab, we have limited our discussion to the translation of instructions that use absolute expressions. We will consider the translation of relocatable expressions when we consider linking and loading in Lab 15.

9.3.8 Decoding SPARC instructions

We conclude our discussion of instruction formats by considering instruction decoding. That is, the process by which a SPARC processor determines the instruction it is executing.

The SPARC uses a distributed opcode. The two most significant bits in an instruction represent the primary opcode. If the primary opcode is 00, bits 22-24 of the instruction provide the secondary opcode. If the primary opcode is 01, the instruction is a call instruction and the remaining bits (bits 0-29) are a displacement for the program counter (we will discuss the call instruction at greater length in Lab 10). Otherwise, if the primary opcode is either 10 or 11, bits 19-24 of the instruction provide the secondary opcode. Figure 9.5 illustrates the positions of the secondary opcodes based on the primary opcode.

Figure 9.5: The primary opcode in a SPARC instruction
op = 00:        op (bits 31-30) | ... | op2 (bits 24-22) | ...
op = 01:        op (bits 31-30) | displacement (bits 29-0)
op = 10 or 11:  op (bits 31-30) | ... | op3 (bits 24-19) | ...

Once you have determined the primary and secondary opcodes, you will be able to determine the instruction and, knowing the instruction, decode its remaining fields. If the primary opcode is 01, the instruction is a call instruction and you can easily complete the decoding of the instruction.

If the primary opcode is 00, the instruction is an unimplemented instruction, a conditional branch instruction, or a sethi instruction. Table 9.5 summarizes how the 3-bit value in op${}_2$ is used to identify the instruction.



Table 9.5: Decoding the op${}_2$ field
Value Instruction
000 The unimplemented instruction
001 illegal
010 Conditional branch--integer unit
011 illegal
100 SETHI
101 illegal
110 Conditional branch--floating point unit
111 Conditional branch--coprocessor

The data manipulation instructions are encoded with a primary opcode of 10. Table 9.6 shows how the 6-bit value in the op${}_3$ field is used to determine the instruction when the primary opcode is 10.



Table 9.6: Decoding the op${}_3$ field when the primary opcode is 10
        000xxx  001xxx  010xxx  011xxx  100xxx    101xxx  110xxx  111xxx
xxx000  add     addx    addcc   addxcc  taddcc    rd      wr      jmpl
xxx001  and     --      andcc   --      tsubcc    rd      wr      rett
xxx010  or      umul    orcc    umulcc  taddcctv  rd      wr      trap
xxx011  xor     smul    xorcc   smulcc  tsubcctv  rd      wr      flush
xxx100  sub     subx    subcc   subxcc  mulscc    --      FPU op  save
xxx101  andn    --      andncc  --      sll       --      FPU op  restore
xxx110  orn     udiv    orncc   udivcc  srl       --      CP op   --
xxx111  xnor    sdiv    xnorcc  sdivcc  sra       --      CP op   --

Instructions that access memory are encoded with a primary opcode of 11. Table 9.7 shows how the 6-bit value in the op${}_3$ field is used to determine the instruction when the primary opcode is 11.



Table 9.7: Decoding the op${}_3$ field when the primary opcode is 11
        000xxx  001xxx  010xxx  011xxx   100xxx  101xxx  110xxx  111xxx
xxx000  ld      --      lda     --       ldf     --      ldc     --
xxx001  ldub    ldsb    lduba   ldsba    ldfsr   --      ldcsr   --
xxx010  lduh    ldsh    lduha   ldsha    --      --      --      --
xxx011  ldd     --      ldda    --       lddf    --      lddc    --
xxx100  st      --      sta     --       stf     --      stc     --
xxx101  stb     ldstub  stba    ldstuba  stfsr   --      stcsr   --
xxx110  sth     --      stha    --       stdfq   --      stdcq   --
xxx111  std     swap    stda    swapa    stdf    --      stdc    --

When you decode an instruction that has a primary opcode of 10 or 11, you will need to examine bit 13 to determine whether bits 0-12 of the instruction hold an immediate value or a register. If bit 13 is 1, bits 0-12 hold an immediate value.
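The decoding steps in this section amount to a handful of shifts and masks. The Python sketch below is one way to write them down; the function name `split_fields` and the dictionary keys are hypothetical, chosen to match the field names used in this lab.

```python
def split_fields(word):
    """Split a 32-bit instruction word into its fields (hypothetical helper)."""
    op = (word >> 30) & 0b11                  # primary opcode
    if op == 0b01:                            # call: the rest is a displacement
        return {"op": op, "disp30": word & 0x3FFFFFFF}
    if op == 0b00:                            # secondary opcode in bits 22-24
        return {"op": op,
                "rd": (word >> 25) & 0x1F,    # holds a and cond for branches
                "op2": (word >> 22) & 0b111,
                "imm22": word & 0x3FFFFF}
    fields = {"op": op,                       # secondary opcode in bits 19-24
              "rd": (word >> 25) & 0x1F,
              "op3": (word >> 19) & 0x3F,
              "rs1": (word >> 14) & 0x1F,
              "i": (word >> 13) & 1}
    if fields["i"]:                           # bit 13 = 1: immediate in bits 0-12
        raw = word & 0x1FFF
        fields["simm13"] = raw - 0x2000 if raw & 0x1000 else raw
    else:                                     # bit 13 = 0: register in bits 0-4
        fields["rs2"] = word & 0x1F
    return fields

print(split_fields(0xA6DF7FE9))   # the smulcc example from Section 9.3.3
```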


Example: Give an instruction that will assemble to the value 0x09012345.

In binary, this instruction is 00 00100 100 000001.... That is, the primary opcode is 00 and op${}_2$ is 100. From Table 9.5, this is a sethi instruction. Using the sethi format to partition the bits yields:

        
op  rd     op2  const22
00  00100  100  0000 0100 1000 1101 0001 01

Thus, the destination register is %r4, and the integer constant is 0x12345. The following instruction will be assembled as 0x09012345.

sethi   %hi(0x12345<<10), %r4


Example: Give an instruction that will assemble to the value 0x10800006.

In binary, this instruction is 00 0 1000 010 000000.... That is, the primary opcode is 00 and op${}_2$ is 010. From Table 9.5, this is a conditional branch instruction. Using the conditional branch format to partition the bits yields:

        
op  a  cond  op2  disp22
00  0  1000  010  0000 0000 0000 0000 0001 10

Thus, the operator is "ba" and the displacement is +6 words. The following instruction will be assembled as 0x10800006.

ba      .+(6*4)
(When you use isem-as, `.' is the address of the current instruction.)


Example: Give an instruction that will assemble to the value 0x8601600E.

In binary, the instruction is 10 00011 000000 00101 1.... That is, the primary opcode is 10 and op${}_3$ is 000000. From Table 9.6, this is an add instruction. Because bit 13 is 1, we use the second format in Figure 9.3 to decode this instruction.

        
op  rd     op3     rs1    bit 13  siconst13
10  00011  000000  00101  1       0000 0000 0111 0

Thus, the destination is %r3, the source register is %r5, and the constant is 0xE. The following instruction will be assembled as 0x8601600E.

add     %r5, 14, %r3
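The three decoding examples can be checked with a small Python sketch that turns a word back into assembly text. It is deliberately incomplete: `disassemble` is a hypothetical name, only sethi, the integer conditional branches, and the first and third columns of Table 9.6 are handled, and branch targets are printed relative to `.` in bytes.

```python
# cond names from Table 9.3, indexed by the 4-bit cond value.
COND = ["bn", "be", "ble", "bl", "bleu", "bcs", "bneg", "bvs",
        "ba", "bne", "bg", "bge", "bgu", "bcc", "bpos", "bvc"]
# Operations in the 000xxx column of Table 9.6, indexed by the low 3 bits.
BASE = ["add", "and", "or", "xor", "sub", "andn", "orn", "xnor"]

def disassemble(word):
    """Return assembly text for a small subset of instructions (hypothetical helper)."""
    op = (word >> 30) & 0b11
    rd = (word >> 25) & 0x1F
    if op == 0b00:
        op2 = (word >> 22) & 0b111
        imm22 = word & 0x3FFFFF
        if op2 == 0b100:
            value = imm22 << 10
            return f"sethi %hi({value:#x}), %r{rd}"
        if op2 == 0b010:
            disp = imm22 - (1 << 22) if imm22 & (1 << 21) else imm22
            name = COND[rd & 0xF] + (",a" if rd & 0x10 else "")
            return f"{name} .{disp * 4:+d}"
    if op == 0b10:
        op3 = (word >> 19) & 0x3F
        if op3 >> 3 in (0b000, 0b010):
            name = BASE[op3 & 0b111] + ("cc" if op3 >> 3 == 0b010 else "")
            rs1 = (word >> 14) & 0x1F
            if (word >> 13) & 1:
                raw = word & 0x1FFF
                src2 = str(raw - 0x2000 if raw & 0x1000 else raw)
            else:
                src2 = f"%r{word & 0x1F}"
            return f"{name} %r{rs1}, {src2}, %r{rd}"
    raise ValueError("instruction not handled by this sketch")

print(disassemble(0x09012345))   # sethi %hi(0x48d1400), %r4
print(disassemble(0x10800006))   # ba .+24
print(disassemble(0x8601600E))   # add %r5, 14, %r3
```

Note that 0x48d1400 is 0x12345<<10, so the first line names the same instruction as the sethi example above.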

9.4 Summary

9.5 Review Questions

9.6 Exercises


Wayne Heym
2003-12-19