Relative Jump Offsets In Different Machines
Most jump instructions encode their target as an offset relative to the program counter, but the encoded offset for the same jump differs between architectures. This article compares a few architectures to show the details.
RISC-V
The source code in rv.asm:
l1: beq x0, x0, l1
l2: beq x0, x0, l1
Assemble and disassemble the file:
riscv64-unknown-elf-as rv.asm -o rv.o && riscv64-unknown-elf-objdump -S rv.o
rv.o: file format elf64-littleriscv
Disassembly of section .text:
0000000000000000 <l1>:
0: 00000063 beqz zero,0 <l1>
0000000000000004 <l2>:
4: fe000ee3 beqz zero,0 <l1>
According to the RISC-V Instruction Set Manual, the encoding for beq is:
32             25    20    15    12            7         0
+--------------+-----+-----+-----+-------------+---------+
| imm[12|10:5] | rs2 | rs1 | 000 | imm[4:1|11] | 1100011 |
+--------------+-----+-----+-----+-------------+---------+
So the offsets in the two instructions are 0 and the signed 13-bit integer 0b1_1_111111_1110_0 (which is -4). This makes sense: each offset is relative to the address of the branch instruction itself.
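The immediate is scattered across the instruction word, so decoding it by hand is error-prone. A minimal Python sketch of the reassembly follows; the function names are made up for illustration, and the field positions are taken from the diagram above (imm[0] is implicitly 0).

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign_bit) - sign_bit


def riscv_branch_offset(insn: int) -> int:
    """Reassemble the B-type immediate of a 32-bit RISC-V branch instruction."""
    imm = (
        ((insn >> 31) & 0x1) << 12     # imm[12]   from bit 31
        | ((insn >> 7) & 0x1) << 11    # imm[11]   from bit 7
        | ((insn >> 25) & 0x3F) << 5   # imm[10:5] from bits 30:25
        | ((insn >> 8) & 0xF) << 1     # imm[4:1]  from bits 11:8
    )
    return sign_extend(imm, 13)        # 13 bits: imm[12:1] plus the implicit 0


# The two words from the disassembly above, with their addresses.
for address, insn in [(0x0, 0x00000063), (0x4, 0xFE000EE3)]:
    offset = riscv_branch_offset(insn)
    print(f"{address:#x}: offset {offset}, target {address + offset:#x}")
```

Both branches resolve to address 0, the location of l1, by adding the offset to the address of the branch itself.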
ARM
The source code in arm.asm:
l1: beq l1
l2: beq l1
Assemble and disassemble the file:
arm-none-eabi-as arm.asm -o arm.o && arm-none-eabi-objdump -S arm.o
arm.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <l1>:
0: 0afffffe beq 0 <l1>
00000004 <l2>:
4: 0afffffd beq 0 <l1>
According to the ARM Architecture Reference Manual, the encoding for beq is:
32     28      25  24                0
+------+-------+---+-----------------+
| cond | 1 0 1 | 0 | signed_immed_24 |
+------+-------+---+-----------------+
So the signed_immed_24 fields are FFFFFE and FFFFFD. After sign-extending to 32 bits and shifting left by 2, they become FFFFFFF8 and FFFFFFF4.
These are signed integers, so they mean -8 and -12.
Why are they not 0 and -4?
The reason is historical:
The original ARM design had a 3-stage pipeline (fetch-decode-execute). To simplify the design they chose to have the PC read as the value currently on the instruction fetch address lines, rather than that of the currently executing instruction from 2 cycles ago. Since most PC-relative addresses are calculated at link time, it's easier to have the assembler/linker compensate for that 2-instruction offset than to design all the logic to 'correct' the PC register.
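To check this against the disassembly, here is a small Python sketch that decodes signed_immed_24 and applies the pipeline compensation. The function names are hypothetical; the constant 8 is the 2-instruction offset described above (two 4-byte ARM instructions).

```python
def arm_branch_offset(insn: int) -> int:
    """Sign-extend signed_immed_24 and shift it left by 2 (byte offset)."""
    imm24 = insn & 0xFFFFFF
    if imm24 & 0x800000:       # bit 23 is the sign bit of the 24-bit field
        imm24 -= 1 << 24
    return imm24 << 2


def arm_branch_target(address: int, insn: int) -> int:
    """The PC reads as the branch address plus 8 when the offset is added."""
    return address + 8 + arm_branch_offset(insn)


# The two words from the disassembly above, with their addresses.
for address, insn in [(0x0, 0x0AFFFFFE), (0x4, 0x0AFFFFFD)]:
    print(f"{address:#x}: offset {arm_branch_offset(insn)}, "
          f"target {arm_branch_target(address, insn):#x}")
```

The encoded offsets are -8 and -12, yet both targets come out as 0, because the assembler has already subtracted the extra 8 bytes.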
MCS-51
The source code in 8051.asm:
l1: jc l1
l2: jc l1
Assemble the file and show the listing:
sdas8051 -l 8051.lst 8051.asm && head 8051.lst
ASxxxx Assembler V02.00 + NoICE + SDCC mods (Intel 8051), page 1.
Hexadecimal [24-Bits]
000000 40 FE [24] 1 l1: jc l1
000002 40 FC [24] 2 l2: jc l1
According to the 8051 Instruction Set Manual, the encoding for JC is:
16       8          0
+--------+----------+
| OFFSET | 01000000 |
+--------+----------+
So the OFFSETs are FE (-2) and FC (-4).
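As on ARM, the PC has already been incremented by the time the instruction executes: the offset is added to the address of the next instruction, which is the address of the 2-byte JC plus 2.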
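The same check for the 8051 can be sketched in Python. The function names are hypothetical, and the sketch assumes the 2-byte JC form shown in the listing above, with the PC wrapping at 16 bits.

```python
def mcs51_rel_offset(rel_byte: int) -> int:
    """Interpret the second instruction byte as a signed 8-bit offset."""
    rel_byte &= 0xFF
    return rel_byte - 0x100 if rel_byte & 0x80 else rel_byte


def mcs51_jc_target(address: int, rel_byte: int) -> int:
    """Add the offset to the address of the instruction following the 2-byte JC."""
    return (address + 2 + mcs51_rel_offset(rel_byte)) & 0xFFFF


# The two offset bytes from the listing above, with their addresses.
for address, rel in [(0x0000, 0xFE), (0x0002, 0xFC)]:
    print(f"{address:#06x}: offset {mcs51_rel_offset(rel)}, "
          f"target {mcs51_jc_target(address, rel):#06x}")
```

Both jumps land on address 0. The compensation here is the length of the instruction itself (2 bytes), compared with 8 bytes on ARM and none on RISC-V.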