Relative Jump Offsets In Different Machines

The targets of most jump instructions are encoded as offsets relative to the program counter. But the base address that the offset is measured from differs between architectures. In this article, we compare a few architectures to show the details.

RISC-V

The source code in rv.asm:

l1: beq x0, x0, l1
l2: beq x0, x0, l1

Assemble and disassemble the file:

riscv64-unknown-elf-as rv.asm -o rv.o && riscv64-unknown-elf-objdump -S rv.o
rv.o:     file format elf64-littleriscv


Disassembly of section .text:

0000000000000000 <l1>:
   0:   00000063                beqz    zero,0 <l1>

0000000000000004 <l2>:
   4:   fe000ee3                beqz    zero,0 <l1>

According to the RISC-V Instruction Set Manual, the encoding for beq is:

32             25    20    15    12             7         0
 +--------------+-----+-----+-----+-------------+---------+
 | imm[12|10:5] | rs2 | rs1 | 000 | imm[4:1|11] | 1100011 |
 +--------------+-----+-----+-----+-------------+---------+

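So the offsets in the two instructions are 0 and the 13-bit signed integer 0b1_1_111111_1110_0 (imm[12], imm[11], imm[10:5], imm[4:1], and an implicit low bit of 0), which is -4. This makes sense: each offset is relative to the address of the branch instruction itself.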
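The bit shuffling above is easy to get wrong by hand, so here is a minimal Python sketch that reassembles the immediate from the two instruction words in the disassembly. The function names sign_extend and riscv_branch_offset are made up for this illustration and do not come from any toolchain.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def riscv_branch_offset(insn):
    """Reassemble the 13-bit B-type immediate of a 32-bit RISC-V branch."""
    imm12 = (insn >> 31) & 0x1     # bit 31     -> imm[12]
    imm10_5 = (insn >> 25) & 0x3F  # bits 30:25 -> imm[10:5]
    imm4_1 = (insn >> 8) & 0xF     # bits 11:8  -> imm[4:1]
    imm11 = (insn >> 7) & 0x1      # bit 7      -> imm[11]
    # imm[0] is not stored; it is always 0.
    imm = (imm12 << 12) | (imm11 << 11) | (imm10_5 << 5) | (imm4_1 << 1)
    return sign_extend(imm, 13)


# (address, instruction word) pairs taken from the objdump output above.
for address, insn in [(0x0, 0x00000063), (0x4, 0xFE000EE3)]:
    offset = riscv_branch_offset(insn)
    # The offset is added to the address of the branch itself.
    print(f"{address:#x}: offset {offset:+d}, target {address + offset:#x}")
```

Both instructions decode to a target of 0, with offsets 0 and -4, matching the hand calculation.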

ARM

The source code in arm.asm:

l1: beq l1
l2: beq l1

Assemble and disassemble the file:

arm-none-eabi-as arm.asm -o arm.o && arm-none-eabi-objdump -S arm.o
arm.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <l1>:
   0:   0afffffe        beq     0 <l1>

00000004 <l2>:
   4:   0afffffd        beq     0 <l1>

According to the ARM Architecture Reference Manual, the encoding for beq (a branch with the EQ condition) is:

32     28          24                 0
 +------+-------+---+-----------------+
 | cond | 1 0 1 | 0 | signed_immed_24 |
 +------+-------+---+-----------------+

So the signed_immed_24 fields are FFFFFE and FFFFFD. After sign-extending them to 32 bits and shifting left by 2, they become FFFFFFF8 and FFFFFFF4. As signed integers, these are -8 and -12.

Why are they not 0 and -4?

The reason is historical:

The original ARM design had a 3-stage pipeline (fetch-decode-execute). To simplify the design they chose to have the PC read as the value currently on the instruction fetch address lines, rather than that of the currently executing instruction from 2 cycles ago. Since most PC-relative addresses are calculated at link time, it's easier to have the assembler/linker compensate for that 2-instruction offset than to design all the logic to 'correct' the PC register.
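To check the decoding and the 2-instruction compensation described above, here is a minimal Python sketch that decodes the two instruction words from the disassembly. The names PIPELINE_BIAS and arm_branch are invented for this illustration. The sketch assumes 32-bit ARM state, where the PC reads as the address of the branch plus 8.

```python
PIPELINE_BIAS = 8  # assumption: ARM state, PC reads as branch address + 8


def arm_branch(address, insn):
    """Return (offset, target) for a 32-bit ARM B<cond> instruction word."""
    imm24 = insn & 0x00FFFFFF   # signed_immed_24 field
    if imm24 & 0x00800000:      # sign-extend from 24 bits
        imm24 -= 1 << 24
    offset = imm24 * 4          # shift left by 2: offsets count words
    target = (address + PIPELINE_BIAS + offset) & 0xFFFFFFFF
    return offset, target


# (address, instruction word) pairs taken from the objdump output above.
for address, insn in [(0x0, 0x0AFFFFFE), (0x4, 0x0AFFFFFD)]:
    offset, target = arm_branch(address, insn)
    print(f"{address:#x}: offset {offset:+d}, target {target:#x}")
```

The encoded offsets are -8 and -12, and adding the bias of 8 brings both targets back to 0.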

MCS-51

The source code in 8051.asm:

l1: jc l1
l2: jc l1

Assemble the file and view the listing:

sdas8051 -l 8051.lst 8051.asm && head 8051.lst
ASxxxx Assembler V02.00 + NoICE + SDCC mods  (Intel 8051), page 1.
Hexadecimal [24-Bits]



      000000 40 FE            [24]    1 l1:     jc l1
      000002 40 FC            [24]    2 l2:     jc l1

According to the 8051 Instruction Set Manual, the encoding for JC is:

16        8          0
 +--------+----------+
 | OFFSET | 01000000 | 
 +--------+----------+

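So the OFFSETs are FE (-2) and FC (-4).

As on ARM, the offset is not relative to the address of the jump itself. By the time the jump executes, the PC has already been incremented past the 2-byte instruction, so the offset is added to the address of the next instruction: 0 + 2 - 2 = 0 and 2 + 2 - 4 = 0.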
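The same check for the 8051, as a minimal Python sketch. The name mcs51_jc is made up for this illustration, and the sketch handles only the 2-byte JC instruction from the listing above.

```python
JC_OPCODE = 0x40
JC_LENGTH = 2  # opcode byte + relative offset byte


def mcs51_jc(address, code):
    """Return (offset, target) for a 2-byte 8051 `JC rel` instruction."""
    opcode, rel = code
    if opcode != JC_OPCODE:
        raise ValueError(f"not a JC instruction: {opcode:#04x}")
    offset = rel - 0x100 if rel & 0x80 else rel  # sign-extend from 8 bits
    # The PC already points at the next instruction when the offset is added.
    target = (address + JC_LENGTH + offset) & 0xFFFF
    return offset, target


# (address, machine code) pairs taken from the listing above.
for address, code in [(0x0, bytes.fromhex("40FE")), (0x2, bytes.fromhex("40FC"))]:
    offset, target = mcs51_jc(address, code)
    print(f"{address:#06x}: offset {offset:+d}, target {target:#06x}")
```

Both jumps resolve to address 0. Of the three architectures, only RISC-V measures the offset from the address of the branch itself. ARM adds a fixed 8, and the 8051 adds the length of the instruction.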