
Linker notes on AArch64

 1 year ago
source link: https://maskray.me/blog/2023-03-05-linker-notes-on-aarch64

This article describes target-specific details about AArch64 in ELF linkers. AArch64 is the 64-bit execution environment for the Arm architecture.

ABI documents

Global Offset Table

The Global Offset Table consists of two sections:

  • .got.plt holds code addresses for PLT.
  • .got holds other addresses and offsets.

The symbol _GLOBAL_OFFSET_TABLE_ is defined at the beginning of the .got section. GNU ld reserves a single entry in .got, and .got[0] holds the link-time address of _DYNAMIC for a legacy reason: versions of glibc prior to 2.35 require it. See All about Global Offset Table.

.got.plt[1] and .got.plt[2] are for lazy binding PLT. Linkers communicate the address of .got.plt to rtld with the dynamic tag DT_PLTGOT.

GOT optimization

See All about Global Offset Table#GOT optimization.

Procedure Linkage Table

The registers x16 (IP0) and x17 (IP1) are the first and second intra-procedure-call temporary registers. They may be used by PLT entries and veneers.

The PLT header looks like:

bti  c       // If BTI
stp x16, x30, [sp,#-16]!
adrp x16, :page: &.got.plt[2]
ldr x17, [x16, :lo12: &.got.plt[2]]
add x16, x16, :lo12: &.got.plt[2]
br x17

The Nth PLT entry looks like:

bti  c       // If BTI
adrp x16, :page: &.got.plt[N + 3]
ldr x17, [x16, :lo12: &.got.plt[N + 3]]
add x16, x16, :lo12: &.got.plt[N + 3]
autia1716 // If PAC-PLT
br x17

When BTI is enabled for the output file, the code sequence starts with bti c. When PAC-PLT is enabled, the code sequence includes autia1716 before br x17.
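
The address arithmetic behind the adrp/ldr/add triple can be sketched as follows. This is a minimal illustration, not linker source: the helper names (gotPltSlotAddr, adrpPageDelta, lo12) are hypothetical, and it assumes 8-byte .got.plt slots with the first three slots reserved, as the N + 3 indexing above implies.

```cpp
#include <cstdint>

// Assumption: each .got.plt slot holds one 8-byte address.
constexpr uint64_t kGotPltEntrySize = 8;
// .got.plt[0..2] are reserved, so the Nth PLT entry uses slot N + 3.
constexpr uint64_t kGotPltReservedEntries = 3;

// Address of the .got.plt slot used by the Nth PLT entry (N counts from 0).
constexpr uint64_t gotPltSlotAddr(uint64_t gotPltBase, uint64_t n) {
  return gotPltBase + (n + kGotPltReservedEntries) * kGotPltEntrySize;
}

// Page(x): x with the low 12 bits cleared (4KiB pages).
constexpr uint64_t page(uint64_t x) { return x & ~uint64_t{0xfff}; }

// Signed number of 4KiB pages that an ADRP located at `pc` must encode
// to reach the page containing `target` (the ":page:" part).
constexpr int64_t adrpPageDelta(uint64_t pc, uint64_t target) {
  // Both operands are page-aligned, so the division is exact.
  return static_cast<int64_t>(page(target) - page(pc)) / 4096;
}

// Low 12 bits of the target, applied by the following ldr/add (":lo12:").
constexpr uint64_t lo12(uint64_t target) { return target & 0xfff; }
```

For example, with .got.plt at 0x30000, PLT entry 0 refers to the slot at 0x30018; an adrp at 0x10ff8 then encodes a page delta of 32 and the ldr/add use the low 12 bits 0x18.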

Relocation optimization

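There are a few optimization schemes besides GOT optimization. For example, an add whose low-12-bit immediate resolves to 0 can become a nop:

add  x2, x2, 0  // R_<CLS>_ADD_ABS_LO12_NC

=>

nop

An adrp/add pair whose target is within the +/-1MiB range of adr can be rewritten as well:

adrp x0, :page: symbol
add x0, x0, :lo12: symbol

=>

nop
adr x0, symbol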
--no-relax disables the optimization.
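
The adrp/add to nop/adr rewrite is only possible when the symbol is within reach of adr, which encodes a signed 21-bit byte offset. A minimal sketch of that check and of the resulting encodings, assuming the two instructions are adjacent and use the same register (tryRelaxAdrpAdd and the Relaxed struct are hypothetical names, not any linker's API):

```cpp
#include <cstdint>

constexpr uint32_t kNop = 0xd503201f;  // AArch64 NOP encoding

// ADR reaches [-1MiB, +1MiB) around its own address.
constexpr bool fitsAdr(uint64_t pc, uint64_t sym) {
  int64_t delta = static_cast<int64_t>(sym - pc);
  return delta >= -(int64_t{1} << 20) && delta < (int64_t{1} << 20);
}

// Encode `adr x<rd>, sym` for an instruction located at `pc`.
// The caller is expected to have checked fitsAdr(pc, sym).
constexpr uint32_t encodeAdr(unsigned rd, uint64_t pc, uint64_t sym) {
  uint32_t imm = static_cast<uint32_t>(sym - pc) & 0x1fffff;
  uint32_t immlo = imm & 3;   // bits 30:29
  uint32_t immhi = imm >> 2;  // bits 23:5
  return 0x10000000u | (immlo << 29) | (immhi << 5) | (rd & 31);
}

struct Relaxed {
  bool ok;          // false if the pair must be left untouched
  uint32_t first;   // replaces the adrp
  uint32_t second;  // replaces the add
};

// The adr takes the place of the add, so its PC is adrpAddr + 4.
constexpr Relaxed tryRelaxAdrpAdd(unsigned rd, uint64_t adrpAddr, uint64_t sym) {
  uint64_t adrAddr = adrpAddr + 4;
  if (!fitsAdr(adrAddr, sym))
    return {false, 0, 0};
  return {true, kNop, encodeAdr(rd, adrAddr, sym)};
}
```

Real linkers apply further conditions (for instance on the relocation types involved), which this sketch leaves out.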

See ELF for the Arm® 64-bit Architecture (AArch64)#Relocation optimization.

Thread Local Storage

AArch64 uses a variant of TLS Variant I: the static TLS blocks are placed above the thread pointer. The thread pointer points to the start of the two-word (16-byte) thread control block defined by the ABI, and the static TLS blocks follow it.

The linker performs TLS optimization.
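
As a rough illustration of the Variant I layout, the offset of a TLS symbol from the thread pointer can be computed as below. This is a sketch under stated assumptions: a 16-byte TCB sits at the thread pointer, the first TLS block starts at the next address aligned to the PT_TLS alignment, and the PT_TLS segment's own address is already aligned. tpOffset is a hypothetical helper name.

```cpp
#include <cstdint>

// Assumption: two 8-byte words are reserved at the thread pointer.
constexpr uint64_t kTcbSize = 16;

// Round v up to a multiple of align (align must be a power of two).
constexpr uint64_t alignTo(uint64_t v, uint64_t align) {
  return (v + align - 1) & ~(align - 1);
}

// Offset from the thread pointer of a symbol that lives symOffset bytes
// into the PT_TLS segment, whose alignment is tlsAlign.
constexpr uint64_t tpOffset(uint64_t symOffset, uint64_t tlsAlign) {
  return alignTo(kTcbSize, tlsAlign) + symOffset;
}
```

With an 8- or 16-byte aligned PT_TLS, the first TLS variable is at TP + 16; with a 64-byte aligned PT_TLS it moves to TP + 64.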

See All about thread-local storage.

Program Property

A .note.gnu.property section contains program property notes that describe special handling requirements for the linker and the dynamic loader.

The linker parses input .note.gnu.property sections and recognizes command line options -z force-bti and -z pac-plt to compute the output .note.gnu.property (type is SHT_NOTE) section. Without these options, linkers only set the feature bit in the output file if all the input relocatable object files have the corresponding feature set.

// Declared before the loop in the enclosing function: start with all bits set.
uint32_t ret = -1;
for (ELFFileBase *f : ctx.objectFiles) {
  uint32_t features = f->andFeatures;

  // -z bti-report=[warning|error]: diagnose inputs that lack the BTI property.
  if (!(features & GNU_PROPERTY_AARCH64_FEATURE_1_BTI)) {
    if (config->zBtiReport == "error")
      error(toString(f) + ": -z bti-report: file does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI property");
    else if (config->zBtiReport == "warning")
      warn(toString(f) + ": -z bti-report: file does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI property");
  }

  // -z force-bti: treat the input as BTI-compatible. Warn only if
  // -z bti-report has not already diagnosed the file above.
  if (config->zForceBti && !(features & GNU_PROPERTY_AARCH64_FEATURE_1_BTI)) {
    if (config->zBtiReport == "none")
      warn(toString(f) + ": -z force-bti: file does not have "
                         "GNU_PROPERTY_AARCH64_FEATURE_1_BTI property");
    features |= GNU_PROPERTY_AARCH64_FEATURE_1_BTI;
  }
  // -z pac-plt: likewise force the PAC bit, with a warning.
  if (config->zPacPlt && !(features & GNU_PROPERTY_AARCH64_FEATURE_1_PAC)) {
    warn(toString(f) + ": -z pac-plt: file does not have "
                       "GNU_PROPERTY_AARCH64_FEATURE_1_PAC property");
    features |= GNU_PROPERTY_AARCH64_FEATURE_1_PAC;
  }
  // A feature bit survives only if every input file has it.
  ret &= features;
}
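
Stripped of diagnostics, the rule is a bitwise AND over all inputs, with the two options forcing a bit on before the AND. A standalone sketch (computeAndFeatures is a hypothetical name, and returning 0 for an empty input list is a choice made for this example only):

```cpp
#include <cstdint>
#include <vector>

// Bit values from the AArch64 ELF ABI.
constexpr uint32_t GNU_PROPERTY_AARCH64_FEATURE_1_BTI = 1u << 0;
constexpr uint32_t GNU_PROPERTY_AARCH64_FEATURE_1_PAC = 1u << 1;

// inputs: the feature word read from each relocatable object file.
// forceBti / pacPlt model -z force-bti and -z pac-plt.
uint32_t computeAndFeatures(const std::vector<uint32_t> &inputs,
                            bool forceBti, bool pacPlt) {
  if (inputs.empty())
    return 0;  // sketch-only choice: no inputs, no features
  uint32_t ret = ~0u;  // start with all bits set
  for (uint32_t features : inputs) {
    if (forceBti)
      features |= GNU_PROPERTY_AARCH64_FEATURE_1_BTI;
    if (pacPlt)
      features |= GNU_PROPERTY_AARCH64_FEATURE_1_PAC;
    ret &= features;  // a bit survives only if every input has it
  }
  return ret;
}
```

A single input without the BTI property therefore clears BTI in the output unless -z force-bti is given.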

Range extension thunks

Function calls typically use B and BL instructions. The two instructions have a range of +/-128MiB and may use 2 relocation types: R_AARCH64_CALL26 and R_AARCH64_JUMP26. If the destination is not reachable by a single B/BL, linkers may insert a veneer (range extension thunk).
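
The +/-128MiB limit follows from the encoding: B and BL hold a signed 26-bit offset counted in 4-byte instructions. A small sketch of the reachability check and of the BL encoding (inBranchRange and encodeBl are hypothetical helper names):

```cpp
#include <cstdint>

// B/BL encode a signed 26-bit word offset, i.e. a byte offset in
// [-128MiB, +128MiB) relative to the branch instruction itself.
constexpr bool inBranchRange(uint64_t pc, uint64_t target) {
  int64_t delta = static_cast<int64_t>(target - pc);
  return delta >= -(int64_t{1} << 27) && delta < (int64_t{1} << 27);
}

// Encode `bl target` for an instruction at `pc`.
// The caller is expected to have checked inBranchRange(pc, target)
// and that both addresses are 4-byte aligned.
constexpr uint32_t encodeBl(uint64_t pc, uint64_t target) {
  uint32_t imm26 = static_cast<uint32_t>((target - pc) >> 2) & 0x03ffffff;
  return 0x94000000u | imm26;
}
```

When inBranchRange returns false for a call site, the linker has to route the call through a thunk placed within reach.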

-no-pie links may use a thunk with absolute addressing targeting any location in the 64-bit address space.

<caller>:
bl __AArch64AbsLongThunk_nonpreemptible
b __AArch64AbsLongThunk_nonpreemptible

<__AArch64AbsLongThunk_nonpreemptible>:
ldr x16, .+8
br x16

<$d>:
.word 0x00000000
.word 0x00000010

<.plt>:

-pie and -shared links need to use a thunk with PC-relative addressing targeting a range of +/-4GiB.

<caller>:
bl __AArch64ADRPThunk_nonpreemptible
b __AArch64ADRPThunk_nonpreemptible

<__AArch64ADRPThunk_nonpreemptible>:
adrp x16, :page: nonpreemptible
add x16, x16, :lo12: nonpreemptible
br x16
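
The choice between the two thunk flavours can be summarized in a few lines. This is a simplified sketch rather than any linker's actual logic: pickThunk and ThunkKind are hypothetical names, and thunk placement (passed in here as thunkAddr) is taken as given.

```cpp
#include <cstdint>

enum class ThunkKind { None, AbsLong, Adrp, Unreachable };

// Direct B/BL reach: signed 28-bit byte offset.
constexpr bool fitsBranch(uint64_t pc, uint64_t target) {
  int64_t delta = static_cast<int64_t>(target - pc);
  return delta >= -(int64_t{1} << 27) && delta < (int64_t{1} << 27);
}

// ADRP reach: signed 21-bit count of 4KiB pages, i.e. +/-4GiB.
constexpr bool fitsAdrp(uint64_t pc, uint64_t target) {
  int64_t pages = static_cast<int64_t>((target & ~uint64_t{0xfff}) -
                                       (pc & ~uint64_t{0xfff})) / 4096;
  return pages >= -(int64_t{1} << 20) && pages < (int64_t{1} << 20);
}

// isPic models -pie / -shared links, where absolute addresses are not
// known at link time and the thunk must be PC-relative.
constexpr ThunkKind pickThunk(uint64_t callSite, uint64_t thunkAddr,
                              uint64_t target, bool isPic) {
  if (fitsBranch(callSite, target))
    return ThunkKind::None;         // a plain B/BL reaches the target
  if (!isPic)
    return ThunkKind::AbsLong;      // ldr x16, <literal>; br x16
  if (fitsAdrp(thunkAddr, target))
    return ThunkKind::Adrp;         // adrp + add + br x16
  return ThunkKind::Unreachable;    // beyond +/-4GiB of the thunk
}
```

The sketch also assumes the thunk itself is within B/BL range of the call site, which a real linker has to guarantee when it places thunk sections.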

The branch target of a thunk may be a PLT entry:

<caller>:
bl __AArch64ADRPThunk_preemptible

<__AArch64ADRPThunk_preemptible>:
adrp x16, :page: preemptible@plt
add x16, x16, :lo12: preemptible@plt
br x16

...

<preemptible@plt>:
adrp x16, :page: &.got.plt[N + 3]
ldr x17, [x16, :lo12: &.got.plt[N + 3]]
add x16, x16, :lo12: &.got.plt[N + 3]
br x17

--fix-cortex-a53-843419

This option enables a linker workaround for Arm Cortex-A53 erratum 843419. Full details are available in the ARM-EPM-048406 document. In ld.lld this additionally sets a workaround when relocating R_AARCH64_JUMP26.
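
The erratum concerns an adrp placed in the last two instruction slots of a 4KiB page (page offset 0xff8 or 0xffc) that is followed by a particular load/store sequence. The sketch below covers only that first, necessary condition; the conditions on the following instructions are omitted, and isErratum843419Candidate is a hypothetical helper name.

```cpp
#include <cstdint>

// ADRP has bit 31 set and bits 28:24 equal to 0b10000; the immediate
// bits (30:29 and 23:5) and the destination register are masked out.
constexpr bool isAdrp(uint32_t insn) {
  return (insn & 0x9f000000u) == 0x90000000u;
}

// First filter only: an ADRP at page offset 0xff8 or 0xffc. A real
// scanner would go on to inspect the instructions that follow it.
constexpr bool isErratum843419Candidate(uint64_t addr, uint32_t insn) {
  uint64_t pageOffset = addr & 0xfff;
  return isAdrp(insn) && (pageOffset == 0xff8 || pageOffset == 0xffc);
}
```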

