[lld][MachO] Add ARM64e pointer authentication linking support#188378
[lld][MachO] Add ARM64e pointer authentication linking support#188378oskarwirga wants to merge 22 commits intollvm:mainfrom
Conversation
|
@llvm/pr-subscribers-llvm-binary-utilities @llvm/pr-subscribers-lld Author: Oskar Wirga (oskarwirga) ChangesThis PR implements ARM64e linking support in lld's Mach-O backend. This is a continuation of work discussed in #79543 and I included @BertalanD's suggestion to keep the Relocation object size small, thanks! This PR is part of several I have planned, but contains everything needed to link arm64e Mach-Os with LLD. What this PR adds
Testing
AI DisclosureI used Claude heavily throughout development, primarily to iterate through linker errors. All generated code was manually reviewed and tested on ARM64e hardware. I feel very confident in these changes being functionally correct because I tested on hardware extensively, architecturally they may be lacking! Patch is 66.98 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/188378.diff 26 Files Affected:
diff --git a/lld/MachO/Arch/ARM64Common.cpp b/lld/MachO/Arch/ARM64Common.cpp
index 599f1e18efda6..e137cdb5679f5 100644
--- a/lld/MachO/Arch/ARM64Common.cpp
+++ b/lld/MachO/Arch/ARM64Common.cpp
@@ -19,7 +19,8 @@ using namespace lld::macho;
int64_t ARM64Common::getEmbeddedAddend(MemoryBufferRef mb, uint64_t offset,
const relocation_info rel) const {
if (rel.r_type != ARM64_RELOC_UNSIGNED &&
- rel.r_type != ARM64_RELOC_SUBTRACTOR) {
+ rel.r_type != ARM64_RELOC_SUBTRACTOR &&
+ rel.r_type != ARM64_RELOC_AUTHENTICATED_POINTER) {
// All other reloc types should use the ADDEND relocation to store their
// addends.
// TODO(gkm): extract embedded addend just so we can assert that it is 0
@@ -28,6 +29,12 @@ int64_t ARM64Common::getEmbeddedAddend(MemoryBufferRef mb, uint64_t offset,
const auto *buf = reinterpret_cast<const uint8_t *>(mb.getBufferStart());
const uint8_t *loc = buf + offset + rel.r_address;
+
+ if (rel.r_type == ARM64_RELOC_AUTHENTICATED_POINTER) {
+ // Only the low 32 bits are the addend; upper bits hold ptrauth fields.
+ return llvm::SignExtend64<32>(read32le(loc));
+ }
+
switch (rel.r_length) {
case 2:
return static_cast<int32_t>(read32le(loc));
diff --git a/lld/MachO/Arch/ARM64e.cpp b/lld/MachO/Arch/ARM64e.cpp
new file mode 100644
index 0000000000000..be89991950f80
--- /dev/null
+++ b/lld/MachO/Arch/ARM64e.cpp
@@ -0,0 +1,268 @@
+//===- ARM64e.cpp ---------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "Arch/ARM64Common.h"
+#include "InputFiles.h"
+#include "Symbols.h"
+#include "SyntheticSections.h"
+#include "Target.h"
+
+#include "lld/Common/ErrorHandler.h"
+#include "mach-o/compact_unwind_encoding.h"
+#include "llvm/BinaryFormat/MachO.h"
+
+using namespace llvm;
+using namespace llvm::MachO;
+using namespace lld;
+using namespace lld::macho;
+
+namespace {
+
+struct ARM64e : ARM64Common {
+ ARM64e();
+ void writeStub(uint8_t *buf, const Symbol &, uint64_t) const override;
+ void writeStubHelperHeader(uint8_t *buf) const override;
+ void writeStubHelperEntry(uint8_t *buf, const Symbol &,
+ uint64_t entryAddr) const override;
+
+ void writeObjCMsgSendStub(uint8_t *buf, Symbol *sym, uint64_t stubsAddr,
+ uint64_t &stubOffset, uint64_t selrefVA,
+ Symbol *objcMsgSend) const override;
+ void populateThunk(InputSection *thunk, Symbol *funcSym) override;
+
+ void initICFSafeThunkBody(InputSection *thunk,
+ Symbol *targetSym) const override;
+ Symbol *getThunkBranchTarget(InputSection *thunk) const override;
+ uint32_t getICFSafeThunkSize() const override;
+};
+
+} // namespace
+
+// Random notes on reloc types:
+// ADDEND always pairs with BRANCH26, PAGE21, or PAGEOFF12
+// POINTER_TO_GOT: ld64 supports a 4-byte pc-relative form as well as an 8-byte
+// absolute version of this relocation. The semantics of the absolute relocation
+// are weird -- it results in the value of the GOT slot being written, instead
+// of the address. Let's not support it unless we find a real-world use case.
+static constexpr std::array<RelocAttrs, 12> relocAttrsArray{{
+#define B(x) RelocAttrBits::x
+ {"UNSIGNED",
+ B(UNSIGNED) | B(ABSOLUTE) | B(EXTERN) | B(LOCAL) | B(BYTE4) | B(BYTE8)},
+ {"SUBTRACTOR", B(SUBTRAHEND) | B(EXTERN) | B(BYTE4) | B(BYTE8)},
+ {"BRANCH26", B(PCREL) | B(EXTERN) | B(BRANCH) | B(BYTE4)},
+ {"PAGE21", B(PCREL) | B(EXTERN) | B(BYTE4)},
+ {"PAGEOFF12", B(ABSOLUTE) | B(EXTERN) | B(BYTE4)},
+ {"GOT_LOAD_PAGE21", B(PCREL) | B(EXTERN) | B(GOT) | B(BYTE4)},
+ {"GOT_LOAD_PAGEOFF12",
+ B(ABSOLUTE) | B(EXTERN) | B(GOT) | B(LOAD) | B(BYTE4)},
+ {"POINTER_TO_GOT", B(PCREL) | B(EXTERN) | B(GOT) | B(POINTER) | B(BYTE4)},
+ {"TLVP_LOAD_PAGE21", B(PCREL) | B(EXTERN) | B(TLV) | B(BYTE4)},
+ {"TLVP_LOAD_PAGEOFF12",
+ B(ABSOLUTE) | B(EXTERN) | B(TLV) | B(LOAD) | B(BYTE4)},
+ {"ADDEND", B(ADDEND)},
+ // ARM64e-specific: AUTHENTICATED_POINTER (64-bit absolute, external or
+ // local)
+ {"AUTHENTICATED_POINTER",
+ B(ABSOLUTE) | B(UNSIGNED) | B(EXTERN) | B(LOCAL) | B(BYTE8) | B(AUTH)},
+#undef B
+}};
+
+// ARM64e uses authenticated stubs with braa instruction.
+// These are 16 bytes (4 instructions) instead of the regular 12 bytes.
+// The stub computes the GOT address in x17 for use as authentication context.
+static constexpr uint32_t stubCode[] = {
+ 0x90000011, // 00: adrp x17, __auth_got@page
+ 0x91000231, // 04: add x17, x17, __auth_got@pageoff
+ 0xf9400230, // 08: ldr x16, [x17]
+ 0xd71f0a11, // 0c: braa x16, x17 ; authenticate with IA key, context=x17
+};
+
+void ARM64e::writeStub(uint8_t *buf8, const Symbol &sym,
+ uint64_t pointerVA) const {
+ auto *buf32 = reinterpret_cast<uint32_t *>(buf8);
+ constexpr size_t stubCodeSize = 4 * sizeof(uint32_t);
+ SymbolDiagnostic d = {&sym, "stub"};
+ uint64_t stubAddr = in.stubs->addr + sym.stubsIndex * stubCodeSize;
+ uint64_t pcPageBits = pageBits(stubAddr);
+ uint64_t targetPageBits = pageBits(pointerVA);
+ int64_t pageDiff = static_cast<int64_t>(targetPageBits - pcPageBits);
+ // adrp x17, __auth_got@page
+ encodePage21(&buf32[0], d, stubCode[0], pageDiff);
+ // add x17, x17, __auth_got@pageoff
+ encodePageOff12(&buf32[1], d, stubCode[1], pointerVA);
+ // ldr x16, [x17]
+ buf32[2] = stubCode[2];
+ // braa x16, x17
+ buf32[3] = stubCode[3];
+}
+
+static constexpr uint32_t stubHelperHeaderCode[] = {
+ 0x90000011, // 00: adrp x17, _dyld_private@page
+ 0x91000231, // 04: add x17, x17, _dyld_private@pageoff
+ 0xa9bf47f0, // 08: stp x16/x17, [sp, #-16]!
+ 0x90000010, // 0c: adrp x16, dyld_stub_binder@page
+ 0xf9400210, // 10: ldr x16, [x16, dyld_stub_binder@pageoff]
+ 0xd61f0200, // 14: br x16
+};
+
+void ARM64e::writeStubHelperHeader(uint8_t *buf8) const {
+ ::writeStubHelperHeader<LP64>(buf8, stubHelperHeaderCode);
+}
+
+static constexpr uint32_t stubHelperEntryCode[] = {
+ 0x18000050, // 00: ldr w16, l0
+ 0x14000000, // 04: b stubHelperHeader
+ 0x00000000, // 08: l0: .long 0
+};
+
+void ARM64e::writeStubHelperEntry(uint8_t *buf8, const Symbol &sym,
+ uint64_t entryVA) const {
+ ::writeStubHelperEntry(buf8, stubHelperEntryCode, sym, entryVA);
+}
+
+// ARM64e uses authenticated ObjC stubs with braa instruction.
+// Uses x17 as both the address register and authentication context,
+// matching the pattern used in ARM64e auth stubs.
+static constexpr uint32_t objcStubsFastCode[] = {
+ 0x90000001, // adrp x1, __objc_selrefs@page
+ 0xf9400021, // ldr x1, [x1, @selector("foo")@pageoff]
+ 0x90000011, // adrp x17, __auth_got@page
+ 0x91000231, // add x17, x17, __auth_got@pageoff
+ 0xf9400230, // ldr x16, [x17]
+ 0xd71f0a11, // braa x16, x17 ; authenticate with IA key
+ 0xd4200020, // brk #0x1
+ 0xd4200020, // brk #0x1
+};
+
+static constexpr uint32_t objcStubsSmallCode[] = {
+ 0x90000001, // adrp x1, __objc_selrefs@page
+ 0xf9400021, // ldr x1, [x1, @selector("foo")@pageoff]
+ 0x14000000, // b _objc_msgSend
+};
+
+void ARM64e::writeObjCMsgSendStub(uint8_t *buf, Symbol *sym, uint64_t stubsAddr,
+ uint64_t &stubOffset, uint64_t selrefVA,
+ Symbol *objcMsgSend) const {
+ uint64_t objcMsgSendAddr;
+ uint64_t objcStubSize;
+ uint64_t objcMsgSendIndex;
+
+ if (config->objcStubsMode == ObjCStubsMode::fast) {
+ objcStubSize = target->objcStubsFastSize;
+ // ARM64e uses authgot for objc_msgSend.
+ objcMsgSendAddr = in.authgot->addr;
+ objcMsgSendIndex = objcMsgSend->authGotIndex;
+ ::writeObjCMsgSendFastStub<LP64>(buf, objcStubsFastCode, sym, stubsAddr,
+ stubOffset, selrefVA, objcMsgSendAddr,
+ objcMsgSendIndex);
+ } else {
+ assert(config->objcStubsMode == ObjCStubsMode::small);
+ objcStubSize = target->objcStubsSmallSize;
+ if (auto *d = dyn_cast<Defined>(objcMsgSend)) {
+ objcMsgSendAddr = d->getVA();
+ objcMsgSendIndex = 0;
+ } else {
+ objcMsgSendAddr = in.stubs->addr;
+ objcMsgSendIndex = objcMsgSend->stubsIndex;
+ }
+ ::writeObjCMsgSendSmallStub<LP64>(buf, objcStubsSmallCode, sym, stubsAddr,
+ stubOffset, selrefVA, objcMsgSendAddr,
+ objcMsgSendIndex);
+ }
+ stubOffset += objcStubSize;
+}
+
+// A thunk is the relaxed variation of stubCode. We don't need the
+// extra indirection through a lazy pointer because the target address
+// is known at link time.
+static constexpr uint32_t thunkCode[] = {
+ 0x90000010, // 00: adrp x16, <thunk.ptr>@page
+ 0x91000210, // 04: add x16, [x16,<thunk.ptr>@pageoff]
+ 0xd61f0200, // 08: br x16
+};
+
+void ARM64e::populateThunk(InputSection *thunk, Symbol *funcSym) {
+ thunk->align = 4;
+ thunk->data = {reinterpret_cast<const uint8_t *>(thunkCode),
+ sizeof(thunkCode)};
+ thunk->relocs.emplace_back(/*type=*/ARM64_RELOC_PAGEOFF12,
+ /*pcrel=*/false, /*length=*/2,
+ /*offset=*/4, /*addend=*/0,
+ /*referent=*/funcSym);
+ thunk->relocs.emplace_back(/*type=*/ARM64_RELOC_PAGE21,
+ /*pcrel=*/true, /*length=*/2,
+ /*offset=*/0, /*addend=*/0,
+ /*referent=*/funcSym);
+}
+
+// Just a single direct branch to the target function.
+static constexpr uint32_t icfSafeThunkCode[] = {
+ 0x14000000, // 00: b target
+};
+
+void ARM64e::initICFSafeThunkBody(InputSection *thunk,
+ Symbol *targetSym) const {
+ // The base data here will not be itself modified, we'll just be adding a
+ // reloc below. So we can directly use the constexpr above as the data.
+ thunk->data = {reinterpret_cast<const uint8_t *>(icfSafeThunkCode),
+ sizeof(icfSafeThunkCode)};
+
+ thunk->relocs.emplace_back(/*type=*/ARM64_RELOC_BRANCH26,
+ /*pcrel=*/true, /*length=*/2,
+ /*offset=*/0, /*addend=*/0,
+ /*referent=*/targetSym);
+}
+
+Symbol *ARM64e::getThunkBranchTarget(InputSection *thunk) const {
+ assert(thunk->relocs.size() == 1 &&
+ "expected a single reloc on ARM64 ICF thunk");
+ auto &reloc = thunk->relocs[0];
+ assert(isa<Symbol *>(reloc.referent) &&
+ "ARM64 thunk reloc is expected to point to a Symbol");
+
+ return cast<Symbol *>(reloc.referent);
+}
+
+uint32_t ARM64e::getICFSafeThunkSize() const {
+ return sizeof(icfSafeThunkCode);
+}
+
+ARM64e::ARM64e() : ARM64Common(LP64()) {
+ cpuType = CPU_TYPE_ARM64;
+ // ARM64e-specific: Use ARM64E subtype with pointer authentication ABI version 0
+ cpuSubtype = CPU_SUBTYPE_ARM64E_WITH_PTRAUTH_VERSION(/*version*/ 0,
+ /*kernel*/ false);
+
+ stubSize = sizeof(stubCode);
+ thunkSize = sizeof(thunkCode);
+
+ objcStubsFastSize = sizeof(objcStubsFastCode);
+ objcStubsFastAlignment = 32;
+ objcStubsSmallSize = sizeof(objcStubsSmallCode);
+ objcStubsSmallAlignment = 4;
+
+ // Branch immediate is two's complement 26 bits, which is implicitly
+ // multiplied by 4 (since all functions are 4-aligned: The branch range
+ // is -4*(2**(26-1))..4*(2**(26-1) - 1).
+ backwardBranchRange = 128 * 1024 * 1024;
+ forwardBranchRange = backwardBranchRange - 4;
+
+ modeDwarfEncoding = UNWIND_ARM64_MODE_DWARF;
+ subtractorRelocType = ARM64_RELOC_SUBTRACTOR;
+ unsignedRelocType = ARM64_RELOC_UNSIGNED;
+
+ stubHelperHeaderSize = sizeof(stubHelperHeaderCode);
+ stubHelperEntrySize = sizeof(stubHelperEntryCode);
+
+ relocAttrs = {relocAttrsArray.data(), relocAttrsArray.size()};
+}
+
+TargetInfo *macho::createARM64eTargetInfo() {
+ static ARM64e t;
+ return &t;
+}
diff --git a/lld/MachO/CMakeLists.txt b/lld/MachO/CMakeLists.txt
index 72631f11511bf..e8afdc06609e9 100644
--- a/lld/MachO/CMakeLists.txt
+++ b/lld/MachO/CMakeLists.txt
@@ -6,6 +6,7 @@ include_directories(${LLVM_MAIN_SRC_DIR}/../libunwind/include)
add_lld_library(lldMachO
Arch/ARM64.cpp
+ Arch/ARM64e.cpp
Arch/ARM64Common.cpp
Arch/ARM64_32.cpp
Arch/X86_64.cpp
diff --git a/lld/MachO/Driver.cpp b/lld/MachO/Driver.cpp
index 58fbe64c2d1f9..8b85a7ae33d37 100644
--- a/lld/MachO/Driver.cpp
+++ b/lld/MachO/Driver.cpp
@@ -951,6 +951,10 @@ static TargetInfo *createTargetInfo(InputArgList &args) {
case CPU_TYPE_X86_64:
return createX86_64TargetInfo();
case CPU_TYPE_ARM64:
+ if (cpuSubtype == CPU_SUBTYPE_ARM64E ||
+ cpuSubtype == CPU_SUBTYPE_ARM64E_VERSIONED_PTRAUTH_ABI_MASK ||
+ cpuSubtype == CPU_SUBTYPE_ARM64E_WITH_PTRAUTH_VERSION(0, 0))
+ return createARM64eTargetInfo();
return createARM64TargetInfo();
case CPU_TYPE_ARM64_32:
return createARM64_32TargetInfo();
@@ -1254,7 +1258,8 @@ static bool shouldEmitChainedFixups(const InputArgList &args) {
return false;
}
- if (!is_contained({AK_x86_64, AK_x86_64h, AK_arm64}, config->arch())) {
+ if (!is_contained({AK_x86_64, AK_x86_64h, AK_arm64, AK_arm64e},
+ config->arch())) {
if (requested)
error("-fixup_chains is only supported on x86_64 and arm64 targets");
@@ -1975,6 +1980,8 @@ bool link(ArrayRef<const char *> argsArr, llvm::raw_ostream &stdoutOS,
config->emitDataInCodeInfo =
args.hasFlag(OPT_data_in_code_info, OPT_no_data_in_code_info, true);
config->emitChainedFixups = shouldEmitChainedFixups(args);
+ if (config->arch() == AK_arm64e && !config->emitChainedFixups)
+ error("arm64e requires chained fixups; cannot use -no_fixup_chains");
config->emitInitOffsets =
config->emitChainedFixups || args.hasArg(OPT_init_offsets);
config->emitRelativeMethodLists = shouldEmitRelativeMethodLists(args);
diff --git a/lld/MachO/InputFiles.cpp b/lld/MachO/InputFiles.cpp
index cc7eae51175bc..e2bb43c3e3598 100644
--- a/lld/MachO/InputFiles.cpp
+++ b/lld/MachO/InputFiles.cpp
@@ -201,6 +201,15 @@ static bool compatWithTargetArch(const InputFile *file, const Header *hdr) {
return false;
}
+ // Reject arm64 objects when linking for arm64e.
+ if (config->arch() == AK_arm64e &&
+ hdr->cputype == CPU_TYPE_ARM64 &&
+ (hdr->cpusubtype & ~CPU_SUBTYPE_MASK) == CPU_SUBTYPE_ARM64_ALL) {
+ warn(toString(file) + " has architecture arm64 which is incompatible with "
+ "target architecture arm64e (arm64e requires pointer authentication)");
+ return false;
+ }
+
return checkCompatibility(file);
}
@@ -263,7 +272,7 @@ std::optional<MemoryBufferRef> macho::readFile(StringRef path) {
// FIXME: LD64 has a more complex fallback logic here.
// Consider implementing that as well?
if (cpuType != static_cast<uint32_t>(target->cpuType) ||
- cpuSubtype != target->cpuSubtype) {
+ cpuSubtype != (target->cpuSubtype & ~MachO::CPU_SUBTYPE_MASK)) {
archs.emplace_back(getArchName(cpuType, cpuSubtype));
continue;
}
@@ -580,9 +589,27 @@ void ObjFile::parseRelocations(ArrayRef<SectionHeader> sectionHeaders,
r.pcrel = relInfo.r_pcrel;
r.length = relInfo.r_length;
r.offset = relInfo.r_address;
+
+ // For ARM64e authenticated pointer relocations, extract the auth info
+ // (diversity, key, addrDiv) from the upper bits of the raw pointer value
+ // and store them in the union's authData member.
+ if (target->hasAttr(relInfo.r_type, RelocAttrBits::AUTH)) {
+ const uint8_t *loc = buf + sec.offset + relInfo.r_address;
+ uint64_t raw = read64le(loc);
+ // The auth bit (bit 63) should be set for authenticated pointers
+ if ((raw >> 63) & 1) {
+ r.hasAuth = true;
+ r.authData.addend = isSubtrahend ? 0 : static_cast<int32_t>(totalAddend);
+ r.authData.info.diversity = (raw >> 32) & 0xFFFF;
+ r.authData.info.addrDiv = (raw >> 48) & 0x1;
+ r.authData.info.key = (raw >> 49) & 0x3;
+ }
+ }
+
if (relInfo.r_extern) {
r.referent = symbols[relInfo.r_symbolnum];
- r.addend = isSubtrahend ? 0 : totalAddend;
+ if (!r.hasAuth)
+ r.addend = isSubtrahend ? 0 : totalAddend;
} else {
assert(!isSubtrahend);
const SectionHeader &referentSecHead =
diff --git a/lld/MachO/InputSection.cpp b/lld/MachO/InputSection.cpp
index 34847adc85954..04dd0ded61b52 100644
--- a/lld/MachO/InputSection.cpp
+++ b/lld/MachO/InputSection.cpp
@@ -101,7 +101,14 @@ static uint64_t resolveSymbolOffsetVA(const Symbol *sym, uint8_t type,
// There's no meaningful way to "interpose" an interior offset.
symVA = (offset != 0) ? sym->getVA() : sym->resolveBranchVA();
} else if (relocAttrs.hasAttr(RelocAttrBits::GOT)) {
- symVA = sym->resolveGotVA();
+ // GOT_LOAD (no POINTER attr) should use regular __got when available,
+ // because the compiler applies paciza on the loaded value and needs
+ // a raw (non-auth) pointer. POINTER_TO_GOT (has POINTER attr) should
+ // use __auth_got (the default from getGotVA/resolveGotVA).
+ if (!relocAttrs.hasAttr(RelocAttrBits::POINTER) && sym->isInGot())
+ symVA = in.got->getVA(sym->gotIndex);
+ else
+ symVA = sym->resolveGotVA();
} else if (relocAttrs.hasAttr(RelocAttrBits::TLV)) {
symVA = sym->resolveTlvVA();
} else {
@@ -253,7 +260,7 @@ void ConcatInputSection::writeTo(uint8_t *buf) {
target->handleDtraceReloc(referentSym, r, loc);
continue;
}
- referentVA = resolveSymbolOffsetVA(referentSym, r.type, r.addend);
+ referentVA = resolveSymbolOffsetVA(referentSym, r.type, r.getAddend());
if (isThreadLocalVariables(getFlags()) && isa<Defined>(referentSym)) {
// References from thread-local variable sections are treated as offsets
@@ -262,15 +269,22 @@ void ConcatInputSection::writeTo(uint8_t *buf) {
// contiguous).
referentVA -= firstTLVDataSection->addr;
} else if (needsFixup) {
- writeChainedFixup(loc, referentSym, r.addend);
+ writeChainedFixup(loc, referentSym, r);
continue;
}
} else if (auto *referentIsec = r.referent.dyn_cast<InputSection *>()) {
assert(!::shouldOmitFromOutput(referentIsec));
- referentVA = referentIsec->getVA(r.addend);
+ referentVA = referentIsec->getVA(r.getAddend());
if (needsFixup) {
- writeChainedRebase(loc, referentVA);
+ AuthInfo aiStorage;
+ const AuthInfo *ai = nullptr;
+ if (r.hasAuth) {
+ aiStorage = r.getAuthInfo();
+ ai = &aiStorage;
+ }
+ uint64_t segmentBase = referentIsec->parent->parent->addr;
+ writeChainedRebase(loc, referentVA, segmentBase, ai);
continue;
}
}
diff --git a/lld/MachO/InputSection.h b/lld/MachO/InputSection.h
index e0a90a2edc0af..ac23b9a9b4e57 100644
--- a/lld/MachO/InputSection.h
+++ b/lld/MachO/InputSection.h
@@ -363,6 +363,7 @@ constexpr const char staticInit[] = "__StaticInit";
constexpr const char stringTable[] = "__string_table";
constexpr const char stubHelper[] = "__stub_helper";
constexpr const char stubs[] = "__stubs";
+constexpr const char authStubs[] = "__auth_stubs";
constexpr const char swift[] = "__swift";
constexpr const char symbolTable[] = "__symbol_table";
constexpr const char textCoalNt[] = "__textcoal_nt";
diff --git a/lld/MachO/MapFile.cpp b/lld/MachO/MapFile.cpp
index 29ebcdcf9a832..352a47a83355f 100644
--- a/lld/MachO/MapFile.cpp
+++ b/lld/MachO/MapFile.cpp
@@ -149,7 +149,9 @@ static void printNonLazyPointerSection(raw_fd_ostream &os,
// associations.
for (const Symbol *sym : osec->getEntries())
os << format("0x%08llX\t0x%08zX\t[ 0] non-lazy-pointer-to-local: %s\n",
- osec->addr + sym->gotIndex * target->wordSize,
+ osec->addr + (osec->getIsAuth() ? sym->authGotIndex
+ : sym->...
[truncated]
|
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
ojhunt
left a comment
There was a problem hiding this comment.
Rather than the various masking and shift operations to pull out of the various constants, I think should switch to struct defs with correct bitfields instead. But if we retain shifts, the literals should be replaced with named constants.
There was a problem hiding this comment.
Just verifying lld style allows auto instead of a named type? I'd certainly prefer a type here, but I don't know if it's required
Also verifying lowercase initial letter is correct naming style?
Both of these exist throughout this PR but I'll just comment/question here
There was a problem hiding this comment.
This specific instance in lld/MachO/Arch/ARM64e.cpp is copied over from lld/MachO/Arch/ARM64.cpp and I didn't want the two to diverge too much.
In general, I tried to keep auto's inline with the LLD Mach-O writing style which also uses lowercase initial letter. I don't feel strongly about this so I am happy to change it or leave it for a future NFC.
There was a problem hiding this comment.
Using this for converting values is UB.
Hahaha, I joke (it is UB) using memcpy to do this is miserable
There was a problem hiding this comment.
TIL! I modified lld/MachO/ICF.cpp to not touch the raw addend so now we should be avoiding potential for reading the raw union.
There was a problem hiding this comment.
so now we should be avoiding potential for reading the raw union.
Would it be beneficial to make this union private to force all future users to use getAddend() instead of addend?
There was a problem hiding this comment.
Yes but I'd prefer saving it for a later follow-up because there's a lot of .addend callsites and this PR already is quite unwieldy and I've already promised a few other NFCs to follow.
There was a problem hiding this comment.
Might be worth to break up this PR into smaller ones. It might also help the review.
There was a problem hiding this comment.
Might be worth to break up this PR into smaller ones.
I am not opposed to the idea but these changes are quite interdependent and decomposing this would result in functionally dead code, at least for the first couple PRs
There was a problem hiding this comment.
Might be worth to break up this PR into smaller ones. It might also help the review.
Its become a lot more sprawling and might be too ambitious in one go atp. Any suggestions on if/how I should split this up?
There was a problem hiding this comment.
Not big ones. You probably have a better picture of the code than me.
There's things that are obvious refactors that should be able to be broken up in earlier PRs. For example adding getAddend() and changing all usages of r.addend. There's probably others that are similar to those.
There was a problem hiding this comment.
I'd kind of prefer InStruct (I think that's what in is but it took a while to track this down) to handle this with something like InStruct::AddGOTEntry(sim, isauthgot), but that implies a structural change outside of the scope of this PR
There was a problem hiding this comment.
Yeah that would be cleaner, again I think I can start working on a separate NFC for this stuff.
There was a problem hiding this comment.
so now we should be avoiding potential for reading the raw union.
Would it be beneficial to make this union private to force all future users to use getAddend() instead of addend?
drodriguez
left a comment
There was a problem hiding this comment.
Skipped some of the bigger files for the moment. Most of my comments for the moment might be considered minor or nit.
There was a problem hiding this comment.
Might be worth to break up this PR into smaller ones. It might also help the review.
…rAuth (#189474) This is part of work being done in #188378 and #188638, split out from #188650. `lowerConstantPtrAuth` silently miscompiled `ConstantPtrAuth` constants with non-`GlobalValue` pointer bases — emitting `0@AUTH(da,0)` instead of erroring. Changes: - Handle `ConstantPointerNull` bases explicitly - Error via `reportFatalUsageError` on any remaining unresolved base (e.g. nested ptrauth) instead of silently miscompiling This PR was mostly developed with LLM assistance
|
@rjmccall @ahmedbougacha @ahatanak @lgerbarg could some of you folk look at this to see if it's reasonable? it's outside my domain and I have ~negative amounts of time to look for at least another week or two |
There was a problem hiding this comment.
Have we changed this value before? What kind of testing/benchmarking is expected before we can change this value? At least, it would be good to verify that runtime and memory usage doesn't increase too much for a few builds.
There was a problem hiding this comment.
I wouldn't expect to see significant impact from this given 56 byte allocation would probably take the same amount of memory as 64, but out of curiosity as well I ran a benchmark linking LLD before and after these changes & got no noticeable regression. It was all within range of noise on my macbook.
There was a problem hiding this comment.
The alignment of uint64_t is 8 bytes, the alignment of uint32_t is 4 bytes, so those 56 bytes were aligned, and it would not have taken 64 bytes (in theory… I haven't measure the alignment before and after). It is still packed as much as possible, but it is a 14% increase in memory for the symbol structures. Also both any architecture is paying the price for something that only works in arm64e.
It is also bad that you added a single uint32_t authGotIndex, but the size jumped 8 bytes. The layout of the structure might not be optimal for the data it contains (from what I can see, there's a pointer after all those uint32_t, which probably became unaligned with this change).
|
Sorry. I just merged #191808, which modifies some of the functions in |
There was a problem hiding this comment.
The alignment of uint64_t is 8 bytes, the alignment of uint32_t is 4 bytes, so those 56 bytes were aligned, and it would not have taken 64 bytes (in theory… I haven't measure the alignment before and after). It is still packed as much as possible, but it is a 14% increase in memory for the symbol structures. Also both any architecture is paying the price for something that only works in arm64e.
It is also bad that you added a single uint32_t authGotIndex, but the size jumped 8 bytes. The layout of the structure might not be optimal for the data it contains (from what I can see, there's a pointer after all those uint32_t, which probably became unaligned with this change).
Add struct definitions for the arm64e chained fixup pointer formats used by dyld. These correspond to the DYLD_CHAINED_PTR_ARM64E and DYLD_CHAINED_PTR_ARM64E_USERLAND24 format types already enumerated in this header. The structs encode rebase, bind, and authenticated bind/rebase fixup entries, and will be consumed by the arm64e chained fixup encoding implementation in LLD. Layouts match Apple's dyld fixup-chains.h definitions.
Add data structures to support arm64e pointer authentication in the MachO linker. The AUTH relocation attribute marks relocations that carry ptrauth metadata (key, diversity, address discrimination). The Relocation struct's addend slot is overlaid with an AuthReloc union member (32-bit addend + AuthInfo) to keep sizeof(Relocation) unchanged. Also adds the __auth_stubs section name constant.
Add the ARM64e target to LLD's MachO linker, enabling linking of arm64e binaries with pointer authentication support. Key components: - ARM64e.cpp: New target with 16-byte authenticated stubs using braa (authenticated branch) with x17 as the context register, plus authenticated ObjC stubs, thunks, and ICF safe thunks - ARM64Common.cpp: Handle ARM64_RELOC_AUTHENTICATED_POINTER in getEmbeddedAddend (low 32 bits only) and relocateOne - Driver.cpp: Route arm64e CPU subtypes to the new target, add arm64e to chained fixups support - InputFiles.cpp: Parse ptrauth metadata (key, diversity, addrDiv) from high bits of authenticated pointer relocations
Add the authenticated GOT (__auth_got) section for arm64e pointer authentication. On arm64e, symbols accessed through authenticated pointers need a separate GOT section whose entries are signed with address-diversified PAC. The __auth_got is placed before __got in section ordering so the linker processes authenticated entries first. This commit adds the infrastructure: AuthGotSection class, authgot pointer in InStruct, auth-aware writeChainedRebase/writeChainedFixup signatures, and the forceOutline parameter for bindings that need full auth encoding. Population and routing are in follow-up commits.
Route symbols to the authenticated GOT (__auth_got) when arm64e pointer authentication requires it. Symbols that are called (BRANCH relocations) need stubs, and arm64e stubs use authgot for braa-based authenticated jumps. A pre-scan of relocations identifies these symbols so that subsequent GOT_LOAD relocations for the same symbols are also routed to authgot. Also routes personality pointers in unwind info to authgot on arm64e, since exception handling needs authenticated personality function pointers.
…port Implement the chained fixup encoding for arm64e using the DYLD_CHAINED_PTR_ARM64E_USERLAND24 format, and add dual GOT support for symbols that need both authenticated and raw GOT entries. Chained fixup encoding: - Auth rebases use dyld_chained_ptr_arm64e_auth_rebase with PAC key/diversity/addrDiv fields - Auth binds use dyld_chained_ptr_arm64e_auth_bind24 (24-bit ordinal) - Non-auth arm64e fixups use corresponding non-auth structs - Handles authEncodingBits for C++ RTTI pointers in UNSIGNED relocs Dual GOT routing: - Symbols with both BRANCH and GOT_LOAD relocations need different GOT entries: BRANCH -> __auth_got (braa), GOT_LOAD -> __got (paciza) - Adds authGotIndex alongside gotIndex in Symbol - GOT_LOAD resolves through __got to prevent double-signing (Crash 7)
Replace bare shift/mask operations in InputFiles.cpp with a arm64e_auth_embedded_pointer bitfield struct defined in MachO.h. This matches the pattern used by the dyld_chained_ptr_arm64e_* structs for chained fixup encoding.
Compare auth metadata (diversity, key, addrDiv) explicitly in equalsConstant() so that sections with different signing schemas are never folded together. Use getAddend() instead of raw union access to avoid reading the wrong union member.
Extract complex expressions into local variables: - MapFile: GOT index ternary moved to a local before the format call - Writer: reinterpret_cast results in buildFixupChains stored in locals
ARM64E userland formats use 8-byte stride, not 4. Replace the hardcoded stride constant with a per-architecture value and use it uniformly in both the stride check and the next-field encoding.
- Driver.cpp: Fix cpuSubtype check to match any arm64e variant using mask instead of listing specific values - Driver.cpp: Move arm64e chained fixups requirement into shouldEmitChainedFixups(); warn and force-enable instead of erroring, matching ld64 behavior - InputFiles.cpp: Check != CPU_SUBTYPE_ARM64E instead of == CPU_SUBTYPE_ARM64_ALL to also reject CPU_SUBTYPE_ARM64_V8 - Relocations.h: Add static_assert for AuthReloc size - Update arm64e-no-fixup-chains.s test for warn behavior
- Use sizeof(stubCode) instead of manual 4 * sizeof(uint32_t) - Deduplicate stub helper, thunk, and ICF code shared between ARM64 and ARM64e by moving to ARM64Common - Change getAuthInfo() to return const AuthInfo* (nullptr when !hasAuth) to simplify callers in InputSection, SyntheticSections, and ICF - Use sizeof(int64_t) in AuthReloc static_assert - Reuse getAuthGotVA() from getGotVA() - Extract addPersonalityGotEntry() helper in UnwindInfoSection
…p writers writeChainedRebase, writeChainedFixup, and the per-reloc loop in ConcatInputSection::writeTo all passed segmentBase that was never read (the writers normalize against in.header->addr instead). Removes one parent->parent->addr traversal per relocation in the hot path.
The static constexpr arrays in ARM64Common.h were duplicated across each of the 5 translation units that include the header. inline constexpr gives them external linkage with the C++17 inline-variable rule, so there is exactly one copy in the final binary.
Replaces the '0' magic in NonLazyPointerSectionBase::writeTo with a named PtrAuthKey::IA constant. The enum lives next to AuthInfo in Relocations.h since LLVMAArch64Utils' AArch64PACKey is not in lld's link set.
…GotIndex getAuthGotVA already asserts isInAuthGot(); routing the StubsSection and UnwindInfoSection personality lookups through it makes the precondition checked. ARM64e::writeObjCMsgSendStub still needs the raw index but now asserts isInAuthGot() before reading.
…ation) The high8 byte and the target field are two halves of one logical value that dyld OR-folds at load time. In USERLAND24 the unified value is the runtime offset (targetValue), not the VA, so high8 must be extracted from targetValue to keep both halves consistent. In practice Mach-O image bases never set bits [56:63], so the discrepancy is unobservable at runtime; the fix is for self-consistency. Add a smoke test that verifies the high8 byte round-trips through the linker into the rebase entry in both USERLAND24 and legacy DYLD_CHAINED_PTR_ARM64E pointer formats.
The Relocation union overlays AuthReloc {int32_t addend; AuthInfo info;}
on the 64-bit addend slot. The non-extern (section-relative) AUTH path
in ObjFile::parseRelocations was writing the full 64-bit r.addend after
auth decoding, silently zeroing diversity/key/addrDiv.
Use authData.addend (the int32_t half) when r.hasAuth so the auth
metadata survives, mirroring what the extern path already does.
Adding authGotIndex pushed Symbol from 56 to 64 bytes — the 6th uint32_t forced a 4-byte pad before nameData (a pointer needing 8-byte alignment). Pack the Symbol::Kind enum into 3 bits in the bitfield region (the enum has 7 values; LLVM_PREFERRED_TYPE preserves type-safety via the kind() accessor) so symbolKind shares its byte with isUsedInRegularObj and used. Restores the 56-byte target for all architectures, including non-arm64e which doesn't read authGotIndex but still incurred its alignment cost. Defined drops correspondingly from 96 to 88 bytes.
|
I force-pushed after rebasing but RE: #188378 (comment) @drodriguez I was able to keep the symbol size the same by packing kind as 3 bits + reordering the members |
…rAuth (llvm#189474) This is part of work being done in llvm#188378 and llvm#188638, split out from llvm#188650. `lowerConstantPtrAuth` silently miscompiled `ConstantPtrAuth` constants with non-`GlobalValue` pointer bases — emitting `0@AUTH(da,0)` instead of erroring. Changes: - Handle `ConstantPointerNull` bases explicitly - Error via `reportFatalUsageError` on any remaining unresolved base (e.g. nested ptrauth) instead of silently miscompiling This PR was mostly developed with LLM assistance
…rAuth (llvm#189474) This is part of work being done in llvm#188378 and llvm#188638, split out from llvm#188650. `lowerConstantPtrAuth` silently miscompiled `ConstantPtrAuth` constants with non-`GlobalValue` pointer bases — emitting `0@AUTH(da,0)` instead of erroring. Changes: - Handle `ConstantPointerNull` bases explicitly - Error via `reportFatalUsageError` on any remaining unresolved base (e.g. nested ptrauth) instead of silently miscompiling This PR was mostly developed with LLM assistance
drodriguez
left a comment
There was a problem hiding this comment.
It is better to use merges instead of force pushes, specially for PRs with a lot of commits. Now it is very difficult to find what might have changed in the older commits.
Some more nits and details. The ones that are easy to find looking at the code. Hopefully other people can see other things and understand better the PR holistically than me.
There was a problem hiding this comment.
Not big ones. You probably have a better picture of the code than me.
There's things that are obvious refactors that should be able to be broken up in earlier PRs. For example adding getAddend() and changing all usages of r.addend. There's probably others that are similar to those.
…access Adds Relocation::setAddend() as the symmetric pair to the existing getAddend(), and migrates all callers of the raw .addend field to use the accessors. While auditing for this change, found four r.addend = 0 writes that would clobber AuthInfo on AUTH relocations (the addend lives in a union overlaid on the upper 32 bits when hasAuth is true): - UnwindInfoSection.cpp: personality reloc (auth-relevant on arm64e) - ObjC.cpp: template-copied reloc inherits hasAuth - ConcatOutputSection.cpp: BRANCH26 paths (defensive) - InputFiles.cpp: eh_frame SUBTRACTOR/UNSIGNED pair (defensive) These writes now go through setAddend() which dispatches based on r.hasAuth.
- AuthInfo::key is now PtrAuthKey directly instead of uint8_t. The cast moves from defaultAuthInfo to the parse site in InputFiles.cpp and the three dyld-bitfield writes in writeChainedRebase / writeChainedBind, where the dyld struct uses a uint64_t-backed bit-field that requires uint8_t. - defaultAuthInfo is now static constexpr instead of static const. - writeChainedRebase's arm64e non-auth path now reports a hard error if the encoded target value cannot round-trip through the 43-bit target field plus 8-bit high8. The existing recommendation to disable -no_fixup_chains does not apply since Driver.cpp rejects that flag for arm64e.
drodriguez
left a comment
There was a problem hiding this comment.
Last two commits deal with the last bits of feedback I got. Thanks!
This PR implements ARM64e linking support in lld's Mach-O backend. This is a continuation of work discussed in #79543 and I included @BertalanD's suggestion to keep the Relocation object size small, thanks! This PR is part of several I have planned, but contains everything needed to link arm64e Mach-Os with LLD.
What this PR adds
sizeof(Relocation)at 24 bytes.__auth_gotsection for signed function pointers alongside the existing unsigned __got. Stubs load from__auth_got; address-taken data references use__got.__auth_got; personality pointers in unwind info route to__auth_got; ObjC message send fast stubs use__auth_gotfor objc_msgSend lookup withbraaauthenticationTesting
AI Disclosure
I used Claude heavily throughout development, primarily to iterate through linker errors. All generated code was manually reviewed and tested on ARM64e hardware. I feel very confident in these changes being functionally correct because I tested on hardware extensively, architecturally they may be lacking!