Compilerbook For macOS arm64
Original reference: https://www.sigbus.info/compilerbook
This repository is a native Apple Silicon/macOS rewrite track for compilerbook. The original book targets Linux x86-64; this track keeps the incremental compiler-building style, but changes the backend to AArch64 assembly assembled and linked by Apple clang.
The important rule for this rewrite: the compiler must not evaluate the input program in C and print a constant result. It should tokenize, parse, build an AST, and generate assembly that computes the result at runtime.
Target
- Host OS: macOS
- CPU: Apple Silicon arm64 / AArch64
- Assembler/linker driver: Apple clang
- Output path: compiler emits .s, then clang tmp.s -o tmp
- Executable format: Mach-O
- First practical goal: compile small C-like snippets into a runnable program whose exit status is the expected result
Basic workflow
sh
make
./armcc tmp.c > tmp.s
clang tmp.s -o tmp
./tmp
echo $?
For early calculator stages, direct source-string input is also useful:
sh
./armcc 'main(){ return 1+2*3; }' > tmp.s
clang tmp.s -o tmp
./tmp
echo $?
Expected exit code: 7.
Run the current test set:
sh
make test
Run every phase snapshot:
sh
make phase-test
Important differences from the original target
The original project assumes Linux x86-64. On macOS arm64, adjust these areas:
- Use AArch64 instructions instead of x86-64 instructions.
- Use the Apple arm64 calling convention.
- Prefix C symbols with _, such as _main and _printf.
- Keep the stack 16-byte aligned.
- Use Mach-O section names and relocation syntax.
- Use Apple clang as the assembler/linker driver.
- Do not rely on Linux ELF details.
- Do not rely on Linux-style static linking.
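The underscore rule is easy to keep in one place so that the rest of the code generator never spells out the prefix by hand. A minimal sketch, assuming a hypothetical helper named `darwin_symbol` that is not part of the original project:

```c
#include <stdio.h>

/* Hypothetical helper: writes the Mach-O spelling of a C symbol
 * ("main" becomes "_main") into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
int darwin_symbol(char *buf, size_t size, const char *c_name) {
    int n = snprintf(buf, size, "_%s", c_name);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

A code generator could call this once per function or global name and use the result in .globl lines, labels, and bl targets.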
Minimal generated program
The smallest useful generated assembly is:
asm
.globl _main
_main:
mov x0, #42
ret
Build and run:
sh
clang tmp.s -o tmp
./tmp
echo $?
Expected exit code: 42.
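On the compiler side, the first phase only has to print those four lines with the literal filled in. A rough sketch, assuming a hypothetical function name `emit_int_main` and an output stream passed by the caller; it also assumes the literal is small enough to be encoded in a single mov, which holds for any value that fits in an exit status:

```c
#include <stdio.h>

/* Sketch of a phase-01 style generator: emits a complete _main that
 * returns `value` as the process exit status.
 * `emit_int_main` is a hypothetical name used for illustration. */
void emit_int_main(FILE *out, long value) {
    fprintf(out, "  .globl _main\n");
    fprintf(out, "_main:\n");
    fprintf(out, "  mov x0, #%ld\n", value);  /* assumes a mov-encodable immediate */
    fprintf(out, "  ret\n");
}
```

Calling `emit_int_main(stdout, 42)` and redirecting the output to tmp.s would reproduce the assembly shown above. The shell only reports the low 8 bits of the return value, so test values should stay in the range 0 to 255.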
Apple arm64 ABI notes
General-purpose registers:
- x0 to x7: integer/pointer arguments
- x0: integer/pointer return value
- x29: frame pointer
- x30: link register
- sp: stack pointer
Rules to respect:
- Stack alignment at call boundaries must be 16 bytes.
- Function calls use bl _function_name.
- Return uses ret.
- C-visible symbols generally need a leading underscore.
Example call:
asm
mov x0, #3
mov x1, #4
bl _add
The result is in x0.
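In a stack-machine code generator, one way to produce such a call is to evaluate the arguments left to right, push each result, and then pop them into registers in reverse order. The following is a sketch under that assumption; `emit_call` is a hypothetical helper, and it only handles up to eight integer or pointer arguments:

```c
#include <stdio.h>

/* Sketch only: assumes the caller has already evaluated each argument
 * left to right and pushed it with a 16-byte push, so the last argument
 * is on top of the stack. `emit_call` is a hypothetical helper.
 * Returns 0 on success, -1 if nargs does not fit in x0 to x7. */
int emit_call(FILE *out, const char *c_name, int nargs) {
    if (nargs < 0 || nargs > 8)
        return -1;
    /* Pop in reverse so the first argument ends up in x0. */
    for (int i = nargs - 1; i >= 0; i--) {
        fprintf(out, "  ldr x%d, [sp]\n", i);
        fprintf(out, "  add sp, sp, #16\n");
    }
    fprintf(out, "  bl _%s\n", c_name);
    return 0;
}
```

Because every push and pop moves sp by 16 bytes, sp is still 16-byte aligned when bl executes. The sketch does not cover variadic callees such as printf, whose variadic arguments are passed on the stack under the Apple arm64 convention.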
Function frame
Once local variables or nested calls exist, use a normal frame:
asm
.globl _main
_main:
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #32
mov x0, #42
mov sp, x29
ldp x29, x30, [sp], #16
ret
Local variables can live at negative offsets from x29:
asm
str x0, [x29, #-8]
ldr x0, [x29, #-8]
Round the local stack area up to a multiple of 16.
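The rounding itself is a one-line helper. A minimal sketch, with `align_to` as an illustrative name:

```c
#include <assert.h>

/* Rounds n up to the next multiple of align (align must be positive). */
int align_to(int n, int align) {
    assert(n >= 0 && align > 0);
    return (n + align - 1) / align * align;
}
```

For example, three 8-byte locals need 24 bytes, and `align_to(24, 16)` gives 32, so the prologue would emit sub sp, sp, #32.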
Stack-machine expression codegen
The compilerbook uses a stack-machine style for early expression codegen. The same idea works on arm64.
Use x0 as the current expression result.
Push:
asm
sub sp, sp, #16
str x0, [sp]
Pop:
asm
ldr x1, [sp]
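add sp, sp, #16
Arithmetic, with the left operand popped into x1 and the right operand in x0 (emit the one line that matches the operator):
asm
add x0, x1, x0 ; x1 + x0
sub x0, x1, x0 ; x1 - x0
mul x0, x1, x0 ; x1 * x0
sdiv x0, x1, x0 ; x1 / x0 (signed)
Comparison: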
asm
cmp x1, x0
cset x0, eq ; ==
cset x0, ne ; !=
cset x0, lt ; < (signed)
cset x0, le ; <= (signed)
For > and >=, either swap the operands in the parser/codegen or use the gt and ge conditions after cmp.
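Put together, the push, pop, arithmetic, and comparison sequences above can be driven by a recursive walk over the AST. The following is a sketch rather than the project's actual code: the `Node` layout, the `ND_*` names, and `gen_expr` are assumptions made for illustration, and integer literals are assumed to be small enough for a single mov.

```c
#include <stdio.h>

/* Hypothetical minimal AST; the real compiler's node layout may differ. */
typedef enum {
    ND_NUM, ND_ADD, ND_SUB, ND_MUL, ND_DIV, ND_EQ, ND_NE, ND_LT, ND_LE
} NodeKind;

typedef struct Node {
    NodeKind kind;
    struct Node *lhs, *rhs;  /* operands of a binary node */
    long val;                /* value of an ND_NUM node */
} Node;

/* Emits code that leaves the value of `node` in x0. */
void gen_expr(FILE *out, const Node *node) {
    if (node->kind == ND_NUM) {
        fprintf(out, "  mov x0, #%ld\n", node->val);
        return;
    }
    gen_expr(out, node->lhs);
    fprintf(out, "  sub sp, sp, #16\n");  /* push lhs */
    fprintf(out, "  str x0, [sp]\n");
    gen_expr(out, node->rhs);             /* rhs stays in x0 */
    fprintf(out, "  ldr x1, [sp]\n");     /* pop lhs into x1 */
    fprintf(out, "  add sp, sp, #16\n");
    switch (node->kind) {
    case ND_ADD: fprintf(out, "  add x0, x1, x0\n"); break;
    case ND_SUB: fprintf(out, "  sub x0, x1, x0\n"); break;
    case ND_MUL: fprintf(out, "  mul x0, x1, x0\n"); break;
    case ND_DIV: fprintf(out, "  sdiv x0, x1, x0\n"); break;
    case ND_EQ:  fprintf(out, "  cmp x1, x0\n  cset x0, eq\n"); break;
    case ND_NE:  fprintf(out, "  cmp x1, x0\n  cset x0, ne\n"); break;
    case ND_LT:  fprintf(out, "  cmp x1, x0\n  cset x0, lt\n"); break;
    case ND_LE:  fprintf(out, "  cmp x1, x0\n  cset x0, le\n"); break;
    default: break;
    }
}
```

With this shape, > and >= need no extra cases if the parser builds `a > b` as an ND_LT node with the operands swapped.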
Branches and labels
Use cmp plus conditional branches:
asm
cmp x0, #0
b.eq .L.else.0
...
b .L.end.0
.L.else.0:
...
.L.end.0:
Generate unique labels with a monotonically increasing counter.
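A counter of this kind can be as small as a function with a static variable. The sketch below pairs it with an if/else emitter; both `next_label_id` and `emit_if_else` are hypothetical names, and the three string parameters stand in for assembly that the real code generator would produce by walking the condition and the two branches.

```c
#include <stdio.h>

/* Returns a fresh label id on every call. */
int next_label_id(void) {
    static int counter = 0;
    return counter++;
}

/* Sketch: wraps already-generated assembly fragments in the if/else
 * skeleton. `cond` must leave the condition value in x0.
 * Returns the label id that was used. */
int emit_if_else(FILE *out, const char *cond,
                 const char *then_body, const char *else_body) {
    int id = next_label_id();  /* one id shared by both labels of this statement */
    fputs(cond, out);
    fprintf(out, "  cmp x0, #0\n");
    fprintf(out, "  b.eq .L.else.%d\n", id);
    fputs(then_body, out);
    fprintf(out, "  b .L.end.%d\n", id);
    fprintf(out, ".L.else.%d:\n", id);
    fputs(else_body, out);
    fprintf(out, ".L.end.%d:\n", id);
    return id;
}
```

Taking the id once at the top of the statement matters for nesting: an inner if inside `then_body` gets its own id, so its labels cannot collide with the outer ones.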
Function definitions
For a function:
c
int add(int x, int y) {
return x + y;
}
Emit:
asm
.globl _add
_add:
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16
str x0, [x29, #-8]
str x1, [x29, #-16]
ldr x0, [x29, #-8]
sub sp, sp, #16
str x0, [sp]
ldr x0, [x29, #-16]
ldr x1, [sp]
add sp, sp, #16
add x0, x1, x0
mov sp, x29
ldp x29, x30, [sp], #16
ret
Store incoming arguments into local stack slots first. This keeps later codegen simple because parameters and local variables are accessed the same way.
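The prologue and epilogue follow the same pattern for every function, so they can be generated from the parameter and local counts. A sketch, assuming hypothetical helpers `emit_prologue` and `emit_epilogue`, 8-byte slots for every variable, and frames small enough that each slot offset and the sub immediate can be encoded directly:

```c
#include <stdio.h>

/* Hypothetical helper: emits the label and prologue for a function with
 * `nparams` register parameters (at most 8) and `nlocals` further locals.
 * Parameter i is stored at [x29, #-8*(i+1)].
 * Returns the number of bytes reserved below x29, or -1 on bad input. */
int emit_prologue(FILE *out, const char *c_name, int nparams, int nlocals) {
    if (nparams < 0 || nparams > 8 || nlocals < 0)
        return -1;
    int frame = ((nparams + nlocals) * 8 + 15) / 16 * 16;  /* round up to 16 */
    fprintf(out, "  .globl _%s\n", c_name);
    fprintf(out, "_%s:\n", c_name);
    fprintf(out, "  stp x29, x30, [sp, #-16]!\n");
    fprintf(out, "  mov x29, sp\n");
    if (frame > 0)
        fprintf(out, "  sub sp, sp, #%d\n", frame);
    for (int i = 0; i < nparams; i++)
        fprintf(out, "  str x%d, [x29, #-%d]\n", i, 8 * (i + 1));
    return frame;
}

/* Restores sp, x29, and x30, then returns to the caller. */
void emit_epilogue(FILE *out) {
    fprintf(out, "  mov sp, x29\n");
    fprintf(out, "  ldp x29, x30, [sp], #16\n");
    fprintf(out, "  ret\n");
}
```

For the add example, `emit_prologue(out, "add", 2, 0)` reserves 16 bytes and stores x0 and x1 at the same offsets as the listing above. Negative offsets in this form only reach 256 bytes below x29, so a function with many locals would need a different addressing sequence.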
Globals
Writable global integer:
asm
.data
.globl _g
_g:
.quad 3
Load global address:
asm
adrp x0, _g@PAGE
add x0, x0, _g@PAGEOFF
Load global value:
asm
adrp x0, _g@PAGE
add x0, x0, _g@PAGEOFF
ldr x0, [x0]
Store global value:
asm
adrp x1, _g@PAGE
add x1, x1, _g@PAGEOFF
str x0, [x1]
String literals
String literals should go in a Mach-O string section:
asm
.section __TEXT,__cstring
.L.str.0:
.asciz "hello"
Load string address:
asm
adrp x0, .L.str.0@PAGE
add x0, x0, .L.str.0@PAGEOFF
Then switch back to text before functions:
asm
.text
Suggested phase map
The phase directories intentionally track compilerbook's step granularity more closely than a conventional project milestone plan.
- phase-01-int: integer literal, e.g. 42
- phase-02-add-sub: 5+20-4
- phase-03-tokenizer: tokenizer and whitespace handling
- phase-04-errors: source-location errors
- phase-05-mul-div-parens: *, /, precedence, parentheses
- phase-06-unary: unary + and -
- phase-07-comparisons: ==, !=, <, <=, >, >=
- phase-08-file-split: split compiler source files
- phase-09-single-letter-locals: one-letter local variables
- phase-10-multiletter-locals: multi-letter local variables
- phase-11-return: return
- phase-12-control-flow: if, else, while, for
- phase-13-blocks: { ... }
- phase-14-function-calls: function calls
- phase-15-function-definitions: function definitions and parameters
Next phase: types, pointers, arrays
Add int, pointer types, &, *, pointer arithmetic, arrays, indexing, and sizeof.
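Both sizeof and pointer arithmetic come down to knowing the size of a type. The sketch below shows one possible shape for that computation; the `Type` struct and function names are hypothetical, and the sizes (4-byte int, 8-byte pointer) are an assumption that matches C on arm64 macOS. A compiler that keeps every value in an 8-byte slot would use 8 for int instead.

```c
#include <stddef.h>

/* Hypothetical type representation for illustration. */
typedef enum { TY_INT, TY_PTR, TY_ARRAY } TypeKind;

typedef struct Type {
    TypeKind kind;
    const struct Type *base;  /* pointee or element type, NULL for int */
    size_t array_len;         /* element count for TY_ARRAY */
} Type;

/* Assumption: int is 4 bytes and pointers are 8 bytes. */
size_t type_size(const Type *ty) {
    switch (ty->kind) {
    case TY_INT:   return 4;
    case TY_PTR:   return 8;
    case TY_ARRAY: return ty->array_len * type_size(ty->base);
    }
    return 0;
}

/* For p + n, the integer operand is scaled by the size of the type
 * that p points to (or the element type, for an array). */
size_t pointer_step(const Type *ptr_or_array) {
    return type_size(ptr_or_array->base);
}
```

Under these assumed sizes, sizeof on an array of three ints evaluates to 12, and adding 1 to an int pointer advances the address by 4.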
Tests:
c
int main() { int x; x=3; int *p; p=&x; return *p; }
int main() { int a[3]; a[0]=3; a[1]=4; return a[0]+a[1]; }
int main() { int a[3]; return sizeof(a); }
Later phase: globals and strings
Add global variables, global arrays, and string literals.
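The emitters for this phase can mirror the Globals and String literals listings directly. A sketch with hypothetical function names; it writes string bytes with .byte instead of .asciz so that quotes, backslashes, and control characters in the literal need no escaping, which is a substitution made here for simplicity and not something the original text prescribes:

```c
#include <stdio.h>

/* Emits a writable 8-byte global with an initial value. */
void emit_global_quad(FILE *out, const char *c_name, long init) {
    fprintf(out, "  .data\n");
    fprintf(out, "  .globl _%s\n", c_name);
    fprintf(out, "_%s:\n", c_name);
    fprintf(out, "  .quad %ld\n", init);
}

/* Emits code that leaves the address of a global in x0. */
void emit_global_addr(FILE *out, const char *c_name) {
    fprintf(out, "  adrp x0, _%s@PAGE\n", c_name);
    fprintf(out, "  add x0, x0, _%s@PAGEOFF\n", c_name);
}

/* Emits a NUL-terminated string literal under the label .L.str.<id>,
 * then switches back to the text section. */
void emit_string_literal(FILE *out, int id, const char *s) {
    fprintf(out, "  .section __TEXT,__cstring\n");
    fprintf(out, ".L.str.%d:\n", id);
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        fprintf(out, "  .byte %d\n", *p);
    fprintf(out, "  .byte 0\n");
    fprintf(out, "  .text\n");
}
```

An uninitialized global such as `int g;` can reuse `emit_global_quad` with an initial value of 0.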
Tests:
c
int g; int main() { g=3; return g; }
int g=5; int main() { return g; }
int main() { return *"A"; }
int *s="B"; int main() { return *s; }
Later phase: richer file input and C tests
Read source from a file and run a C-snippet test harness.
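Reading the source is mostly a matter of pulling a whole stream into one NUL-terminated buffer that the tokenizer can scan. A minimal sketch, with `read_all` as a hypothetical name:

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads an entire stream into a NUL-terminated heap buffer.
 * Returns NULL on allocation or read failure; the caller frees the result. */
char *read_all(FILE *in) {
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (!buf)
        return NULL;
    for (;;) {
        /* Always leave one byte free for the terminating NUL. */
        len += fread(buf + len, 1, cap - len - 1, in);
        if (len < cap - 1)
            break;                      /* short read: end of file or error */
        cap *= 2;
        char *grown = realloc(buf, cap);
        if (!grown) {
            free(buf);
            return NULL;
        }
        buf = grown;
    }
    if (ferror(in)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

The compiler's main function could open its argument with fopen and pass the stream to `read_all`, or pass stdin when no file is given.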
Test harness shape:
sh
printf '%s\n' "$input" > tmp.c
./armcc tmp.c > tmp.s
clang tmp.s -o tmp
./tmp
Common macOS arm64 pitfalls
- Forgetting the leading _ on exported C symbols.
- Misaligning the stack before bl.
- Treating Mach-O sections like ELF sections.
- Trying to use Linux static-linking assumptions.
- Using x86-64 stack/register examples directly.
- Loading addresses without @PAGE and @PAGEOFF.
- Forgetting that x30 must be preserved when making nested calls.
- Letting generated labels collide.
Recommended repository shape
text
Makefile
compilerbook-macos-arm64.md
phases/
  phase-01-int/
  phase-02-add-sub/
  phase-03-tokenizer/
  phase-04-errors/
  phase-05-mul-div-parens/
  phase-06-unary/
  phase-07-comparisons/
  phase-08-file-split/
  phase-09-single-letter-locals/
  phase-10-multiletter-locals/
  phase-11-return/
  phase-12-control-flow/
  phase-13-blocks/
  phase-14-function-calls/
  phase-15-function-definitions/
Keep every phase runnable. A phase should be small enough that make phase-test proves its behavior without depending on unfinished later work.