x86_64汇编代码生成：AST处理的行业标准方式及复杂节点实现

阿华AIGC实验室

2026-5-19

Great question—building a code generator for x86_64 from an AST is such a fun (and sometimes tricky) part of compiler writing! Let’s break down your questions with industry-standard practices and practical examples.

Industry-Standard Approaches to AST-to-Code Generation

Most production compilers (like GCC, Clang) use multi-pass processing for code generation, especially when targeting architectures with strict calling conventions and stack alignment rules like x86_64. Here’s why and how it works:

Multi-Pass Processing: Why It’s Preferred

Single-pass code generation is possible for simple languages, but multi-pass is far more maintainable and robust for complex code. The two most critical passes for your use case are:

Pass 1: Stack Frame Calculation & Symbol Resolution
Traverse the AST once to:
- Count all local variables (including temporary storage needed for nested function calls, intermediate values) and function parameters.
- Calculate the total stack frame size required for each function (remember x86_64 requires 16-byte stack alignment at function entry and call sites!).
- Assign fixed stack offsets (relative to rbp, e.g., -8, -16) to every local variable and temporary. This avoids dynamic stack adjustments during code generation, which are error-prone and inefficient.
Pass 2: Instruction Generation
Traverse the AST again, using the stack layout from Pass 1 to emit x86_64 assembly. For example:
- When entering a funcDef, emit prologue code: push rbp; mov rbp, rsp; sub rsp, [calculated_stack_size] to set up the stack frame.
- When exiting, emit the epilogue: mov rsp, rbp; pop rbp; ret.

Handling Nested `funcCall` Nodes (Parameters That Are Function Calls)

This is a common scenario, and the key is to follow x86_64 calling conventions strictly. Let’s use the System V AMD64 ABI (used by Linux, macOS, BSD) as an example—adjust if you’re targeting Microsoft’s x64 convention:

The System V ABI uses registers rdi, rsi, rdx, rcx, r8, r9 for the first 6 integer/pointer parameters; any additional parameters are pushed onto the stack (in reverse order). Return values for integers/pointers go in rax.

For a call like foo(bar(), baz()), here’s the step-by-step code generation logic:

Recursively generate code for bar(): This will leave bar()’s return value in rax.
Save rax to a pre-allocated temporary stack slot: Because calling baz() will overwrite rax, we need to preserve bar()’s result.
Recursively generate code for baz(): Its return value is now in rax.
Move baz()’s result to the correct parameter register: Since it’s the second argument to foo(), we move rax to rsi.
Retrieve bar()’s result from the temporary slot: Move it to rdi (the first argument register for foo()).
Emit the call to foo(): call foo.
Clean up any stack-allocated parameters (if applicable) per the calling convention.

For deeper nesting (e.g., foo(bar(baz()), qux())), you just extend this pattern: always resolve the innermost function calls first, save their results as needed, then build up the outer call’s parameters in the correct order.

Key Additional Tips

Stick to your chosen calling convention: Mixing conventions will break calls to standard library functions or other compiled code. Double-check rules for stack alignment, register preservation (some registers like rbx, rbp, r12-r15 must be preserved by the callee), and return value handling.
Use depth-first traversal (DFS) for AST processing: This naturally handles nested nodes like function call parameters, since you process child nodes (inner calls) before parent nodes (outer calls).
Pre-allocate temporary storage: Don’t dynamically adjust the stack pointer during code generation—all temporary slots should be accounted for in Pass 1’s stack frame calculation.

内容的提问来源于stack exchange，提问作者C. Carey