The Mica backend does not try to make the emitter clever at every single point of code generation. It takes a different route: emit correct assembly with uniform rules first, then clean up the local waste in a dedicated pass.
That final cleanup stage is the peephole optimizer. It runs after x86-64 assembly has been emitted and before the final text is handed off to the assembler. The optimizer is deliberately conservative. It works on short instruction windows, applies only rewrites that are locally defensible, and stops as soon as control flow or aliasing uncertainty makes a transformation questionable.
If you want the full compiler context around this stage, start with The Mica Compiler — A Technical Portrait. This article zooms in on the optimizer itself.
Why Mica Uses a Peephole Pass
Mica’s code generator is designed around clarity and predictable lowering. That keeps the emitter manageable, but it also creates recurring local patterns:
- values staged through stack temporaries even when a direct form would do
- loads that immediately reuse a value that is already available in a register
- call setup code that stores to a temp only to reload the same value into an argument register
- stack adjustments and save/restore pairs that become unnecessary after earlier rewrites
Trying to bake every one of those cases directly into the emitter would make the backend harder to reason about. The peephole pass keeps the division of labor clean:
- the emitter focuses on correctness and uniform lowering
- the optimizer removes the obvious local waste that falls out of that strategy
- later global optimization work can be built on top without entangling the emitter in special cases
This is why the pass is intentionally local. It is not trying to perform full SSA-based optimization, loop analysis, or interprocedural reasoning. It is there to make already-correct machine code smaller, cleaner, and less noisy.
How The Current Optimizer Is Structured
In Mica 4.5, the optimizer applies 17 passes and repeats them until an entire iteration produces no further changes. That fixed-point style matters because one local rewrite often creates another:
- forwarding can create redundant mov instructions
- propagation can make earlier temp stores dead
- stack or compare cleanup can expose new adjacent patterns
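The fixed-point loop can be sketched in a few lines of Python. This is only an illustration of the idea, not Mica's implementation: the tuple encoding of instructions, the two toy passes, and the names run_to_fixed_point, forward_adjacent, and drop_self_moves are all assumptions made for this example.

```python
from typing import Callable, List, Tuple

Instr = Tuple[str, ...]  # hypothetical encoding, e.g. ("mov", "rbx", "rax")
Pass = Callable[[List[Instr]], Tuple[List[Instr], int]]  # returns (code, rewrites)


def forward_adjacent(code: List[Instr]) -> Tuple[List[Instr], int]:
    """Toy pass: `mov [m], r` directly followed by `mov x, [m]` becomes `mov x, r`."""
    out, n = list(code), 0
    for k in range(len(out) - 1):
        a, b = out[k], out[k + 1]
        if (len(a) == 3 and len(b) == 3 and a[0] == b[0] == "mov"
                and a[1].startswith("[") and b[2] == a[1]):
            out[k + 1] = ("mov", b[1], a[2])
            n += 1
    return out, n


def drop_self_moves(code: List[Instr]) -> Tuple[List[Instr], int]:
    """Toy pass: delete `mov r, r` for 64-bit register names only.

    A 32-bit self-move such as `mov eax, eax` zero-extends into rax,
    so it is not a no-op and is deliberately left alone.
    """
    regs64 = {"rax", "rbx", "rcx", "rdx"}
    out = [i for i in code
           if not (len(i) == 3 and i[0] == "mov" and i[1] == i[2] and i[1] in regs64)]
    return out, len(code) - len(out)


def run_to_fixed_point(code: List[Instr], passes: List[Pass],
                       max_rounds: int = 100) -> Tuple[List[Instr], int]:
    """Run every pass in order; stop after the first round with zero rewrites."""
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        changed = 0
        for p in passes:
            code, n = p(code)
            changed += n
        if changed == 0:
            break
    return code, rounds


code = [("mov", "[rbp-8]", "rax"), ("mov", "rax", "[rbp-8]")]
final, rounds = run_to_fixed_point(code, [drop_self_moves, forward_adjacent])
```

In this run the first round forwards the reload into mov rax, rax, the second round deletes that self-move, and the third round finds nothing left to do. That is the "one rewrite creates another" effect in miniature.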
The passes are arranged in three groups.
Group 1: Temp Materialization And Operand Folding
This first group attacks the “store to temp, load from temp, use value” patterns that appear naturally in a uniform emitter.
| Pass | Purpose |
|---|---|
| Literal propagation | Collapse immediate-through-temp chains |
| Memory copy propagation | Redirect later loads back to the original source |
| String descriptor forwarding | Bypass temporary reloads for 16-byte string descriptors |
| Arithmetic temp folding | Feed arithmetic directly from memory, immediates, or constant-pool entries |
| Boolean temp folding | Fold byte-sized boolean temporaries into their consumers |
Group 2: Redundancy Cleanup And Call-Adjacent Rewrites
This is the largest group. It removes round-trips that are semantically harmless but mechanically expensive.
| Pass | Purpose |
|---|---|
| Redundant register move elimination | Remove mov Rx, Rx style no-ops |
| Call argument forwarding | Skip temp staging when a value only feeds an ABI argument register |
| Call return forwarding | Reuse return registers instead of storing to a temp and reloading |
| Adjacent store-load forwarding | Forward a value across a strictly adjacent store/load pair |
| Windowed store-load forwarding | Do the same across a short bounded gap |
| Redundant load elimination | Reuse an earlier register value instead of reloading memory |
| Load-store forwarding | Remove pointless write-backs of values that were just loaded |
| Dead store elimination | Remove stores that are overwritten before any read |
Group 3: Compare And Stack Cleanup
The last group handles the small structural artifacts that remain after the value-flow rewrites have simplified the stream.
| Pass | Purpose |
|---|---|
| Compare temp folding | Fold temp loads directly into cmp or test consumers |
| Compare cleanup | Remove back-to-back redundant flag-setting instructions |
| Stack adjust folding | Combine adjacent add/sub rsp updates |
| Push/pop elimination | Remove adjacent push/pop pairs with no net effect |
Representative Rewrites
The easiest way to understand the optimizer is to look at the kind of assembly it shortens.
Literal Propagation
Uniform lowering often stages a literal through a temporary location before writing it to the real destination:
; Before
mov [temp], 42
mov rax, [temp]
mov [dest], rax
; After
mov [dest], 42
The point is not that the emitter is “wrong”. The point is that the peephole pass can remove the detour once the detour is visible in concrete assembly.
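As a rough sketch of how such a rewrite could be matched, the Python below looks for the three-instruction window shown above. The function name, the tuple encoding, and the temps and dead_regs arguments are assumptions for illustration: the sketch simply takes it as given that some earlier step knows which memory slots are single-use temporaries and which registers are dead after the window. It also checks that the literal fits a sign-extended 32-bit immediate, because x86-64 cannot store a full 64-bit immediate directly to memory.

```python
def is_imm(op):
    """True if the operand text parses as an integer literal."""
    try:
        int(op, 0)
        return True
    except ValueError:
        return False


def fits_imm32(op):
    """A store to 64-bit memory only accepts a sign-extended 32-bit immediate."""
    return -2**31 <= int(op, 0) < 2**31


def propagate_literal(code, temps, dead_regs):
    """Collapse `mov [temp], imm; mov reg, [temp]; mov [dest], reg`.

    temps:     memory operands assumed to be single-use temporaries
    dead_regs: registers assumed dead after the matched window
    Returns (new_code, number_of_rewrites).
    """
    out, k, n = [], 0, 0
    while k < len(code):
        w = code[k:k + 3]
        if (len(w) == 3
                and all(len(i) == 3 and i[0] == "mov" for i in w)
                and w[0][1] in temps and is_imm(w[0][2]) and fits_imm32(w[0][2])
                and w[1][2] == w[0][1] and w[1][1] in dead_regs
                and w[2][2] == w[1][1] and w[2][1].startswith("[")):
            out.append(("mov", w[2][1], w[0][2]))  # store the literal directly
            k += 3
            n += 1
        else:
            out.append(code[k])
            k += 1
    return out, n


before = [("mov", "[temp]", "42"), ("mov", "rax", "[temp]"), ("mov", "[dest]", "rax")]
after, hits = propagate_literal(before, temps={"[temp]"}, dead_regs={"rax"})
```

If rax were still live after the window, the sketch would leave all three instructions in place, which matches the conservative policy described later in this article.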
Store-Load Forwarding
If a value is stored and immediately reloaded from the exact same address, the second instruction does not need memory at all:
; Before
mov [rbp-8], rax
mov rbx, [rbp-8]
; After
mov [rbp-8], rax
mov rbx, rax
Mica has both an adjacent version of this rewrite and a bounded-window version that can look past a few non-interfering instructions.
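The bounded-window variant can be approximated as follows. This is a simplified sketch under stated assumptions, not the pass Mica ships: instructions are plain tuples, the first operand is treated as the destination, operand sizes and register aliasing (rax versus eax) are not modeled, and any instruction outside a small whitelist ends the search. The names forward_store_load, SIMPLE, and READ_ONLY are hypothetical.

```python
# Only these two-operand instructions are understood; anything else
# (call, jump, label, push/pop, ...) ends the search conservatively.
SIMPLE = {"mov", "add", "sub", "and", "or", "xor", "lea", "cmp", "test"}
READ_ONLY = {"cmp", "test"}  # these do not write their first operand


def forward_store_load(code, window=4):
    """Rewrite `mov x, [m]` to `mov x, src` when `mov [m], src` precedes it
    within `window` instructions and nothing in between interferes.
    Returns (new_code, number_of_rewrites)."""
    out, n = list(code), 0
    for k, ins in enumerate(out):
        is_store = (len(ins) == 3 and ins[0] == "mov"
                    and ins[1].startswith("[") and not ins[2].startswith("["))
        if not is_store:
            continue
        mem, src = ins[1], ins[2]
        for j in range(k + 1, min(k + 1 + window, len(out))):
            nxt = out[j]
            if nxt[0] not in SIMPLE or len(nxt) != 3:
                break  # unmodeled instruction: stop searching
            if nxt[0] == "mov" and nxt[2] == mem:
                out[j] = nxt = ("mov", nxt[1], src)  # forward the stored value
                n += 1
            dest = None if nxt[0] in READ_ONLY else nxt[1]
            # Stop if the source register, any memory, or a register used
            # in the address of `mem` is written.
            if dest is not None and (dest == src or dest.startswith("[") or dest in mem):
                break
    return out, n


adjacent, _ = forward_store_load([("mov", "[rbp-8]", "rax"), ("mov", "rbx", "[rbp-8]")])
```

With window=1 the function behaves like the strictly adjacent version; larger values let it look past instructions such as add rcx, 1 that touch neither the source register nor memory.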
Arithmetic Temp Folding
x86-64 can often encode a memory operand directly inside an arithmetic instruction, so a temporary register load becomes unnecessary:
; Before
mov rax, [tempA]
mov rcx, [tempB]
add rax, rcx
mov [tempA], rax
; After
mov rax, [tempA]
add rax, [tempB]
mov [tempA], rax
The same idea also applies to boolean operations and compare/test sequences.
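A minimal matcher for this fold might look like the sketch below. The names fold_arith_temp, FOLDABLE, and dead_regs are assumptions for illustration, and the sketch relies on the caller to say which registers are dead after the fold, since removing the load is only valid when nothing later reads the loaded register.

```python
# Instructions assumed here to accept a memory operand as their source.
FOLDABLE = {"add", "sub", "and", "or", "xor", "cmp"}


def fold_arith_temp(code, dead_regs):
    """Fold `mov r, [m]; op d, r` into `op d, [m]` when r is dead afterwards.

    dead_regs: registers assumed to be unused after the folded instruction.
    Returns (new_code, number_of_rewrites).
    """
    out, k, n = [], 0, 0
    while k < len(code):
        a = code[k]
        b = code[k + 1] if k + 1 < len(code) else None
        if (b is not None and len(a) == 3 and len(b) == 3
                and a[0] == "mov" and a[2].startswith("[")   # a loads from memory
                and a[1] in dead_regs                        # loaded register is dead later
                and b[0] in FOLDABLE and b[2] == a[1]        # b consumes it as the source
                and b[1] != a[1] and not b[1].startswith("[")):
            out.append((b[0], b[1], a[2]))  # use the memory operand directly
            k += 2
            n += 1
        else:
            out.append(a)
            k += 1
    return out, n


before = [
    ("mov", "rax", "[tempA]"),
    ("mov", "rcx", "[tempB]"),
    ("add", "rax", "rcx"),
    ("mov", "[tempA]", "rax"),
]
after, hits = fold_arith_temp(before, dead_regs={"rcx"})
```

The destination is required to be a register because x86-64 arithmetic instructions cannot take two memory operands.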
Call Return Forwarding
Call results frequently arrive in the right place already. Storing them to a stack temporary just to read them back is needless traffic:
; Before
call foo
mov [temp], rax
mov rcx, [temp]
; After
call foo
mov rcx, rax
This pass is ABI-aware. It knows which registers carry integer and floating-point return values and which registers a call may clobber.
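The pattern can be sketched like this. The table RETURN_REGS and the function forward_call_return are hypothetical names, and the sketch covers only the two most common cases: rax for integer results and xmm0 for double-precision results, which is where both the System V AMD64 and the Microsoft x64 conventions return such values. It also assumes the caller knows which stack slots are dead temporaries.

```python
# Move mnemonic -> return register it is expected to copy from (assumed table).
RETURN_REGS = {"mov": "rax", "movsd": "xmm0"}


def forward_call_return(code, temps):
    """Rewrite `call f; mov [t], ret; mov r, [t]` into `call f; mov r, ret`.

    temps: memory operands assumed to be dead after the reload.
    Returns (new_code, number_of_rewrites).
    """
    out, k, n = [], 0, 0
    while k < len(code):
        w = code[k:k + 3]
        if (len(w) == 3 and w[0][0] == "call"
                and len(w[1]) == 3 and len(w[2]) == 3
                and w[1][0] == w[2][0]                      # same move mnemonic
                and RETURN_REGS.get(w[1][0]) == w[1][2]     # store reads the return register
                and w[1][1] in temps                        # into a dead temporary
                and w[2][2] == w[1][1]):                    # which is reloaded at once
            out += [w[0], (w[2][0], w[2][1], w[1][2])]
            k += 3
            n += 1
        else:
            out.append(code[k])
            k += 1
    return out, n


before = [("call", "foo"), ("mov", "[temp]", "rax"), ("mov", "rcx", "[temp]")]
after, hits = forward_call_return(before, temps={"[temp]"})
```

A real pass would also need the clobber information mentioned above once it looks further than the instruction directly after the call; this sketch avoids the question by matching only the adjacent case.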
Dead Store Elimination
Some stores disappear simply because a later store overwrites the same location before any read can observe the first value:
; Before
mov [rbp-8], r10
mov ecx, [rbp-16]
mov [rbp-8], r11
; After
mov ecx, [rbp-16]
mov [rbp-8], r11
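One way to approximate this check is sketched below, under several simplifying assumptions that are not taken from Mica's source: every access is treated as 8 bytes wide, two operands are proven distinct only when they share a base register and their offsets are at least 8 bytes apart, register aliasing is not modeled, and any instruction outside a small whitelist ends the search. All names are hypothetical.

```python
import re

SIMPLE = {"mov", "add", "sub", "and", "or", "xor", "lea", "cmp", "test"}
MEM = re.compile(r"^\[(\w+)([+-]\d+)?\]$")  # matches [base] or [base+off] / [base-off]


def parse_mem(op):
    """Return (base, offset) for a simple memory operand, else None."""
    m = MEM.match(op)
    return (m.group(1), int(m.group(2) or 0)) if m else None


def may_overlap(a, b, size=8):
    """Assume overlap unless both operands share a base and are `size` bytes apart."""
    pa, pb = parse_mem(a), parse_mem(b)
    if pa is None or pb is None or pa[0] != pb[0]:
        return True
    return abs(pa[1] - pb[1]) < size


def eliminate_dead_stores(code, window=4):
    """Drop `mov [m], x` when `[m]` is overwritten within `window` instructions
    and nothing in between may read it. Returns (new_code, removed_count)."""
    dead = set()
    for k, ins in enumerate(code):
        target = parse_mem(ins[1]) if len(ins) == 3 and ins[0] == "mov" else None
        if target is None:
            continue
        for nxt in code[k + 1:k + 1 + window]:
            if nxt[0] not in SIMPLE or len(nxt) != 3:
                break  # call, jump, label, or anything unmodeled
            if nxt[0] == "mov" and nxt[1] == ins[1]:
                dead.add(k)  # same slot overwritten before any read
                break
            touches = any(op.startswith("[") and may_overlap(op, ins[1]) for op in nxt[1:])
            if touches or nxt[1] == target[0]:
                break  # possible read of the slot, or its base register is involved
    return [i for k, i in enumerate(code) if k not in dead], len(dead)


before = [("mov", "[rbp-8]", "r10"), ("mov", "ecx", "[rbp-16]"), ("mov", "[rbp-8]", "r11")]
after, removed = eliminate_dead_stores(before)
```

The load from [rbp-16] does not block the rewrite because it shares the base register and sits a full 8 bytes away from [rbp-8]; a load through a different base register would stop the search instead.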
Stack Cleanup
Late cleanup is also a good place to simplify frame noise:
; Before
sub rsp, 32
add rsp, 16
; After
sub rsp, 16
Or, when the two instructions cancel exactly, both can disappear.
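The arithmetic behind this fold is simple enough to sketch directly. The function below is an illustration with hypothetical names; it assumes the flags set by the original add/sub instructions are not read afterwards, which a real pass would have to check before combining them.

```python
def is_rsp_adjust(ins):
    """True for `add rsp, N` or `sub rsp, N` with a decimal immediate."""
    return (len(ins) == 3 and ins[0] in ("add", "sub")
            and ins[1] == "rsp" and ins[2].isdigit())


def delta(ins):
    """Signed change applied to rsp by one adjustment."""
    return int(ins[2]) if ins[0] == "add" else -int(ins[2])


def fold_stack_adjust(code):
    """Merge adjacent rsp adjustments; drop them entirely when they cancel.
    Assumes flags from the original instructions are dead.
    Returns (new_code, number_of_folds)."""
    out, n = [], 0
    for ins in code:
        prev = out[-1] if out else None
        if prev is not None and is_rsp_adjust(prev) and is_rsp_adjust(ins):
            net = delta(prev) + delta(ins)
            out.pop()
            if net < 0:
                out.append(("sub", "rsp", str(-net)))
            elif net > 0:
                out.append(("add", "rsp", str(net)))
            n += 1  # net == 0: both instructions disappear
        else:
            out.append(ins)
    return out, n


merged, _ = fold_stack_adjust([("sub", "rsp", "32"), ("add", "rsp", "16")])
cancelled, _ = fold_stack_adjust([("sub", "rsp", "32"), ("add", "rsp", "32")])
```

Because the merged instruction is pushed back onto the output, a run of three or more adjacent adjustments collapses in a single sweep.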
Why These Rewrites Are Safe
The optimizer’s main rule is simple: when proof becomes uncertain, it does not rewrite. Several constraints enforce that policy.
- Exact memory matching. Two memory operands are treated as the same location only when base register, symbol, offset, and size all match.
- Register-family awareness. rax, eax, ax, al, and ah are not independent; a write to one view can invalidate assumptions about the others.
- ABI boundaries. Calls are treated according to the active ABI, including caller-saved and callee-saved register rules.
- Control-flow barriers. Labels, jumps, returns, and calls terminate many local searches because they break the simple linear model the peephole pass relies on.
- Bounded windows. Non-adjacent rewrites search only a small number of instructions, which keeps reasoning local and predictable.
- Flag preservation. Rewrites that would change observable flag behavior are rejected unless the optimizer can show the replacement is equivalent for the later consumer.
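The first two constraints in the list above translate naturally into small data structures. The sketch below is one possible shape, not Mica's actual representation: the Mem record, the FAMILIES table (shown for four registers only), and the helper names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mem:
    """A memory operand; equality compares every field exactly."""
    base: Optional[str]
    symbol: Optional[str]
    offset: int
    size: int  # access width in bytes


# Each 64-bit register with the narrower views that alias it (partial table).
FAMILIES = {
    "rax": ("rax", "eax", "ax", "al", "ah"),
    "rbx": ("rbx", "ebx", "bx", "bl", "bh"),
    "rcx": ("rcx", "ecx", "cx", "cl", "ch"),
    "rdx": ("rdx", "edx", "dx", "dl", "dh"),
}
FAMILY_OF = {view: root for root, views in FAMILIES.items() for view in views}


def same_location(a: Mem, b: Mem) -> bool:
    """Two operands match only if base, symbol, offset, and size all agree."""
    return a == b


def clobbers(written: str, tracked: str) -> bool:
    """A write to any view of a register invalidates facts about every view."""
    return FAMILY_OF.get(written, written) == FAMILY_OF.get(tracked, tracked)


slot64 = Mem(base="rbp", symbol=None, offset=-8, size=8)
slot32 = Mem(base="rbp", symbol=None, offset=-8, size=4)
```

Here same_location(slot64, slot32) is False even though both operands start at the same address, and clobbers("al", "rax") is True. Treating a size mismatch as "not the same location" forgoes some rewrites, but it avoids reasoning about partial overlaps.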
This is why the optimizer is best described as conservative rather than aggressive. It intentionally leaves some opportunities untouched.
What The Optimizer Measures
The pass reports both the total instruction count before and after cleanup and per-pattern counters for the rewrites it applies. That makes the backend measurable instead of anecdotal: the counters show whether a new pass is actually carrying its weight, and they reveal when a change in earlier compilation stages starts producing worse assembly.
The statistics are also useful for development discipline. A peephole pass is easy to add; a justified peephole pass needs evidence.
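A statistics record of this kind could be as small as the sketch below. The class name PeepholeStats, its fields, the pattern names, and the numbers in the usage lines are all made up for illustration; the article does not specify the actual report format.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PeepholeStats:
    """Instruction totals plus one counter per rewrite pattern."""
    before: int = 0
    after: int = 0
    per_pattern: Counter = field(default_factory=Counter)

    def record(self, pattern: str, count: int = 1) -> None:
        """Credit `count` applied rewrites to the named pattern."""
        self.per_pattern[pattern] += count

    def summary(self) -> str:
        """Totals first, then patterns ordered by how often they fired."""
        removed = self.before - self.after
        lines = [f"instructions: {self.before} -> {self.after} ({removed} removed)"]
        lines += [f"  {name}: {hits}" for name, hits in self.per_pattern.most_common()]
        return "\n".join(lines)


# Illustrative numbers only.
stats = PeepholeStats(before=120, after=97)
stats.record("store_load_forwarding", 9)
stats.record("dead_store", 4)
stats.record("dead_store")
print(stats.summary())
```

Sorting by hit count puts rarely firing patterns at the bottom of the report, which makes a pass that no longer earns its place easy to spot.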
What This Optimizer Is Not
The current peephole pass is not Mica’s end-state optimizer. It does not attempt:
- whole-function SSA optimization
- global value numbering
- loop transforms
- interprocedural analysis
- register allocation strategy changes
Those are later stages on Mica’s roadmap. The peephole optimizer is the local cleanup layer that already ships today, and it provides a solid, testable base for the more ambitious optimizer work planned next.
Where It Goes Next
Mica’s roadmap after 4.5 includes SSA transformation and broader global optimization work. When that arrives, the peephole pass does not disappear. It continues to matter because backend lowering still creates machine-specific local artifacts that higher-level analysis does not see.
That is the lasting role of a peephole optimizer in this compiler: not a substitute for global optimization, but a disciplined last pass that turns uniformly emitted assembly into cleaner final machine code.