The Mica backend does not try to make the emitter clever at every single point of code generation. It takes a different route: emit correct assembly with uniform rules first, then clean up the local waste in a dedicated pass.

That final cleanup stage is the peephole optimizer. It runs after x86-64 assembly has been emitted and before the final text is handed off to the assembler. The optimizer is deliberately conservative. It works on short instruction windows, applies only rewrites that are locally defensible, and stops as soon as control flow or aliasing uncertainty makes a transformation questionable.

If you want the full compiler context around this stage, start with The Mica Compiler — A Technical Portrait. This article zooms in on the optimizer itself.

Why Mica Uses a Peephole Pass

Mica’s code generator is designed around clarity and predictable lowering. That keeps the emitter manageable, but it also creates recurring local patterns:

  • values staged through stack temporaries even when a direct form would do
  • loads that immediately reuse a value that is already available in a register
  • call setup code that stores to a temp only to reload the same value into an argument register
  • stack adjustments and save/restore pairs that become unnecessary after earlier rewrites

Trying to bake every one of those cases directly into the emitter would make the backend harder to reason about. The peephole pass keeps the division of labor clean:

  • the emitter focuses on correctness and uniform lowering
  • the optimizer removes the obvious local waste that falls out of that strategy
  • later global optimization work can be built on top without entangling the emitter in special cases

This is why the pass is intentionally local. It is not trying to perform full SSA-based optimization, loop analysis, or interprocedural reasoning. It is there to make already-correct machine code smaller, cleaner, and less noisy.

How The Current Optimizer Is Structured

In Mica 4.5, the optimizer applies 17 passes and repeats them until an entire iteration produces no further changes. That fixed-point style matters because one local rewrite often creates another:

  • forwarding can create redundant mov instructions
  • propagation can make earlier temp stores dead
  • stack or compare cleanup can expose new adjacent patterns
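That fixed-point loop is easy to sketch. The Python below shows a minimal driver with a single stand-in pass; the pass list, instruction representation, and function names are illustrative, not Mica's actual code:

```python
# Minimal sketch of a fixed-point peephole driver (not Mica's actual code).
# Each pass takes a list of instruction strings and returns (new_list, changed);
# the driver repeats the whole pass list until a full sweep changes nothing.

def remove_self_moves(instrs):
    """Stand-in pass: drop `mov X, X` no-ops and report whether anything changed."""
    out = []
    for ins in instrs:
        parts = ins.replace(",", " ").split()
        if parts[:1] == ["mov"] and len(parts) == 3 and parts[1] == parts[2]:
            continue  # self-move: delete it
        out.append(ins)
    return out, len(out) != len(instrs)

PASSES = [remove_self_moves]  # Mica 4.5 runs 17 such passes.

def optimize(instrs):
    changed = True
    while changed:  # one full sweep with no rewrites = fixed point reached
        changed = False
        for run_pass in PASSES:
            instrs, pass_changed = run_pass(instrs)
            changed = changed or pass_changed
    return instrs
```

The important property is the outer loop: a rewrite made by one pass can expose a pattern for another, so the sweep repeats until nothing fires.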

The passes are arranged in three groups.

Group 1: Temp Materialization And Operand Folding

This first group attacks the “store to temp, load from temp, use value” patterns that appear naturally in a uniform emitter.

  • Literal propagation: collapse immediate-through-temp chains
  • Memory copy propagation: redirect later loads back to the original source
  • String descriptor forwarding: bypass temporary reloads for 16-byte string descriptors
  • Arithmetic temp folding: feed arithmetic directly from memory, immediates, or constant-pool entries
  • Boolean temp folding: fold byte-sized boolean temporaries into their consumers

Group 2: Redundancy Cleanup And Call-Adjacent Rewrites

This is the largest group. It removes round-trips that are semantically harmless but mechanically expensive.

  • Redundant register move elimination: remove mov Rx, Rx style no-ops
  • Call argument forwarding: skip temp staging when a value only feeds an ABI argument register
  • Call return forwarding: reuse return registers instead of storing to a temp and reloading
  • Adjacent store-load forwarding: forward a value across a strictly adjacent store/load pair
  • Windowed store-load forwarding: do the same across a short bounded gap
  • Redundant load elimination: reuse an earlier register value instead of reloading memory
  • Load-store forwarding: remove pointless write-backs of values that were just loaded
  • Dead store elimination: remove stores that are overwritten before any read

Group 3: Compare And Stack Cleanup

The last group handles the small structural artifacts that remain after the value-flow rewrites have simplified the stream.

  • Compare temp folding: fold temp loads directly into cmp or test consumers
  • Compare cleanup: remove back-to-back redundant flag-setting instructions
  • Stack adjust folding: combine adjacent add/sub rsp updates
  • Push/pop elimination: remove adjacent push/pop pairs with no net effect

Representative Rewrites

The easiest way to understand the optimizer is to look at the kind of assembly it shortens.

Literal Propagation

Uniform lowering often stages a literal through a temporary location before writing it to the real destination:

; Before
mov [temp], 42
mov rax, [temp]
mov [dest], rax

; After
mov [dest], 42

The point is not that the emitter is “wrong”. The point is that the peephole pass can remove the detour once the detour is visible in concrete assembly.

Store-Load Forwarding

If a value is stored and immediately reloaded from the exact same address, the second instruction does not need memory at all:

; Before
mov [rbp-8], rax
mov rbx, [rbp-8]

; After
mov [rbp-8], rax
mov rbx, rax

Mica has both an adjacent version of this rewrite and a bounded-window version that can look past a few non-interfering instructions.
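The bounded-window variant can be sketched as follows. This is Python over a toy tuple representation; the window size, tuple shapes, and barrier set are illustrative, and the real pass also reasons about register families and flags:

```python
# Sketch of windowed store-load forwarding (hypothetical IR, not Mica's code).
# After ('store', addr, reg), a later ('load', dst, addr) within a small window
# becomes a register-to-register move, provided nothing in the gap redefines
# the stored address or the source register.

WINDOW = 4  # bounded search keeps the reasoning local and predictable

def forward_store_load(instrs):
    out = list(instrs)
    changed = False
    for i, ins in enumerate(out):
        if ins[0] != "store":
            continue
        _, addr, reg = ins
        for j in range(i + 1, min(i + 1 + WINDOW, len(out))):
            op = out[j]
            if op[0] == "load" and op[2] == addr:
                out[j] = ("movr", op[1], reg)  # forward the register value
                changed = True
                break
            # Conservatively stop at anything that could invalidate the match:
            # another store, a control-flow barrier, or a write to `reg`.
            writes_reg = op[0] in ("load", "movr") and op[1] == reg
            if op[0] in ("store", "call", "label", "jump") or writes_reg:
                break
    return out, changed
```

A `call` or label in the gap simply ends the search, which is the conservative choice the article describes.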

Arithmetic Temp Folding

x86-64 can often encode a memory operand directly inside an arithmetic instruction, so a temporary register load becomes unnecessary:

; Before
mov rax, [tempA]
mov rcx, [tempB]
add rax, rcx
mov [tempA], rax

; After
mov rax, [tempA]
add rax, [tempB]
mov [tempA], rax

The same idea also applies to boolean operations and compare/test sequences.

Call Return Forwarding

Call results frequently arrive in the right place already. Storing them to a stack temporary just to read them back is needless traffic:

; Before
call foo
mov [temp], rax
mov rcx, [temp]

; After
call foo
mov rcx, rax

This pass is ABI-aware. It knows which registers carry integer and floating return values and which registers a call may clobber.
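The ABI knowledge this depends on amounts to a small table. The register sets below are standard System V x86-64 facts; the table layout and the `survives_call` helper are illustrative, not Mica's API:

```python
# Sketch of the ABI facts call-adjacent passes consult (System V x86-64).
# The structure is hypothetical; the register sets are standard ABI rules.
SYSV_X64 = {
    "int_return": ["rax", "rdx"],      # integer/pointer return registers
    "float_return": ["xmm0", "xmm1"],  # floating-point return registers
    "caller_saved": {"rax", "rcx", "rdx", "rsi", "rdi",
                     "r8", "r9", "r10", "r11"},
}

def survives_call(reg, abi=SYSV_X64):
    """A cached register value survives a call only if the register is
    callee-saved under the active ABI."""
    return reg not in abi["caller_saved"]
```

Any forwarding across a call has to consult a table like this: a value cached in rax does not survive the call, while one in rbx does.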

Dead Store Elimination

Some stores disappear simply because a later store overwrites the same location before any read can observe the first value:

; Before
mov [rbp-8], r10
mov ecx, [rbp-16]
mov [rbp-8], r11

; After
mov ecx, [rbp-16]
mov [rbp-8], r11
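The scan behind this can be sketched in a few lines. This is Python over a toy tuple representation; it sidesteps aliasing by treating distinct address strings as disjoint, which only stands in for the real pass's exact-matching and disjointness checks:

```python
# Sketch of dead store elimination over a linear window (hypothetical IR).
# A store is dead if the same location is stored again before any read,
# call, or control-flow barrier could observe the first value.

def eliminate_dead_stores(instrs):
    out = []
    for i, ins in enumerate(instrs):
        if ins[0] == "store":
            dead = False
            for later in instrs[i + 1:]:
                if later[0] == "store" and later[1] == ins[1]:
                    dead = True  # overwritten before any read
                    break
                if later[0] == "load" and later[2] == ins[1]:
                    break  # the value is read: the store is live
                if later[0] in ("call", "label", "jump"):
                    break  # memory may be observed elsewhere: keep it
            if dead:
                continue
        out.append(ins)
    return out
```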

Stack Cleanup

Late cleanup is also a good place to simplify frame noise:

; Before
sub rsp, 32
add rsp, 16

; After
sub rsp, 16

Or, when the two instructions cancel exactly, both can disappear.
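Folding two adjacent adjustments reduces to signed arithmetic on the rsp deltas. A minimal sketch (Python; encoding `sub rsp, N` as a negative delta and `add rsp, N` as a positive one is an assumption of this sketch):

```python
# Sketch of stack-adjust folding: merge two adjacent rsp updates into one
# instruction, or drop both when the net adjustment is zero.

def fold_stack_adjust(a, b):
    """a, b: signed rsp deltas (sub rsp, N => -N; add rsp, N => +N).
    Returns the folded instruction list."""
    net = a + b
    if net == 0:
        return []  # the pair cancels exactly: both instructions disappear
    if net < 0:
        return [f"sub rsp, {-net}"]
    return [f"add rsp, {net}"]
```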

Why These Rewrites Are Safe

The optimizer’s main rule is simple: when it cannot prove a rewrite safe, it does not rewrite. Several constraints enforce that policy.

  • Exact memory matching. Two memory operands are treated as the same location only when base register, symbol, offset, and size all match.
  • Register-family awareness. rax, eax, ax, al, and ah are not independent; a write to one view can invalidate assumptions about the others.
  • ABI boundaries. Calls are treated according to the active ABI, including caller-saved and callee-saved register rules.
  • Control-flow barriers. Labels, jumps, returns, and calls terminate many local searches because they break the simple linear model the peephole pass relies on.
  • Bounded windows. Non-adjacent rewrites search only a small number of instructions, which keeps reasoning local and predictable.
  • Flag preservation. Rewrites that would change observable flag behavior are rejected unless the optimizer can show the replacement is equivalent for the later consumer.
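Two of these predicates are simple enough to sketch directly. The Python below is hypothetical in its data layout, but the register-family and exact-match rules it encodes are the ones described above:

```python
# Sketch of two safety predicates (hypothetical representation).

# Views of the same architectural register: a write to any one of them
# invalidates cached knowledge about all the others.
REG_FAMILIES = [
    {"rax", "eax", "ax", "al", "ah"},
    {"rbx", "ebx", "bx", "bl", "bh"},
    # ... one family per architectural register
]

def same_family(a, b):
    return any(a in fam and b in fam for fam in REG_FAMILIES)

# Two memory operands are "the same location" only on an exact match of
# every component; any mismatch means "assume they may differ".
def same_location(m1, m2):
    return (m1["base"], m1["symbol"], m1["offset"], m1["size"]) == \
           (m2["base"], m2["symbol"], m2["offset"], m2["size"])
```

Note the asymmetry: exact matching is used to prove two operands are the same, never to prove they are different, which is what keeps the policy conservative.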

This is why the optimizer is best described as conservative rather than aggressive. It intentionally leaves some opportunities untouched.

What The Optimizer Measures

The pass reports both the total instruction count before and after cleanup and per-pattern counters for the rewrites it applies. That makes the backend measurable instead of anecdotal. It is possible to tell whether a new pass is actually carrying its weight, and it is possible to detect when a change in earlier compilation stages starts producing worse assembly.

The statistics are also useful for development discipline. A peephole pass is easy to add; a justified peephole pass needs evidence.

What This Optimizer Is Not

The current peephole pass is not Mica’s end-state optimizer. It does not attempt:

  • whole-function SSA optimization
  • global value numbering
  • loop transforms
  • interprocedural analysis
  • register allocation strategy changes

Those are later stages on Mica’s roadmap. The peephole optimizer is the local cleanup layer that already ships today, and it provides a solid, testable base for the more ambitious optimizer work planned next.

Where It Goes Next

Mica’s roadmap after 4.5 includes SSA transformation and broader global optimization work. When that arrives, the peephole pass does not disappear. It continues to matter because backend lowering still creates machine-specific local artifacts that higher-level analysis does not see.

That is the lasting role of a peephole optimizer in this compiler: not a substitute for global optimization, but a disciplined last pass that turns uniformly emitted assembly into cleaner final machine code.