Version 4.5 · Released March 2026
Structured clarity. C control. Born for AI.
A note on accuracy
Mica is developed by a single engineer working with AI assistance. This portrait represents the best current description of the language and compiler — but it is a living document. Specific names, API spellings, library namespaces, and feature boundaries will change as implementation reveals what actually works. Inaccuracies and inconsistencies can and do occur.
What will not change is the direction and the vision. The platform philosophy, the commitment to explicit semantics, the structured-language foundation, and the AI trajectory are stable. Everything else is subject to revision as the compiler grows.
Mica 4.5 is the first public release. The compiler is verified by a test harness spanning 576 Mica programs, but as with any first release, unexpected limitations may still be encountered. The compiler, the VS Code extension, and the tutorials are under intensive development throughout 2026 — changes can and will occur. The primary goal for 2026 is a consistent, well-designed Mica core language with a complete standard library. Known gaps — including heap memory allocation, a string library, and further standard library coverage — will be addressed in the course of this work.
◈ What Is Mica?
Mica is a systems programming language designed for readability, explicit control, and a long-range trajectory toward AI-native compiler semantics. It brings clean, structured syntax together with the low-level control that systems programmers need — without hiding the machine behind abstractions you didn’t ask for.
The Mica compiler is written in pure Go with zero external dependencies. It compiles Mica source code all the way down to native Linux x86_64 ELF binaries with DWARF v5 debug information, using only the GNU assembler and linker as final tools. There is no LLVM. There is no GCC backend. Every phase of compilation — from lexical analysis through register-level peephole optimization — is implemented from scratch inside this single repository.
After two and a half years and 1,844 commits of quiet, disciplined development, Mica 4.5 is the first public release: a real compiler that produces real binaries you can run, inspect in GDB, and call from C.
◈ At a Glance
| Metric | Value |
|---|---|
| Implementation language | Go 1.26, pure standard library |
| Compiler Go source files | 322 |
| Compiler gross lines (Go, incl. comments & whitespace) | 75,545 |
| Test harness Go implementation files | 27 |
| Test harness gross lines (Go) | 5,404 |
| Mica test programs (.mica files) | 576 |
| Mica test program gross lines | 151,958 |
| Standard library (C23, mica-stdlib.c) | 778 lines |
| Assembly runtime routines | 4 |
| Assembly runtime instructions | ~31 |
| Total project gross lines | ≈ 234,000 |
| Packages | 25 |
| Test cases | 538 (execution, errors, IL, assembly) |
| Execution tests | 266 |
| Error tests | 200 |
| IL tests | 49 |
| ASM tests | 23 |
| Total commits | 1,844 |
| First commit | December 27, 2023 |
| Current version | 4.5.0 |
| Target platform | Linux x86_64, System V AMD64 ABI |
| Debug format | DWARF v5 |
| Binary format | ELF |
| External toolchain | GNU assembler (as) + GNU linker (ld) |
| License | MCL-1.0 (non-commercial; commercial licensing available) |
Stress test timings (StressLOC series):
| Test | Time |
|---|---|
| StressLOC100k (100,000-line program) | 6.687 s |
| StressLOC10k | 718 ms |
| StressLOC1k | 142 ms |
Assembly runtime — Mica emits a small set of hand-written x86_64 routines
directly into each compiled binary. The current four routines (~31 instructions
total) handle static link traversal for nested procedures, byte-level string
comparison via repe cmpsb, and the non-returning runtime failure path. They
are deliberately minimal: each routine does exactly one well-defined job,
carries precise register documentation, and stays out of the way of the C
standard library. The set will grow as the language grows — heap allocation,
bounds checking, and concurrency primitives are natural additions — but the
design principle remains the same: keep the runtime small, keep it correct,
and let the platform handle everything it already does well.
◈ Why Mica Exists
There is a space between languages that is surprisingly empty.
Some languages are easy to read but stay far from the machine. Others give you raw control but demand constant vigilance against misuse. Few bridge both worlds cleanly — and fewer still have a credible direction for what programming should look like in an AI-shaped future.
Mica occupies that space deliberately:
| Dimension | What Mica delivers | Why it matters |
|---|---|---|
| Readability | Clean block structure, nested functions, records, arrays, sets, ordinal types, structured control flow | Code that communicates intent, teachable from day one, analyzable without context |
| Control | Explicit pointers, explicit address/dereference, direct ABI compatibility, contract-based interop | Systems behavior is predictable; FFI cost is visible, never hidden |
| Direction | Compiler-driven optimization, mathematical notation, tensor-native semantics (planned) | A language shaped toward AI and numerical computing from the start, not retrofitted after the fact |
Mica is not trying to become an object-oriented language with classes and inheritance. Mica is not building a package-manager ecosystem. Abstraction in Mica comes from records, procedures, nested functions, and contract-defined interfaces — proven structures that stay readable at any scale.
◈ The Platform Vision
Mica does not build a private ecosystem. It opens the one that already exists.
The world already has decades of proven C libraries, POSIX interfaces, Linux kernel APIs, and battle-tested system runtimes. They are not going anywhere. Every HTTP client, every database driver, every numerical toolkit, every operating system interface — already written, already tested, already running on billions of machines. Mica’s goal is not to replace any of that. It is to make all of it immediately and safely accessible from structured, readable source code.
Most new languages ask developers to wait while an ecosystem is built from scratch. Mica takes the opposite position: the ecosystem already exists. It is the entire C ABI world — and Mica is designed to reach every part of it directly.
The JSON contract system is the mechanism. Every C library surface, every POSIX API, every Linux syscall that follows a stable ABI can be described in a contract file and reached from Mica source with full type checking, format-string validation, and compile-time safety. A Mica binary and a C library binary are the same kind of object. They link together directly under the System V AMD64 ABI. There is no adapter layer. There is no runtime bridge. There is no reimplementation required.
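To make the idea concrete, a contract entry for a libm function might look roughly like the following. This is a purely hypothetical sketch — the actual contract schema, field names, and file layout are defined by the compiler and may differ entirely:

```json
{
  "library": "m",
  "functions": [
    {
      "mica": "Sin",
      "c": "sin",
      "params": [{ "type": "float64" }],
      "returns": "float64"
    }
  ]
}
```

The point is not the spelling but the shape: a declarative mapping from a Mica name to a C symbol with fully typed parameters, which the compiler can check at every call site.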
This principle shapes every architectural decision in Mica:
- The standard library grows by surfacing what already exists, not by reimplementing it in a closed language-specific world.
- `WriteLn` and `ReadLn` are not new I/O systems — they resolve directly to `wprintf` and `wscanf` (or `printf`/`scanf` on UTF-8 platforms). The compiler selects the right platform function; the developer writes one clean call.
- The planned `posix` and `linux` contract packs will give Mica programs access to the full Linux OS API surface without leaving the language.
- C interop is first-class, not an escape hatch. A contract file is how you reach any C library — whether it ships with Mica, was authored by a third party, or was written ten years before Mica existed.
- Mica programs and C programs share one binary world. They call each other, link together, and cooperate under the same ABI with no performance tax at the boundary.
Two languages sharing a platform are not competitors. A Mica binary and a C binary can be linked into the same executable. A Mica library can be called from C. A C library can be called from Mica. The platform is the ecosystem — and Mica belongs to it without trying to displace it.
The measure of success is not how many libraries Mica ships. It is how quickly a developer can reach any library that already exists — including libraries written years from now — with the same structured, safe, readable source model that defines the rest of the language.
Mica augments the platform. It does not compete with it.
◈ A First Look at Mica
The best way to introduce a language is to show it. Every snippet below is real, compiled code from the 4.5 test suite.
The language reads like structured intent
type
Color = (Red, Green, Blue);
var
c : Color;
begin
c := Green;
case c of
Red: WriteLn("red");
Green: WriteLn("green");
Blue: WriteLn("blue");
end;
end.
No integer casts. No fall-through hazards. No magic numbers. The source says
exactly what the program means — and the compiler enforces the domain. A
Color variable cannot accidentally hold an integer, and an integer cannot
silently become a Color.
Enumerations carry their domain into arrays
type
Month = (Jan, Feb, Mar, Apr, May, Jun,
Jul, Aug, Sep, Oct, Nov, Dec);
Calendar = array[Month] of int32;
var
days : Calendar;
begin
days[Jan] := 31;
days[Feb] := 28;
days[Mar] := 31;
end.
The array is indexed by Month values, not integers. The compiler enforces
domain correctness at every index site. There is no invisible month - 1
translation, no risk of an off-by-one born from a zero-based convention, and
no way to index a calendar with an unrelated integer.
Every pointer operation is spelled out
procedure Scale(p : pointer Point; factor : float64);
begin
p.x := p.x * factor;
p.y := p.y * factor;
end;
pointer Point in the signature is an unambiguous statement: Scale modifies
the caller’s record. Not T*, not an implicit reference — a readable
description of intent, visible at every call site. Adding a field to Point
does not silently change the calling convention. Passing a Point by value
requires writing p : Point, nothing more.
Notice that p.x works directly — no explicit dereference required. When the
base of a field selection chain is a pointer to a record, the compiler
lowers the dereference automatically during semantic analysis. The explicit
form value p.x is also valid and produces identical machine code. More on
this in the semantic analysis chapter.
Imports are platform calls, not a private ecosystem
imp
WriteLn : std;
Sin : math;
WriteLn resolves to wprintf. Sin is libm. The names are clean and
readable, but nothing is invented underneath — the compiled binary calls the
exact same functions every C program on the system uses. There is no separate
I/O runtime, no reimplemented math library, no translation layer. The import
statement names a library and a contract; the platform provides the rest.
◈ What Makes This Compiler Different
Zero external dependencies. The Mica compiler is written in pure Go with
no external libraries. go build ./... is all you need. The standard library
links against as and ld and nothing else. This is an intentional
discipline: the compiler should be understandable, buildable, and modifiable
by any engineer with a Go toolchain — on any machine, without first assembling
a dependency graph.
Every stage is inspectable. Every intermediate representation can be exported and examined: the token stream from the scanner, the AST from the parser, human-readable Spectra IL from the generator, x86_64 assembly in Intel or AT&T syntax, and the final ELF binary with DWARF v5 debug information. This makes Mica not just a compiler but a teaching instrument. You can watch a program transform step by step from source to machine code and read exactly what happened at each stage.
Harness-first development. The 538 test cases are not regression tests — they are the specification of what the compiler can do. No feature lands without harness coverage across execution, error, IL, and assembly tests, with variants for different compiler configurations. This discipline has kept the compiler honest across 1,844 commits and two and a half years of development.
ABI truthfulness. The type system knows exactly how every type is classified under the System V AMD64 ABI — INTEGER, SSE, or MEMORY. The function call emitter uses that classification directly. The DWARF debug information describes what was actually generated. When you step through a Mica program in GDB, the stack frames look exactly like what the ABI specification says they should — because they are built that way from the ground up, not adjusted afterward.
◈ What Mica Looks Like
All examples below are real, compiled programs from the 4.5 test suite.
Hello, Mica
program ConstExpression;
imp
WriteLn : std;
const
Sum = 5 + 7 * 2;
Difference = (20 - 3) * 2;
Product = (6 as int64) * 7;
Ratio = 9.0 / 2.0;
Check = (Sum = 19) and (Difference > 30);
var
i32 : int32;
i64 : int64;
f64 : float64;
flag : bool;
begin
i32 := Sum;
WriteLn(" Sum = %d", i32);
i64 := Product;
WriteLn(" Product = %lld", i64);
f64 := Ratio;
WriteLn(" Ratio = %.2lf", f64);
flag := Check;
WriteLn(" Check = %hhu", flag);
end.
This is a complete, compilable Mica program. The structure is clean and
readable — program, const, var, begin…end. — alongside C-level
type specificity (int32, int64, float64) and familiar format-string I/O.
The standard library function name WriteLn follows Mica’s PascalCase
naming convention for callable library symbols.
Constant expressions like 5 + 7 * 2 are evaluated entirely at compile time.
The variable Sum never generates a runtime computation.
Recursive Functions
function Factorial(n : int32) : int64;
begin
if n <= 1 then
Factorial := 1
else
Factorial := (n as int64) * Factorial(n - 1);
end;
Functions return values by assigning to their own name. The as keyword
performs explicit type conversion. This is real
recursive code generating real call frames, traced correctly by DWARF and GDB.
Nested Functions with Lexical Scoping
function Outer(base : int32) : int32;
var
offset : int32;
function Inner(input : int32) : int32;
begin
Inner := input + offset;
end;
begin
offset := 10;
Outer := Inner(base);
end;
Inner reads offset from its enclosing scope. The compiler implements this
through a static link chain in activation records — a real frame-pointer
chain that GDB can follow and that the test suite verifies at every nesting
level.
The for Loop
program ForIntegralLoops;
imp
WriteLn : std;
var
i : int32;
sum : int32;
begin
sum := 0;
for i := 0 to 3 do
sum := sum + i;
WriteLn("ascending sum = %d", sum);
for i := 3 downto 0 do
WriteLn("i = %d", i);
end.
The for loop supports to (ascending) and downto (descending) directions.
Loop bounds are evaluated exactly once before the first iteration — verified
by a dedicated test ForBoundsEvaluatedOnce that counts function call
invocations. The control variable is read-only inside the loop body: direct
assignment, address-of, and outer-scope access are all compile-time errors.
The loop also handles terminal-bound safety correctly. When the control variable
reaches the final bound and would need to overflow to continue, Mica exits
before the step — making for i := MaxInt32 - 1 to MaxInt32 safe with exactly
two iterations.
{ For loops work with enum control variables too }
type
Color = (Red, Green, Blue);
var
c : Color;
begin
for c := Red to Blue do
begin
if c = Red then WriteLn("color = Red");
if c = Green then WriteLn("color = Green");
if c = Blue then WriteLn("color = Blue");
end;
end.
Subrange and enum types are fully supported as control variable types. The loop domain is checked at compile time: boolean and set types are rejected with clear error messages.
The repeat ... until Loop
program RepeatSimple;
imp
WriteLn : std;
var
i : int32;
begin
i := 1;
repeat
WriteLn("i = %d", i);
i := i + 1;
until i > 3;
WriteLn("done, i = %d", i);
end.
repeat ... until is the post-test loop: the body always executes at least
once, and execution continues until the condition becomes True. This is the
natural companion to while — and its semantics are verified by a dedicated
RepeatExecutesOnce test that confirms guaranteed first-iteration execution
even when the condition is already true at entry.
{ Leaving early from a repeat loop }
repeat
if i = 2 then
leave;
total := total + i;
i := i + 1;
until i > 3;
leave exits the enclosing procedure, not just the loop. The test suite
covers leave inside while, for, and repeat bodies.
The case Statement
program CaseEnumMatch;
imp
WriteLn : std;
type
Color = (Red, Green, Blue);
var
color : Color;
begin
color := Green;
case color of
Red: WriteLn("red");
Green: WriteLn("green");
Blue: WriteLn("blue");
end;
end.
The case statement dispatches on any ordinal selector — integers, subranges,
enumerations. Each arm is a constant label followed by a statement or
begin...end block. An optional else branch handles unmatched values.
{ case with integer selector and else branch }
case selector of
1: WriteLn("one");
2: WriteLn("two")
else
WriteLn("other")
end;
The 4.5 implementation covers single constant labels with optional else. The
compiler enforces that labels must be ordinal, must be compile-time constants,
must be compatible with the selector type, and must not repeat. The
no-semicolon-before-else rule is enforced consistently across both if and
case — tested by dedicated error cases in the harness.
Process Arguments
program ProcessArguments;
imp
* : std;
* : process;
begin
if ProgramName() # Empty then
WriteLn("program=present")
else
WriteLn("program=empty");
WriteLn("count=%d", ArgCount());
if Arg(1) = "alpha" then
WriteLn("arg1=match");
if Arg(2) = "Grüße" then
WriteLn("arg2=match");
end.
The process library exposes program arguments through three clean functions:
ProgramName(), ArgCount(), and Arg(n). The program declaration stays
parameterless — no C-style argc/argv in the entry point — and all argument
access goes through ordinary library calls. This is tested in both UTF-32 and
UTF-8 execution modes, including Unicode argument values.
Records, Pointers, and Explicit Memory
type
Point = record
x : int32;
y : int32;
end;
var
point : Point;
ptr : pointer Point;
begin
point.x := 4;
point.y := 9;
ptr := address point;
{ Explicit form — always valid }
value ptr.x := value ptr.x + point.y;
{ Auto-deref form — also valid for pointer-to-record field selection }
ptr.x := ptr.x + point.y;
WriteLn("x = %d", point.x);
WriteLn("y = %d", point.y);
end.
This is where Mica’s memory model comes into focus:
- `pointer Point` — a pointer type, declared in plain words
- `address point` — takes the address of a variable (no `&` operator)
- `value ptr` — dereferences a pointer to read or write the full pointed-to value (no `*` operator)
- `value ptr.x` — explicit dereference-then-select, reading naturally left to right
Pragmatic auto-deref for pointer-to-record field selection. Mica is
explicit about memory by principle — but not dogmatic about it. For
pointer-to-record field selection chains specifically, writing value is
optional. When the semantic analyzer encounters ptr.x and ptr is a
pointer Point, it recognizes the pattern and the lowering pass
automatically inserts a dereference node before code generation. The
rewrite p (*struct) → dereference(p) produces identical machine code to
the explicit value ptr.x form — there is no runtime difference, no extra
indirection, no overhead of any kind.
This is a deliberate pragmatic choice. Requiring value on every field
access through a pointer adds ceremony without adding clarity — the field
selector already names the target unambiguously. The value keyword carries
its weight where it matters: full pointer dereference to read or write the
complete pointed-to value, passing a dereferenced value as an argument, or
any case where the dereference is not immediately followed by a named field.
The rule is precise and local: auto-deref applies only to pointer-to-record identifier bases in selection chains. Everything else remains explicit. Arrays do not decay to pointers. Aggregate types carry no implicit reference semantics. The auto-deref is typed, deterministic, and lowered during semantic analysis — not a runtime behavior.
Sets, Subranges, and Membership
type
N = 0..3;
NSet = set of N;
function Build(lo : N, hi : N) : NSet;
begin
Build := [lo..hi];
end;
function Contains(v : N, s : NSet) : bool;
begin
Contains := v in s;
end;
Sets are first-class values. They can be passed to and returned from functions,
constructed with range expressions at runtime, and queried with in. The
set constructor [lo..hi] generates real machine code for dynamic bounds.
Short-Circuit Evaluation
{ The right side is NEVER evaluated when left side is False for and,
or True for or. The compiler generates conditional jumps, not eager calls. }
result := False and IncrementTrue();
{ counter stays 0 — right side was skipped }
result := True or IncrementFalse();
{ counter stays 0 — right side was skipped }
Checked Arithmetic
program CheckedArithmeticPolicyDebugRelease;
imp
WriteLn : std;
var
a, b, c : int64;
begin
a := MaxInt64;
b := 1;
c := a + b;
WriteLn("signed_add_result=%lld", c);
end.
This single program produces three different behaviors depending on the compiler’s optimization policy:
- `--optimize debug` or `release`: hardware wrap semantics, overflow is silent
- `--optimize checked`: compiler-inserted overflow detection terminates the program with a diagnostic including source file and line number
The harness verifies all three behaviors automatically via test variants — same source, different flags, independently validated expected output.
All Numeric Types
Mica provides the full set of fixed-width types that systems programmers expect:
| Category | Types |
|---|---|
| Signed integers | int8, int16, int32, int64 |
| Unsigned integers | uint8, uint16, uint32, uint64 |
| Floating-point | float32, float64 |
| Boolean | bool |
| Character | unicode (32-bit code point) |
| String | string (UTF-32, descriptor-based) |
Every type has a precise size, a precise ABI classification, and predictable behavior at every optimization level.
◈ The Ordinal Universe
One of the most powerful ideas in Mica’s type system is the concept of ordinal types and the rich type algebra that flows from them.
In most languages, “number” and “sequence” are the same thing. Arrays start at zero. Loops count upward. Enumerations are secretly integers. Sets are bit masks. The programmer’s domain model — months, colors, grades, directions — is expressed in raw integers and the meaning is carried only in the programmer’s head.
Mica offers a different way of thinking.
What Is an Ordinal?
An ordinal type is any type whose values form a finite, ordered, discrete sequence with a well-defined first element, last element, and successor function. Every value in an ordinal type can be counted, compared, and used as an index.
Mica’s ordinal types are:
| Type | Domain | Example Values |
|---|---|---|
| `bool` | {False, True} | 2-element binary ordinal |
| `int8`…`int64` | Integer intervals | -128..127, 0..2147483647 |
| `uint8`…`uint64` | Unsigned intervals | 0..255, 0..18446744073709551615 |
| `unicode` | Unicode code points | 'a', 'π', '漢' |
| Enumeration | Named constants | Red, Green, Blue |
| Subrange | Restricted interval | 1..31, 0..9 |
Ordinals — Domain Knowledge Carried by the Type System
This is not just a programming convenience. Mica’s type system carries ordinal domain knowledge natively — ordinals know their bounds, and the compiler uses that knowledge structurally throughout the language.
Every ordinal type — integers, booleans, unicode, enumerations, and subranges
— has computable domain bounds: the compiler knows its exact lower and
upper values. Dedicated type system functions resolve the [lower, upper]
interval of any ordinal type, determine whether a constant value lies within
that domain, find the common ordinal type for binary operations, and establish
the underlying arithmetic base type for ordinal calculations. Whether one
ordinal value can be assigned to another ordinal type is governed by explicit
domain-aware rules, not silent integer coercions.
That domain knowledge drives a surprising amount of the language:
- Array sizes are computed from ordinal index domains, not raw constants
- Set cardinalities are computed from the ordinal domain of the element type
- `for` loop bounds are validated against the ordinal domain of the control variable
- `case` label validation checks that each label is within the ordinal domain of the selector
- Subrange assignment is checked against the declared interval
The type system doesn’t just permit ordinals — it knows their domains and uses that knowledge everywhere those types appear. The rest of this section shows what that enables.
Enumerations — Naming the Domain
type
Color = (Red, Green, Blue, Yellow);
Direction = (North, East, South, West);
Month = (Jan, Feb, Mar, Apr, May, Jun,
Jul, Aug, Sep, Oct, Nov, Dec);
An enumeration is not a shorthand for integer constants. It is a new
type with its own named domain. Red is not 0. It is Red. The
compiler knows the type is Color, knows the domain has four elements, and
can reject nonsense like assigning a Direction to a Color variable.
The ordinal ordering of enum values follows their declaration order.
Red < Green < Blue < Yellow holds. You can use enum values as array
indices, set elements, and for-loop bounds — all of which become
semantically meaningful rather than numerically arbitrary.
Subranges — Restricting the Domain
type
Day = 1..31;
Month = 1..12;
Digit = 0..9;
Grade = 1..5;
Slot = 1..3;
A subrange declares that a variable holds values only within a specified
interval. Day is not int32. It is a type whose domain is exactly
{1, 2, ..., 31}. The compiler knows the bounds. Arrays indexed by
subranges have a known and exact size. Sets over subranges have a known
and exact cardinality.
This changes what you can express in a type signature. A function that
accepts a Day is telling the truth about what values it accepts. A
function that accepts int32 and happens to only use values 1..31 is
carrying its contract only in comments.
Arrays with Ordinal Index Types — and Lower Bounds That Mean Something
This is where the ordinal model becomes concretely powerful.
In most languages, arrays start at zero. Always. That is a hardware convention that leaked into language design: memory addresses start at zero, so array indices start at zero. The problem is that most real-world domains do not start at zero. Months start at 1. Floors in a building are numbered. A temperature sensor covers −20 to 60. A population table runs from 1900 to 2100. A convolution kernel is centered at zero and runs from −2 to 2.
Every time the domain doesn’t start at zero, the programmer carries an
invisible translation burden: daysInMonth[month - 1], population[year - 1900],
kernel[offset + 2]. That subtraction is not part of the problem. It is an
artifact of the storage model leaking into the logic. Forget it once and
you read the wrong data — silently.
Mica arrays have a lower bound that belongs to the type, not to the programmer’s memory. The index type is an ordinal type, and the array spans exactly the domain of that type:
type
Month = (Jan, Feb, Mar, Apr, May, Jun,
Jul, Aug, Sep, Oct, Nov, Dec);
DaysInMonth = array[Jan..Dec] of int32;
{ Years 1900 to 2100 — the lower bound is part of the type }
PopulationTable = array[1900..2100] of int64;
{ A convolution kernel centered at zero — negative lower bound }
Kernel = array[-2..2] of float64;
{ Workdays 1..5, not 0..4 }
WeekdayHours = array[1..5] of int32;
DaysInMonth[Jan] is the number of days in January. Not daysInMonth[0].
Not daysInMonth[MONTH_JAN - 1]. The index is the month, because the
array’s domain is the month domain.
population[1970] is the population in 1970. Not population[1970 - 1900].
kernel[-1] is the kernel value at offset −1. Not kernel[-1 + 2].
The compiler knows both bounds. It computes the array’s size from
upper - lower + 1 — no programmer arithmetic required, no opportunity to
get it wrong. Constant index accesses outside the declared range are a
compile-time error.
var
pop : PopulationTable;
k : Kernel;
begin
pop[1900] := 1_600_000_000;
pop[2000] := 6_100_000_000;
k[-2] := 0.0625;
k[-1] := 0.25;
k[0] := 0.375;
k[1] := 0.25;
k[2] := 0.0625;
end.
This code reads exactly like the problem it models. A reader who knows the domain immediately understands the code. A reader who doesn’t know the domain can read the type declaration and immediately understand the valid range. There is nothing to decode, no constant to subtract, no comment needed to explain why the index starts where it does.
With enum-typed indices the benefit compounds further:
type
Color = (Red, Green, Blue, Yellow);
ColorWeights = array[Red..Yellow] of float64;
var
weights : ColorWeights;
begin
weights[Red] := 0.2126;
weights[Green] := 0.7152;
weights[Blue] := 0.0722;
weights[Yellow] := 0.0;
end.
There is no index 4 that silently writes past the end. There is no integer 0
standing in for Red. The array has exactly as many elements as the domain
has values — computed by the compiler from the type, not declared by the
programmer as a separate constant that can drift out of sync.
Multi-dimensional arrays compose naturally, mixing index types freely:
type
Grid = array[1..3, Red..Blue] of int32;
var
g : Grid;
begin
g[1, Red] := 10;
g[2, Green] := 20;
g[3, Blue] := 30;
end.
Sets — Membership, Not Bit Manipulation
type
Color = (Red, Green, Blue, Yellow);
ColorSet = set of Color;
var
palette : ColorSet;
warm : ColorSet;
combined : ColorSet;
begin
palette := [Red, Green, Blue];
warm := [Red, Yellow];
if Red in palette then
WriteLn("Red is in the palette");
if not (Yellow in palette) then
WriteLn("Yellow is not in the palette");
end.
Sets in Mica are mathematical sets over an ordinal domain. They are
not bit flags. They are not integers with & and |. They are sets.
Membership is tested with in. Construction uses set literal syntax
[...] with optional range expressions [lo..hi].
Compare this to typical C:
/* C: the programmer manually maintains the bit-shift contract */
#define RED (1 << 0)
#define GREEN (1 << 1)
#define BLUE (1 << 2)
#define YELLOW (1 << 3)
int palette = RED | GREEN | BLUE;
if (palette & RED) { ... } /* is this testing bit 0? who knows */
In Mica, the set is typed. The domain is known at compile time. The element
type is Color, not int. Red in palette reads like the question it is
asking.
Sets also combine with ranges in constructors:
type
Digit = 0..9;
DigitSet = set of Digit;
var
evens : DigitSet;
odds : DigitSet;
begin
evens := [0, 2, 4, 6, 8];
odds := [1, 3, 5, 7, 9];
odds := [1..9]; { range constructor }
end.
For Loops Over Ordinal Domains
The for statement’s natural home is the ordinal type:
{ Enumerate all colors — no off-by-one possible }
for c := Red to Yellow do
begin
schedule[c] := ComputeValue(c);
end;
{ Iterate over a subrange domain }
for day := 1 to 31 do
WriteLn("Day %d", day);
{ Count down through an enum }
for d := Blue downto Red do
WriteLn("color index = %d", d as int32);
The loop domain is the domain of the ordinal type. For an enum, the loop
visits every named value in declaration order. For a subrange, it visits
every integer in the interval. The control variable is read-only inside
the loop body — the compiler enforces this, because a for loop means
iteration over a domain, not arbitrary mutation of a counter.
The Semantic Expressiveness This Enables
When ordinal types, arrays with ordinal indices, and sets work together, the code says what it means — not just how it computes:
type
WorkDay = (Mon, Tue, Wed, Thu, Fri);
WorkDaySet = set of WorkDay;
HourTable = array[Mon..Fri] of int32;
var
meetings : WorkDaySet;
workHours : HourTable;
total : int32;
day : WorkDay;
begin
meetings := [Mon, Wed, Fri];
workHours[Mon] := 8;
workHours[Tue] := 9;
workHours[Wed] := 6;
workHours[Thu] := 9;
workHours[Fri] := 7;
total := 0;
for day := Mon to Fri do
begin
if day in meetings then
workHours[day] := workHours[day] - 1;
total := total + workHours[day];
end;
end.
No magic integers. No enum-to-int casts for array indexing. No bit-mask arithmetic for set membership. The code reads like the domain model it describes. A compiler — or a reader — can verify it locally without knowing any context outside the type declarations.
Nested Ordinal Types — Records Containing Sets Containing Ordinals
The real power emerges when these concepts compose:
type
  Color = (Red, Green, Blue, Yellow);
  Digit = 0..9;
  Slot = 1..3;
  DigitSet = set of Digit;
  ColorSet = set of Color;
  ScoreArray = array[1..3] of int32;
  ColorArray = array[Red..Yellow] of int32;

  Item = record
    id : Digit;
    flags : DigitSet;
    scores : ScoreArray;
  end;

  Inventory = record
    rows : array[1..3] of Item;
    palette : ColorArray;
    enabled : ColorSet;
  end;
This data structure is fully verified by the compiler:
- Item.id must be in 0..9 — the type enforces it
- Item.flags is a set over 0..9 — membership uses in, not bit shifts
- Item.scores has exactly 3 elements, indexed 1 to 3 — not 0 to 2
- Inventory.palette is indexed by Color values, not by integers 0–3
- Inventory.enabled is a set of Color — Red in inv.enabled just works
And all of it is passed by value, returned by value, and nested at arbitrary depth — compiled to real machine code, described by DWARF annotations, and verified by the test suite.
Looking Forward — Ordinals in the AI Trajectory
The ordinal type system is infrastructure for Mica’s long-range direction.
When Mica eventually adds fixed-shape vector and matrix types, the
shape dimensions will naturally be expressed as ordinal types or subranges.
An array indexed by 0..3 and an array indexed by AxisX..AxisW are the
same machine layout — but the latter carries meaning that enables
compile-time shape checking:
{ Future Mica syntax — not implemented yet }
type
  Axis = (X, Y, Z, W);
  Vec4 = vector[Axis] of float32;
  Mat4x4 = matrix[Axis, Axis] of float32;
The connection from today’s array[Red..Blue] of int32 to tomorrow’s
matrix[Axis, Axis] of float32 is direct. The type system already knows
how to represent, index, and validate ordinal-indexed aggregates. The AI
trajectory builds on the same foundation, not a different one.
Sets over ordinal types become natural precursors to predicate types
and domain restrictions in a type system that reasons about data shapes.
Subranges become natural representations for index bounds in tensor
operations. The for loop over an ordinal domain becomes the natural
surface for shape-driven iteration over tensor dimensions.
Mica is not retrofitting these ideas into a language that was built around raw integers. It is growing them out of a foundation that was designed for them.
◈ A Language for the Platform — Not Above It
The goal is not to replace the platform. The goal is to meet it.
Operating systems, native libraries, and hardware interfaces represent decades of accumulated engineering. Linux, glibc, POSIX, BLAS, OpenSSL, SDL, Vulkan — these exist, they are stable, they are fast, and they are documented. The problem with many modern language ecosystems is that they build a second world on top of this first one: a second memory manager, a second I/O runtime, a second type system for FFI, a second dependency graph, a second security model. The developer ends up living in the second world and treating the first as a distant, hostile country they occasionally have to visit.
Mica makes the opposite bet. Every C library on the system is a Mica library, described by a JSON contract and callable directly — no wrappers, no glue layers, no marshaling overhead. The system allocator is the allocator. The ELF loader is the loader. The OS scheduler is the scheduler. GDB is the debugger, because Mica binaries carry accurate DWARF v5 debug information.
WriteLn and ReadLn are not new I/O systems
This is the vision made concrete. When a Mica developer writes:
WriteLn("Balance = %d EUR", balance);
ReadLn("%d", n);
there is no Mica I/O runtime underneath. The compiler resolves WriteLn to
wprintf and ReadLn to wscanf — the same wide-character standard C library
functions that every C program on the system has always used. On a UTF-8
platform the contract resolver selects printf and scanf instead. The
developer writes one call. The compiler selects the right platform function
based on the target encoding. No new I/O ecosystem. No new runtime. No new
abstraction to learn, document, or maintain.
This is not a special case. It is the design rule. When you import * : math;,
you are importing the same libm that every C program on your system uses.
When you link a Mica library as a .so, it is a regular shared object that any
C program can dlopen. The binary is a first-class citizen of the platform
from the first line of compiled code.
Consequences that matter
- The entire ecosystem is already available. Every C library reachable from the system is reachable from Mica, through a contract file, with compile-time type safety and format-string validation — today, not after an ecosystem matures.
- Native tooling works without configuration. Profilers, debuggers, address sanitizers, linker scripts, and ELF inspection tools all work because the binary is a normal ELF binary with normal DWARF information.
- Mica binaries and C binaries are equal citizens. They link together, call each other, and share the same ABI with no performance tax at the boundary.
- There is no bootstrapping problem. There is no “Mica ecosystem” that must reach critical mass before useful programs can be written.
The contract system
A JSON contract describes a C library’s type surface, calling conventions, and ABI layout precisely enough that the Mica compiler can validate calls at compile time — including format strings, argument types, and cross-unit type fingerprints. The developer gets compile-time safety without runtime overhead and without writing a single line of binding code.
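To make the idea concrete, here is a hypothetical sketch of what such a contract might contain — the actual schema, field names, and spellings are not documented in this chapter, so every identifier below is an assumption:

{
  "library": "m",
  "functions": [
    {
      "name": "cos",
      "parameters": [ { "type": "float64" } ],
      "returns": "float64"
    }
  ]
}

The point is not the exact shape but the division of labor: the contract carries the type surface once, and the compiler checks every call site against it at compile time.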
As the Mica standard library grows, every piece of it is designed to delegate
to the native platform for what the platform already does well. Mica’s UTF-32
string model sits on top of wchar_t. The math library is libm with a clean
contract. The planned posix and linux contract packs will be curated access
layers over real POSIX syscalls and Linux kernel APIs — not reimplementations.
Mica augments the platform. It does not compete with it.
◈ The Compilation Pipeline
Mica implements a textbook-clean compilation pipeline with explicit phase boundaries. Every intermediate representation can be inspected, exported, and reasoned about independently.
┌───────────────┐
│ Source Code │ UTF-8 encoded .mica files
└───────┬───────┘
▼
┌───────────────┐
│ Scanner │ Lexical analysis → Token stream
└───────┬───────┘
▼
┌───────────────┐
│ Parser │ Recursive descent → Abstract Syntax Tree
└───────┬───────┘
▼
┌───────────────┐
│ Analyzer │ 8 semantic passes → Enriched, validated AST
└───────┬───────┘
▼
┌───────────────┐
│ Generator │ AST traversal → Spectra IL (three-address code)
└───────┬───────┘
▼
┌───────────────┐
│ Emitter │ IL → x86_64 assembly instructions
└───────┬───────┘
▼
┌───────────────┐
│ Optimizer │ 17 peephole passes → Cleaned assembly
└───────┬───────┘
▼
┌───────────────┐
│ ELF/DWARF │ Binary encoding + DWARF v5 debug information
└───────┬───────┘
▼
┌───────────────┐
│ GNU Binutils │ as + ld → Executable, static library, or shared object
└───────────────┘
◈ Phase 1 — The Scanner
Package: scanner/ · ~10 files
The scanner converts UTF-8 source bytes into a stream of tokens. It sets the tone for Mica’s philosophy: careful, position-tracked, and fully Unicode-aware.
Token Categories
- Literals: Integer, FloatingPoint, String, Unicode character
- Operators: +, -, *, /, =, #, <, <=, >, >=
- Symbols: ( ) [ ] , : ; . .. :=
- Keywords: Over 45 reserved words, including (as of 4.5): program, library, function, procedure, record, array, set, pointer, address, value, if, then, else, while, do, for, to, downto, repeat, until, case, of, begin, end, leave, and, or, not, mod, odd, as, in, imp, const, var, type, packed
Numeric Literals
The scanner supports decimal, hexadecimal (0x), and binary (0b) integer
literals with optional digit separators. Floating-point literals support
scientific notation (1.0e20, 3.4028235e+38).
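A few illustrative spellings (the digit-separator character is not specified in this chapter, so the underscore below is an assumption):

const
  Port = 8080;            { decimal }
  Mask = 0xFF00;          { hexadecimal }
  Flags = 0b1010_0001;    { binary, with an assumed underscore separator }
  Huge = 3.4028235e+38;   { scientific notation }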
Comments
{ Block comments can span
multiple lines }
// Line comments run to end of line
x := 5; // Inline comments too
Position Tracking
Every token carries its source line and column. This propagates through the entire pipeline — from scanner through DWARF debug information — so error messages, runtime failures, and debugger breakpoints always point to exact source locations.
◈ Phase 2 — The Parser
Package: parser/ · ~32 files
The parser is a classic recursive descent implementation that transforms the token stream into an Abstract Syntax Tree. It handles the full Mica grammar: declarations, type expressions, all statement forms, and expressions with correct operator precedence.
Program Structure
Every Mica source file begins with either program (for executables) or
library (for reusable modules), followed by declaration sections in order:
program/library → imp → const → type → var → procedures/functions → begin...end.
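A minimal skeleton in that order might look like this (a sketch: the imp section is omitted because its exact syntax is not shown in this chapter, and the identifiers are illustrative):

program Example;
const
  Limit = 31;
type
  Day = 1..31;
var
  d : Day;
begin
  for d := 1 to Limit do
    WriteLn("Day %d", d);
end.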
Operator Precedence
| Level | Operators | Category |
|---|---|---|
| 1 (highest) | not, -, +, odd | Unary |
| 2 | *, /, mod | Multiplicative |
| 3 | +, - | Additive |
| 4 | =, #, <, <=, >, >=, in | Comparison / Membership |
| 5 | and | Logical AND (short-circuit) |
| 6 (lowest) | or | Logical OR (short-circuit) |
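One practical consequence of this ordering: comparisons (level 4) bind tighter than and (level 5) and or (level 6), so compound conditions need no extra parentheses (fragment; names are illustrative):

{ Parsed as (x < 10) and (y >= 0), per precedence levels 4 and 5 }
if x < 10 and y >= 0 then
  WriteLn("in range");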
Statements (Complete 4.5 Surface)
| Statement | Form | Notes |
|---|---|---|
| Assignment | x := expr | Including value ptr.field := expr |
| Call | Proc(args) | Procedure or discarded function call |
| Compound | begin ... end | Sequence of statements |
| If | if cond then stmt [else stmt] | No semicolon before else |
| While | while cond do stmt | Pre-test loop |
| For | for v := init to/downto final do stmt | Control variable read-only |
| Repeat | repeat stmts until cond | Post-test, always executes once |
| Case | case sel of labels... [else stmt] end | Ordinal selector only |
| Leave | leave | Exits current procedure |
Type Expressions
| Form | Syntax | Example |
|---|---|---|
| Record | record fields end | Point = record x, y : int32; end |
| Packed record | packed record fields end | No internal padding |
| Array | array[bounds] of T | array[1..10] of float64 |
| Multi-dim array | array[b1, b2] of T | array[1..2, Red..Blue] of int32 |
| Set | set of OrdinalType | set of Digit |
| Subrange | Low..High | 1..31 |
| Enum | (Name1, Name2, ...) | (Red, Green, Blue) |
| Pointer | pointer BaseType | pointer Point |
| File | file of T | file of int32 |
Nesting Depth
Functions and procedures can be nested up to 16 levels deep. The parser tracks nesting depth and propagates it through the AST for static-link generation.
Error Recovery
The parser implements token-level synchronization to continue after syntax errors. A single compilation can report multiple diagnostics rather than stopping at the first problem.
◈ Phase 3 — The Abstract Syntax Tree
Package: ast/ · 67 files (the largest single package)
The AST is the central data structure of the compiler. It represents every syntactic construct as a typed node in a tree, and it serves as the shared currency between parsing, analysis, code generation, and export.
Node Taxonomy
The AST defines 26+ node kinds organized into five categories:
Declarations — the things a program defines: Import, Constant, Parameter, Variable, Signature, Function, DataType, RecordField
Expressions — the things a program computes: Arithmetic, UnaryArithmetic, Comparison, Logical, UnaryLogical, Memory (address-of, dereference), Conversion (casts), Selection (field / index)
Type Expressions — the shapes of data: Record, Array, Set, Subrange, Enum, File, SetConstructor
Statements — the things a program does: Assignment, Call, Leave, If, While, For, Repeat, Case, Compound
Uses — references to declared things: IdentifierUse, LiteralUse
The For, Repeat, and Case statement node kinds each carry their
semantics as typed fields: direction for For, condition for Repeat,
selector and arm list for Case.
The Visitor Pattern
The AST is traversed using the visitor pattern with double dispatch. This
separates the structure of the tree (defined once in ast/) from the
operations on it (defined in analyzer/, generator/, and other packages).
Traversal orders supported: PreOrder, InOrder, PostOrder, LevelOrder.
The Global Registry
For multi-file compilation, the AST package maintains a global registry that coordinates declarations across compilation units. Imports are validated through type fingerprinting — a deterministic digest of the type’s structure that detects ABI mismatches before linking.
Symbol Annotations
Each declaration node carries metadata that enriches through the pipeline:
- FlatName — internal name for intermediate code (e.g., f2.1)
- GlobalName — external symbol name for library exports
- Value — compile-time constant value (when applicable)
- PassingMode — how the parameter travels at the ABI level
- NestingDepth — scope level for static-link generation
◈ Phase 4 — The Type System
Package: typesystem/ · 37 files
The type system is the backbone of Mica’s static guarantees. It describes every data type, tracks its properties, and determines what operations are legal.
Type Kinds
| Kind | Examples | Notes |
|---|---|---|
| Signed integers | int8, int16, int32, int64 | 1/2/4/8 bytes |
| Unsigned integers | uint8, uint16, uint32, uint64 | 1/2/4/8 bytes |
| Floating-point | float32, float64 | IEEE 754 |
| Boolean | bool | 1 byte |
| Unicode | unicode | 4 bytes, 32-bit code point |
| String | string | Descriptor: pointer + length, UTF-32 |
| Record | user-defined | Ordinary or packed layout |
| Array | user-defined | Fixed-size, multi-dimensional, custom index bounds |
| Set | user-defined | Over any ordinal domain |
| Subrange | user-defined | Restricted ordinal interval |
| Enumeration | user-defined | Named ordinal constants |
| Pointer | pointer T | Explicit pointer to any type |
| File | file of T | Typed file I/O |
| Function / Procedure | callable types | Parameter types, return type |
Type Capabilities
Rather than a hard-coded switch statement, Mica uses a capability bit-mask system. Each type advertises what it can participate in:
| Capability | Meaning |
|---|---|
| Numeric | Supports +, -, *, / |
| Ordered | Supports <, <=, >, >= |
| Equality | Supports =, # |
| Logical | Supports and, or, not |
| Integral | Integer-specific operations |
| Fractional | Float-specific operations |
| Dereferenceable | Can use value to dereference |
| Addressable | Can use address to take address |
| Convertible | Can use as to cast |
| Negatable | Supports unary - |
| Callable | Can be called as a function |
| Selectable | Supports .field access |
| Indexable | Supports [index] access |
| Ordinal | Has a discrete ordered domain (required for for, case, set) |
This design keeps the analyzer clean: “is this type valid as a for-loop
control variable?” is a single capability query, not a sprawling type switch.
ABI Awareness
The type system is ABI-aware from the ground up. Every type knows:
- Its size in bytes on the target platform
- Its alignment requirement
- How it is passed to functions (integer register, SSE register, hidden pointer)
- How it is returned from functions
- Its System V AMD64 classification: INTEGER, SSE, or MEMORY
This means function signatures tell the truth. Adding a field to a record does not silently change how it is passed. An array does not decay to a pointer. The performance model is visible in the source.
◈ Phase 5 — Semantic Analysis
Package: analyzer/ · 22 files · Package: evaluation/ · 20 files
The analyzer validates that a syntactically correct program is also semantically correct — names resolve, types match, constants fold, and every operation and statement form obeys its rules.
Eight Coordinated Passes
Pass 1 ─── StaticShallow ─────────── Register top-level declarations
Pass 2 ─── StaticImportAttach ────── Resolve imports across files
Pass 3 ─── StaticResolveDeferred ─── Resolve forward references
Pass 4 ─── StaticDeep ───────────── Full semantic validation
Pass 5 ─── Lowering ─────────────── Normalize expression forms
Pass 6 ─── TypeCoercion ─────────── Insert implicit conversions
Pass 7 ─── ConstantFolding ──────── Evaluate compile-time constants
Pass 8 ─── StaticWarn ───────────── Report unused declarations
Pass 1 — StaticShallow registers all top-level names without analyzing bodies, allowing later passes to resolve references regardless of declaration order.
Pass 2 — StaticImportAttach connects imported declarations from other compilation units, matching them to their exported symbols.
Pass 3 — StaticResolveDeferred handles forward-reference dependencies, such as mutually referential types.
Pass 4 — StaticDeep is the main validation pass. It resolves every
identifier use to its declaration, checks every operation against its operand
types, validates call argument counts and types, checks array index types,
validates set membership, detects uninitialized pointers, enforces for-loop
control-variable rules (read-only in body, correct ordinal domain, no address-of,
no outer-scope reuse), validates case labels (ordinal, constant, compatible
with selector, no duplicates), and checks that until conditions are boolean. This
pass produces over 100 distinct error codes with precise diagnostic messages.
Pass 5 — Lowering normalizes expressions into canonical forms the code generator expects.
Pass 6 — TypeCoercion inserts explicit conversion nodes where implicit widening is allowed, making coercions visible and explicit in the tree.
Pass 7 — ConstantFolding evaluates constant expressions at compile time.
5 + 7 * 2 becomes the integer 19 before any code is generated. The
evaluation/ package implements the full set of compile-time arithmetic,
comparison, logical, and cast operations — including detecting division by zero,
infinity, and NaN in constant expressions.
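In source terms:

const
  Answer = 5 + 7 * 2;   { folded to the integer 19 before any code is generated }
  { Bad = 1 / 0; }      { division by zero in a constant expression is detected at compile time }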
Pass 8 — StaticWarn detects unused variables, parameters, and imports.
Quality Gates
The analyzer includes a quality gate system — internal consistency checks that verify the compiler’s own transformations are correct. These are compiler self-tests that run during development, not user-facing guards.
For-Statement Analysis
The analyzer enforces the full for contract in Pass 4:
- Control variable must be a local variable (not a parameter, constant, return variable, or outer-scope variable)
- Control variable type must be integral, enum, or subrange (not boolean, set, record, or array)
- Inside the loop body, the control variable is marked active and any assignment or address-of operation is rejected
- Nested for loops cannot reuse an active outer control variable
Case-Statement Analysis
- Selector must have an ordinal type
- Each label must be a compile-time constant of an ordinal type compatible with the selector
- Duplicate labels are rejected
- No semicolon is permitted immediately before else
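These rules together shape a fragment like the following (a sketch — the label syntax follows the Pascal convention the chapter's examples use, and exact spellings may differ):

case day of
  Mon : WriteLn("week starts");
  Fri : WriteLn("week ends")   { no semicolon before else }
else
  WriteLn("midweek");
end;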
◈ Phase 6 — Spectra: The Intermediate Language
Package: intermediate/ · 14 files · Package: generator/ · 16 files
Between the high-level AST and the low-level x86_64 assembly sits Spectra — Mica’s three-address code intermediate language. Spectra is where the program stops being a tree and becomes a flat sequence of operations that map almost directly to machine instructions.
What Three-Address Code Looks Like
Each Spectra instruction is a quadruple: an operation with up to three addresses (two inputs and one output). Here is the IL for a simple loop:
v1.1:int32 = literal 0:int32 { i := 0 }
store v1.1:int32, i
m1.2:int32 = literal 3:int32 { final bound, evaluated once }
store m1.2:int32, _final
.loop_test:
m1.3:int32 = load i
m1.4:int32 = load _final
jumpGreater m1.3:int32, m1.4:int32, .loop_exit
{ loop body here }
m1.5:int32 = load i
m1.6:int32 = literal 1:int32
m1.7:int32 = add m1.5:int32, m1.6:int32
store m1.7:int32, i
jump .loop_test
.loop_exit:
Every value has a name (like m1.3), a type annotation (like :int32), and
a clear origin. The m prefix denotes a temporary, v denotes a
variable, p denotes a parameter.
The Instruction Set
Spectra defines 37 operations across clean categories:
Arithmetic — add, subtract, multiply, divide, modulo
Unary — negate, odd
Comparison — equal, notEqual, less, lessEqual, greater, greaterEqual
Logical — and, or, not
Conversion — cast
Memory — literal, load, store
Structure — loadField, storeField
Array — loadElement, storeElement
Set — clearSet, includeSetValue, includeSetRange, containsSetValue
Control flow — jump, jumpEqual, jumpNotEqual, jumpLess, jumpLessEqual,
jumpGreater, jumpGreaterEqual, branchTarget
Functions — argument, call, loadReturn, storeReturn, prologue, epilogue
Address Design
A key architectural decision is the separation of identity from storage.
An address name like m2.3 is a lookup key into the symbol table — not a
memory location. The actual storage (which register, which stack slot) is
decided later during activation record layout and emission.
Addresses carry memory modifiers:
- Value (default) — direct access to the value
- Address — address-of
- Indirect — dereference through a pointer
And projections for structured access:
m1.1:int32 = loadField v1.1.point.x:int32 { record field }
m1.2:int64 = loadElement v2.1[]:int64, idx { array element }
For-Loop IL
The generator produces a specific pattern for for loops that reflects the
guarantee of exactly-once bound evaluation and exit-before-step terminal
safety:
{ Evaluate and save both bounds before the loop }
m_init = <initial expression>
m_final = <final expression> { evaluated exactly once, stored }
store m_init, control_var
.test:
m_cur = load control_var
{ ascending: jumpGreater cur, final, .exit }
{ descending: jumpLess cur, final, .exit }
{ body }
{ Step — only if we haven't hit the bound yet }
{ ascending: check overflow before increment }
m_next = add/subtract m_cur, 1
store m_next, control_var
jump .test
.exit:
The IL tests ILForLoopTo and ILForLoopDownto verify this pattern with
CHECK directives against the exact generated intermediate representation.
The Expression Stack — Compile-Time Slot Management
One of the more interesting internal mechanisms in the code generator is the expression stack — and the important thing to understand about it is that it exists entirely at compile time. It leaves no trace in the generated code.
When the generator traverses an expression tree — say a + b * c — it
produces a sequence of Spectra IL instructions, each yielding a temporary
result. In a naive implementation, those temporaries might be pushed and
popped on the CPU stack at runtime. Mica takes a fundamentally different
approach: every temporary result is assigned a pre-calculated, fixed slot
in the activation record during code generation, before any assembly is
emitted.
How it works:
When a sub-expression produces a result, the generator finds the
lowest-numbered slot that is not currently occupied by any live temporary —
across both the expression stack and any held scopes — and assigns that name
to the result (for example m1.1, m1.2, m1.3). The temporary is
recorded as live.
When a consumer needs that result, the temporary is popped from the compile-time stack in LIFO order and its slot number becomes available for reuse by the next temporary.
Slot reuse is what keeps activation records compact. If the temporary in
slot m1.1 is consumed before the next temporary is created, the next
temporary reuses slot 1. At any point, the number of occupied slots equals
the maximum depth of simultaneously live temporaries — not the total number
of temporaries ever produced.
For expressions that cannot be consumed in strict LIFO order — short-circuit logical operators, set constructors, call argument sequences — the generator opens a held scope. While a held scope is open, its temporaries remain marked as live and their slots are protected from reuse even after being consumed by the generating code. When the scope closes, the held temporaries return to the caller for further use.
The outcome: by the time the activation record layout is calculated, every
temporary already has a deterministic name pointing to a fixed offset below
RBP. The emitter generates mov instructions to load and store values at
those offsets. There are no runtime push/pop sequences for expression
evaluation. The CPU stack pointer moves exactly once per function call — in
the prologue — and never again.
This is a deliberate architectural choice about where complexity belongs. The expression stack ensures that complexity lives in the compiler, not in the generated code. The output is simple, predictable, and debugger-friendly.
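As an illustration, a compile-time trace for a + b * c might look like this (slot numbers and spellings are illustrative):

m1.1:int32 = load a                           { slot 1 becomes live }
m1.2:int32 = load b                           { slot 2 becomes live }
m1.3:int32 = load c                           { slot 3 becomes live }
m1.2:int32 = multiply m1.2:int32, m1.3:int32  { slots 2 and 3 are freed; the result reuses slot 2 }
m1.1:int32 = add m1.1:int32, m1.2:int32       { slots 1 and 2 are freed; the result reuses slot 1 }

The maximum depth of simultaneously live temporaries here is three, so the activation record reserves three temporary slots — not one per instruction result.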
Activation Record Layout
Before any assembly is emitted, the compiler calculates a complete activation record layout for every function:
CALLER'S FRAME
┌──────────────────────────────────────────┐
│ [rbp+16] 7th argument (stack-passed) │
│ [rbp+8] Return address │
│ [rbp+0] Saved RBP │ ◄── RBP
├──────────────────────────────────────────┤
│ [rbp-8] Static link │
│ [rbp-16] Param 1 home (rdi copy) │
│ [rbp-24] Param 2 home (rsi copy) │
│ ... │
│ [rbp-N] Local variable 1 │
│ [rbp-N-8] Local variable 2 │
│ ... │
│ [rbp-M] Temporary storage section │
│ ... │ ◄── RSP (16-byte aligned)
└──────────────────────────────────────────┘
CALLEE'S FRAME
Every offset is fixed at compile time. Same-named temporaries share a single stack slot sized to the largest type that occupies it. All section boundaries are 16-byte aligned per the ABI.
Nested Function Access
For nested functions accessing enclosing-scope variables, Spectra tracks the
scope depth of each access. A variable reference v3.2^1 means “variable
v3.2 from one scope level up.” The emitter translates this into a chain of
static-link dereferences at runtime.
◈ Phase 7 — The Emitter
Package: emitter/ · 58 files (including sub-packages)
The emitter is where Spectra IL becomes real x86_64 machine instructions. It is the largest subsystem in the compiler, because x86_64 is an intricate architecture with intricate calling conventions.
Instruction Selection
The emitter translates each IL operation into one or more x86_64 instructions.
This is not a simple 1:1 mapping: integer addition, unsigned addition, and
floating-point addition generate different instruction sequences. Signed division
requires cqo + idiv. Comparisons produce cmp followed by setCC.
Function calls must marshal arguments into the ABI-specified registers in the
correct order. The case statement generates a sequence of comparisons and
conditional jumps.
Supported x86_64 Instructions
Base x86_64:
mov, lea, push, pop, add, sub, imul, idiv, div, neg,
and, or, xor, shr, shl, sar, cmp, test,
jmp, je, jne, jl, jle, jg, jge, jb, jbe, ja, jae,
jo, jno, sete, setne, setl, setle, setg, setge,
call, ret, nop, cqo, ud2, and more
SSE2 (floating-point):
movsd, movss, addsd, addss, subsd, subss, mulsd, mulss,
divsd, divss, ucomisd, ucomiss, cvtsi2sd, cvtsi2ss,
cvttsd2si, cvttss2si, xorpd, xorps, and more
Register Allocation
The emitter uses all 16 general-purpose registers and all 16 SSE registers, following the System V AMD64 calling convention:
- Integer arguments: rdi, rsi, rdx, rcx, r8, r9
- Float arguments: xmm0–xmm7
- Return values: rax/rdx (integers) or xmm0/xmm1 (floats)
- Caller-saved: rax, rcx, rdx, rsi, rdi, r8–r11
- Callee-saved: rbx, r12–r15, rbp
Checked Arithmetic
When --optimize checked is selected, the emitter inserts overflow detection
after every signed arithmetic operation. For addition, this means a jo
(jump-on-overflow) instruction branching to a runtime failure handler. The
handler prints a diagnostic with source file, line number, and operation
description — then terminates.
This is compiler-generated code, not a library flag. Every arithmetic site in the program gets the check woven in at the instruction level.
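In Intel syntax, the woven-in check for a signed 32-bit addition might look roughly like this (the label and handler details are illustrative):

add eax, ecx            ; the signed addition itself
jo .overflow_handler    ; branch to the runtime failure handler if OF was set

.overflow_handler:
; prints source file, line number, and operation description, then terminates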
Assembly Syntax
The emitter supports both Intel and AT&T syntax from the same internal representation:
; Intel syntax (--assembly intel)
mov rax, [rbp-8]
add rax, rcx
; AT&T syntax (--assembly att)
movq -8(%rbp), %rax
addq %rcx, %rax
◈ Phase 8 — The Peephole Optimizer
Package: emitter/optimizer/ · 20 files
After the emitter produces assembly, the optimizer runs 17 peephole passes that clean up inefficiencies without changing program semantics. These are local, conservative transformations — each one provably safe and independently tested.
For a source-level walkthrough of this stage, read Peephole Optimization in the Mica Compiler. This chapter keeps the pipeline view; the dedicated optimizer article covers the pass groups, rewrite patterns, and safety rules in more detail.
| Pass | What it eliminates |
|---|---|
| Adjacent store/load | store → load pairs on the same location |
| Redundant load | Load where the value is already in a register |
| Load/store forwarding | Load replaced by a previously stored value |
| Dead store | Store overwritten before being read |
| Copy propagation | Unnecessary register-to-register moves |
| Literal propagation | Register references replaced by immediate constants |
| Arithmetic temp folding | Temporaries folded into arithmetic operands |
| Boolean temp folding | Temporaries folded into conditional set instructions |
| Compare temp folding | Temporaries folded into comparison operands |
| Compare cleanup | Redundant comparison instructions |
| Push/pop elimination | Balanced push/pop pairs |
| Stack adjustment folding | Adjacent sub rsp/add rsp combinations |
| Call argument forwarding | Intermediate moves in argument setup |
| Call return forwarding | Intermediate moves for return values |
| SSE argument forwarding | Floating-point argument passing |
| String descriptor forwarding | String parameter passing |
| Windowed store/load | Broader-window store/load pattern matching |
The optimizer tracks statistics — instructions before and after each pass, count of each pattern matched — so improvement is measurable and verifiable.
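A representative rewrite from the adjacent store/load pass (Intel syntax; the offset is illustrative):

; before
mov [rbp-24], rax    ; store the temporary
mov rax, [rbp-24]    ; reload the same location immediately

; after — rax already holds the value, so the reload is dropped
mov [rbp-24], rax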
◈ Phase 9 — ELF and DWARF
Package: emitter/elf/ · ~6 files · Package: emitter/x86_64/ · ~13 files
ELF Binary Generation
The compiler produces standard ELF binaries with correct section layout:
- .text — executable code
- .data — initialized data
- .rodata — read-only data (string literals, float constants)
- .bss — zero-initialized data
Output artifacts:
- Executables — standalone programs
- Static libraries (.a archives) — for linking into other programs
- Shared objects (.so) — dynamically loadable libraries
DWARF v5 Debug Information
Most hobby compilers stop at binary output and omit debug information entirely. Mica generates full DWARF v5 debug information — the current standard — across four dedicated ELF sections:
- .debug_info — type descriptions, variable locations, function boundaries
- .debug_abbrev — abbreviation tables for compact encoding
- .debug_str — name string table (deduplicated identifier strings)
- .debug_line — source line mapping (statement→address table)
A Language in the Registry
Each compilation unit’s DIE carries a custom language identifier,
DW_LANG_Mica, registered in the DWARF user-defined range. GDB recognises
the source language as Mica — not C, not Pascal, not an unknown value —
because the compiler emits its own identity in every binary it produces.
Type System Fidelity
The DWARF generator traverses the full Mica type graph recursively, deduplicating entries by name and emitting a dedicated DIE for every type kind the language supports:
| Mica type | DWARF tag |
|---|---|
| Integer, Float, Boolean, Char | DW_TAG_base_type with encoding (signed, float, boolean, UTF) |
| Pointer | DW_TAG_pointer_type → value type DIE |
| Record | DW_TAG_structure_type with DW_TAG_member children |
| Array | DW_TAG_array_type with DW_TAG_subrange_type children |
| Set | DW_TAG_set_type with element type reference |
| Enum | DW_TAG_enumeration_type with DW_TAG_enumerator children |
| Subrange | DW_TAG_subrange_type with base type and closed bounds |
| File | DW_TAG_base_type (opaque handle, no encoding) |
Mica arrays carry their declared lower and upper bounds directly in the
DW_TAG_subrange_type children via DW_AT_lower_bound and
DW_AT_upper_bound, encoded in SLEB128. An array declared
array[1900..2100] appears in GDB with exactly those bounds — not
zero-based. The debugger sees the same domain the programmer declared.
Enum DIEs include every named literal as a DW_TAG_enumerator child with
its SLEB128 ordinal value, so GDB prints Red instead of 0 when you
inspect an enum variable.
Variable Location Expressions
Every variable, parameter, and constant DIE carries a DW_AT_location
expression that describes where the value lives at runtime. The expression
uses DW_OP_fbreg with a SLEB128-encoded CFA-relative offset. The CFA
(Canonical Frame Address) is defined as RSP + 16 at function entry — the
standard System V AMD64 definition — so stack unwinding and variable
location work together correctly.
Because all activation record offsets are fixed at compile time (see § Expression Stack), every location expression is a compile-time constant. There are no dynamic location expressions, no location lists, no live-range splits. The DWARF is simple, correct, and consistent with the rest of the compiler’s design.
Source Coordinates
Every DIE for a type declaration, record field, variable, parameter, or
function carries DW_AT_decl_file, DW_AT_decl_line, and
DW_AT_decl_column. These are derived from the token stream index preserved
through every compilation phase. A breakpoint set in GDB on a source line
lands on the correct instruction because the line table was built from the
same source positions the scanner assigned at Phase 1.
Subprogram Descriptors
Functions are emitted as one of three DW_TAG_subprogram variants:
procedure (no return type), function (with DW_AT_type), or entry point
(the main equivalent). Each carries DW_AT_frame_base = DW_OP_call_frame_cfa so GDB can unwind the call stack correctly. The
function-name return variable — the mechanism Mica uses for function results
— is emitted as an artificial variable (DW_AT_artificial = 1), clearly
distinguished from user-declared locals.
In Practice
Load a Mica binary in GDB. Set a breakpoint by source line. Step through statements. Inspect variables by their Mica names — enum values print as names, array indices display with their declared bounds, record fields are accessible by name. Step into nested functions and watch the static-link chain resolve. Everything works correctly because the DWARF information describes what the compiler actually generated — no approximations, no placeholder encodings.
◈ The Compiler Driver
Package: compiler/ · ~11 files
The driver orchestrates the entire pipeline: parses CLI options, manages compilation phases, handles multi-file builds, and invokes the GNU toolchain.
CLI Interface
scripts/bin/mica \
--compile --link \
--source program.mica \
--build build/output \
--optimize debug \
--assembly intel \
--stdlib scripts/bin/mica-stdlib.a
| Flag | Purpose |
|---|---|
| --compile / -c | Run compilation phases |
| --link / -l | Link into final binary |
| --source / -s | Input source files (comma-separated) |
| --build / -b | Build output directory |
| --optimize / -o | Policy: debug, release, checked |
| --assembly / -asm | Syntax: intel or att |
| --stdlib / -sl | Standard library archive |
| --platform / -pf | Target platform components |
| --external / -ext | Link profile: static or shared |
| --export / -exp | Export intermediate representations |
| --no-pie / -np | Disable position-independent executable |
Phased Concurrency
The compiler driver uses a phased parallel execution model. Within each phase, all compilation units run concurrently. Between phases, the driver inserts a barrier — ensuring, for example, that all files complete parsing before semantic analysis begins, because cross-file import resolution requires all declarations to be registered first.
The phases and their execution mode in 4.5:
| Phase | Execution | Notes |
|---|---|---|
| parse | Parallel | Each source file is parsed independently |
| analyze — StaticShallow | Parallel | Each unit analyzed independently in this pass |
| analyze — StaticImportAttach / Deep / Warn | Sequential | Require shared global registry state |
| publish | Sequential | Mutates shared global registry |
| generate | Sequential | Requires stable shared registry |
| controlFlow | Parallel | CFG built per unit independently |
| emit | Parallel | Assembly generated per unit independently |
| persistAssemblyCode | Parallel | Write .s files concurrently |
| export | Parallel | Export IL/AST/binary artifacts concurrently |
Persist and export complete the end-of-pipeline parallel picture. Writing assembly files to disk and exporting intermediate representations are independent per-compilation-unit operations with no reason to run sequentially.
Lock-free by design. The parallel() helper at the core of this
system spawns one goroutine per compilation unit and waits for all of them
with a single sync.WaitGroup. It needs no mutex, no channel, no lock
for data access, because each goroutine writes exclusively to its own
pre-allocated index slot in a result slice. Reading from shared immutable
data (the global registry, type registry, interop contracts) is safe without
locks because those structures are read-only during parallel execution.
Worker panics are captured and re-thrown in deterministic input order on the
caller goroutine after all workers complete.
The result is a pipeline that scales linearly with the number of source files in the parallel phases, and has no synchronization overhead beyond the WaitGroup itself.
Multi-File Compilation
scripts/bin/mica --compile --link \
--source main.mica,utilities.mica \
--build build/app \
--stdlib scripts/bin/mica-stdlib.a
Each file is a separate compilation unit with its own namespace. Files
communicate through imp declarations and contract-based resolution.
◈ C Interoperability and Contracts
Package: interop/ · ~7 files
Mica calls C libraries and can be called from C. This interoperability is built on a disciplined JSON contract system, not ad-hoc FFI.
JSON-Based Library Contracts
Every external library is described by a JSON contract that specifies its types, functions, calling conventions, and ABI classification. The compiler ships with embedded contracts for:
| Contract | Contents |
|---|---|
| std | Mica standard I/O: WriteLn, ReadLn, Write, Empty |
| process | Program arguments: ProgramName, ArgCount, Arg |
| limits | Numeric bounds: MaxInt8, MinInt32, MaxFloat64, etc. |
| cstd | C standard library functions |
| math | Mathematical functions: Sin, Cos, Sqrt, Pow, etc. |
The limits Contract
Instead of injecting numeric bounds as hardcoded constants from Go code,
Mica 4.5 exposes them through the limits embedded contract:
imp
  * : limits;
var
  x : int64;
begin
  x := MaxInt64;
  WriteLn("%lld", x);
end.
This makes the constant surface consistent and data-driven from the same JSON contract model used for all other library imports.
Type Fingerprinting
When two compilation units share a type definition, Mica computes a deterministic fingerprint (digest) for the type. Fingerprint mismatches — caused by field changes, type layout differences, or packing differences — are caught at compile time with a clear error, not at runtime with a crash.
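A minimal sketch of deterministic type fingerprinting, assuming a simplified record model. The Field and Record types below are illustrative stand-ins for the compiler's richer type graph, and SHA-256 is an assumption; the text does not specify the digest algorithm:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// Field and Record are simplified stand-ins for the compiler's type model.
type Field struct {
	Name   string
	Type   string
	Offset int
}

type Record struct {
	Name   string
	Packed bool
	Fields []Field
}

// fingerprint hashes a canonical, order-preserving rendering of the record
// layout. Any change to a field name, type, offset, or packing produces a
// different digest, so a mismatch between compilation units is detectable.
func fingerprint(r Record) string {
	var b strings.Builder
	fmt.Fprintf(&b, "record %s packed=%t;", r.Name, r.Packed)
	for _, f := range r.Fields {
		fmt.Fprintf(&b, "%s:%s@%d;", f.Name, f.Type, f.Offset)
	}
	sum := sha256.Sum256([]byte(b.String()))
	return fmt.Sprintf("%x", sum[:8]) // shortened digest for diagnostics
}

func main() {
	a := Record{Name: "Point", Fields: []Field{{"X", "int32", 0}, {"Y", "int32", 4}}}
	b := Record{Name: "Point", Fields: []Field{{"X", "int32", 0}, {"Y", "int64", 8}}}
	fmt.Println(fingerprint(a) == fingerprint(a)) // true: deterministic
	fmt.Println(fingerprint(a) == fingerprint(b)) // false: layout changed
}
```

The essential property is that the rendering is canonical: the same layout always serializes to the same bytes, so equal digests imply equal layouts for all practical purposes.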
Format String Analysis
When you call WriteLn with format specifiers like %d or %lld, the
compiler validates the format string against the actual argument types at
compile time. A mismatch is a compiler error, not a runtime surprise.
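The idea can be sketched as a small validator. The specifier subset and the Mica type names below are illustrative assumptions; the compiler's actual coverage is broader:

```go
package main

import (
	"fmt"
	"regexp"
)

// expected maps a conversion specifier to the argument type it requires
// (a simplified subset for illustration).
var expected = map[string]string{
	"%d":   "int32",
	"%lld": "int64",
	"%f":   "float64",
	"%s":   "string",
}

// Longer specifiers must be listed before their prefixes (%lld before %d).
var spec = regexp.MustCompile(`%(?:lld|d|f|s)`)

// checkFormat validates specifier count and types against the actual
// argument types, mirroring the compile-time check described above.
func checkFormat(format string, argTypes []string) error {
	specs := spec.FindAllString(format, -1)
	if len(specs) != len(argTypes) {
		return fmt.Errorf("%d specifiers but %d arguments", len(specs), len(argTypes))
	}
	for i, s := range specs {
		if want := expected[s]; want != argTypes[i] {
			return fmt.Errorf("argument %d: %s expects %s, got %s", i+1, s, want, argTypes[i])
		}
	}
	return nil
}

func main() {
	fmt.Println(checkFormat("%lld", []string{"int64"}))  // <nil>
	fmt.Println(checkFormat("%d", []string{"float64"})) // mismatch error
}
```

Because the format string and the argument types are both known at compile time, this check costs nothing at runtime.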
Real-World Interop Examples
- MicaCallsC — Mica calling C (printf, sin, cos, pow)
- CCallsMica — C code calling Mica-compiled functions
- MicaCallsMica — Multi-file Mica projects with cross-unit imports
- MicaCallsLinux — Direct Linux system calls via contracts
◈ The Standard Library
Mica’s standard library is a C23 implementation in
interop/standard/mica-stdlib.c. It provides:
I/O functions — WriteLn, ReadLn, Write: delegate to the appropriate
UTF-32 or UTF-8 C library calls transparently based on the target encoding.
Process functions — ProgramName, ArgCount, Arg: access program
arguments without exposing C-style argc/argv in the language entry point.
String runtime — UTF-32 string handling and descriptor-based string management (pointer + length + capacity).
Runtime support — Static link creation and traversal for nested functions, runtime failure reporting with source context for checked arithmetic.
The library compiles to mica-stdlib.a and is linked into every Mica binary.
Automatic UTF-8 / UTF-32 Argument Passing
One of the subtler things Mica does for you — and one that many languages get wrong — is handling the string encoding boundary between the language and the platform completely automatically.
The platform is a mix. The Linux kernel and most C APIs use UTF-8 narrow strings. But Mica itself works in UTF-32 as its primary string encoding. These two worlds have to talk to each other at function boundaries, and doing it wrong means either silent data corruption or a layer of explicit conversion code in every program.
Mica’s contract system solves this at the compiler level.
When the type registry initializes, it is told the target platform’s string encoding (UTF-32 for the default Linux target). The contract resolver uses this encoding flag to select the correct external symbol for every library function that handles strings:
| Library function | UTF-32 → resolves to | UTF-8 → resolves to |
|---|---|---|
| WriteLn | wprintf | printf |
| ReadLn | wscanf | scanf |
| Write | fwprintf | fprintf |
This selection happens inside the compiler during contract resolution.
Your Mica source always says WriteLn. The compiled binary calls the right
external symbol for the platform’s encoding. You never write wprintf or
printf yourself.
{ This Mica code is identical for UTF-32 and UTF-8 targets.
  The compiler selects the right symbol automatically. }
program HelloEncoding;
imp
  WriteLn : std;
begin
  WriteLn("Grüße aus Mica — 日本語 — Ελληνικά");
end.
The process library follows the same pattern. ProcessArguments and
ProcessArgumentsUtf8 are separate tests precisely to verify that both
encoding paths work correctly for Unicode argument values. Your application
code is the same in both cases; only the target encoding contract changes.
This design means that porting Mica to a UTF-8-first target — or supporting a narrow-string execution mode for embedded work — requires no changes to Mica source code. The encoding boundary is in the contracts, not in the programs.
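The resolution rule in the table above amounts to a lookup keyed by the target encoding. A hedged Go sketch follows; the type and function names are illustrative, not the compiler's internal API:

```go
package main

import "fmt"

// Encooding flag values mirror the target-platform string encoding the
// type registry receives at initialization (names are illustrative).
type Encoding int

const (
	UTF32 Encoding = iota
	UTF8
)

// symbolTable pairs each string-handling library function with its wide
// (UTF-32) and narrow (UTF-8) external symbols, as in the table above.
var symbolTable = map[string][2]string{
	"WriteLn": {"wprintf", "printf"},
	"ReadLn":  {"wscanf", "scanf"},
	"Write":   {"fwprintf", "fprintf"},
}

// resolve picks the external symbol for a library function based on the
// target encoding; Mica source only ever names the library function.
func resolve(fn string, enc Encoding) string {
	pair, ok := symbolTable[fn]
	if !ok {
		return fn // functions without a string boundary resolve unchanged
	}
	if enc == UTF32 {
		return pair[0]
	}
	return pair[1]
}

func main() {
	fmt.Println(resolve("WriteLn", UTF32)) // wprintf
	fmt.Println(resolve("WriteLn", UTF8))  // printf
}
```

The point of the sketch: the encoding decision lives entirely in the resolver, which is why porting to a different default encoding touches contracts rather than programs.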
The PascalCase API
The public Mica callable surface follows a consistent naming rule:
| Convention | Applies to |
|---|---|
| lowercase | Keywords, library namespace names (std, math, process) |
| PascalCase | All callable library API symbols (WriteLn, ArgCount, Sin) |
This rule makes the source surface self-consistent and readable. Old lowercase spellings are not present in the shipped Mica source artifacts.
String Encoding
Mica uses UTF-32 as its default string encoding. String literals are
stored as zero-terminated UTF-32 sequences in read-only data. The contract
system resolves the correct encoding-specific external names transparently —
wprintf for UTF-32, printf for UTF-8 — so you never see this detail in
Mica source code.
◈ The Test Harness
Package: tests/ · 538 test cases · 27 Go implementation files
The test harness is one of the most impressive parts of the project. It is a first-class development tool that has driven every feature from day one. No feature lands without harness coverage.
Test Categories
| Category | Count | What it validates |
|---|---|---|
| execution | 266 | Full pipeline: compile → link → run → check stdout and exit code |
| errors | 200 | Compiler error messages and diagnostics |
| il | 49 | Spectra intermediate language output |
| asm | 23 | Generated x86_64 assembly instructions |
| Total | 538 | |
Manifest-Driven Testing
Every test is defined by a test.json manifest:
{
"name": "ForIntegralLoops",
"category": "execution",
"sources": ["ForIntegralLoops.mica"],
"expect": {
"stdout": "expected/stdout.txt",
"exit_code": 0
}
}
Adding a test is: write the Mica source, write the expected output, create the manifest. Done.
Test Variants
A single test can run under multiple compiler configurations:
{
"name": "CheckedArithmeticPolicyDebugRelease",
"category": "execution",
"compiler": { "optimize": ["debug"] },
"expect": { "stdout": "expected/stdout.release.txt", "exit_code": 0 },
"variants": [
{
"name": "checked",
"compiler": { "optimize": ["checked"] },
"expect": { "stdout": "expected/stdout.checked.txt", "exit_code": 1 }
},
{
"name": "release",
"compiler": { "optimize": ["release"] },
"expect": { "stdout": "expected/stdout.release.txt", "exit_code": 0 }
}
]
}
Same source, different flags, independently verified. This is how the harness
proves that checked arithmetic detects overflow in checked mode while silently
wrapping in debug and release modes.
CHECK Directive Matching
For IL and assembly tests where output contains variable content (register numbers, memory offsets), the harness uses CHECK directives — regex-based pattern matching:
CHECK: (?m)^\s+mov\s+dword ptr \[rbp-\d+\], 1\b
CHECK: (?m)^\s+add\s+r10, r11\b
CHECK-NOT: (?m)^\s+call\s+rt\.runtime_failure\b
CHECK: lines must match. CHECK-NOT: lines must not match. Similar to
LLVM’s FileCheck, adapted for Mica’s needs.
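A minimal, order-insensitive sketch of the directive semantics in Go (the real harness's matching rules may differ, for example in how it handles ordering between directives):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// runChecks applies CHECK / CHECK-NOT directives to compiler output:
// every CHECK pattern must match somewhere in the output, and every
// CHECK-NOT pattern must match nowhere.
func runChecks(directives []string, output string) error {
	for _, d := range directives {
		switch {
		case strings.HasPrefix(d, "CHECK-NOT:"):
			pat := strings.TrimSpace(strings.TrimPrefix(d, "CHECK-NOT:"))
			if regexp.MustCompile(pat).MatchString(output) {
				return fmt.Errorf("forbidden pattern matched: %s", pat)
			}
		case strings.HasPrefix(d, "CHECK:"):
			pat := strings.TrimSpace(strings.TrimPrefix(d, "CHECK:"))
			if !regexp.MustCompile(pat).MatchString(output) {
				return fmt.Errorf("required pattern missing: %s", pat)
			}
		}
	}
	return nil
}

func main() {
	asm := "  mov dword ptr [rbp-8], 1\n  add r10, r11\n"
	err := runChecks([]string{
		`CHECK: (?m)^\s+add\s+r10, r11\b`,
		`CHECK-NOT: (?m)^\s+call\s+rt\.runtime_failure\b`,
	}, asm)
	fmt.Println(err) // <nil>
}
```

Regex patterns keep the assertions stable against incidental details like register numbers and stack offsets while still pinning down the instructions that matter.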
Stress Testing
Stress tests ("tags": ["stress"]) are excluded by default, enabled with -stress:
| Test | What it tests |
|---|---|
| StressLOC1k | Compile a 1,000-line generated program |
| StressLOC10k | 10,000 lines |
| StressLOC100k | 100,000 lines — takes 6.687 s |
| StructABI1MBStress | 1 MB aggregate passing through the ABI |
Test Runner CLI
# All tests
go run ./tests/cmd/mica-test run
# With stress tests
go run ./tests/cmd/mica-test run -stress
# Filter by name
go run ./tests/cmd/mica-test run -filter For
# Filter by category
go run ./tests/cmd/mica-test run -category il
# Update expected output files
go run ./tests/cmd/mica-test update
# List tests without running
go run ./tests/cmd/mica-test list
# CPU profiling
go run ./tests/cmd/mica-test run -cpuprofile profile.out
What the Tests Cover
Control flow:
ForIntegralLoops, ForSubrangeLoop, ForEnumLoop, ForBoundsEvaluatedOnce,
ForTerminalBoundSafety, ForLeave,
RepeatSimple, RepeatExecutesOnce, RepeatLeave,
CaseIntegerElse, CaseSubrangeSelector, CaseEnumMatch,
CaseImplicitOrdinalPromotion
Process library:
ProcessArguments, ProcessArgumentsUtf8
For-loop error enforcement:
ForBodyAssignmentRejected, ForBodyAddressTakeRejected,
ForControlVariableParameterRejected, ForNestedReuseRejected,
ForControlVariableBooleanRejected, ForControlVariableConstantRejected,
ForMissingToOrDownto, ForMissingDo, and more
Case error enforcement:
CaseSelectorNonOrdinal, CaseDuplicateLabel, CaseLabelMustBeConstant,
CaseLabelIncompatibleWithSelector, CaseSemicolonBeforeElseCompound,
CaseLabelRangeUnsupported, and more
Arithmetic and types:
ArithmeticAdd/Subtract/Multiply/Divide across all 10 numeric types,
Booleans, Characters, Strings, StringFormatFloat, StringFormatInt
Functions:
FunctionRecursive, FunctionNested, FunctionMultipleReturnPaths,
AllTypesFunction
Aggregates:
StructFieldReadWrite, StructNestedField, StructPassReturnByValue,
ArrayPassReturnByValueMatrix, ArrayMultidimensional,
PackedRecordFieldReadWrite, PackedArrayIndexing,
SetPassReturnMembership, SetConstructorRange,
PascalFoundationComplexCombinations
Memory:
PointerArithmetic, PointerAssignment, PointerDereferenceChain
Safety:
CheckedArithmeticPolicyDebugRelease, CheckedCastFloatToIntPolicies
Multi-file:
CrossCuImportFunction, CrossCuImportDataType,
CrossCuImportDataTypeFingerprintMismatch
Error diagnostics:
MissingEnd, DuplicateIdentifier, ParamCountMismatch,
UninitializedPointer, ConstDivByZero, ConstNaN, ConstInf,
InteropContractLayoutMismatch, and ~170 others
◈ Package Architecture
Here is the complete package layout with primary responsibilities:
mica-compiler/
│
├── main.go Entry point and CLI
│
├── scanner/ (~10 files) Lexical analysis
│ ├── scanner_impl.go Core scanning logic
│ ├── identifier_impl.go Identifier tokenization
│ ├── literal_impl.go String/char literals
│ ├── number_impl.go Numeric literals
│ └── position_impl.go Source position tracking
│
├── token/ (~5 files) Token type definitions
│
├── parser/ (~32 files) Recursive descent parser
│
├── ast/ (67 files) Abstract syntax tree
│ ├── ast.go Node interface and kinds
│ ├── block_impl.go Block/scope structure
│ ├── declaration.go Declaration nodes
│ ├── expression.go Expression nodes
│ ├── statement.go Statement nodes (inc. For, Repeat, Case)
│ ├── global_registry.go Cross-unit coordination
│ └── visitor.go Traversal infrastructure
│
├── typesystem/ (37 files) Type system and ABI
│ ├── primitive.go Built-in primitive types
│ ├── structure.go Record types
│ ├── array.go Array types
│ ├── set.go Set types
│ ├── subrange.go Subrange types
│ ├── enum.go Enumeration types
│ ├── pointer.go Pointer types
│ └── type_registry_impl.go ABI classification
│
├── symbols/ (~3 files) Symbol table
│
├── analyzer/ (22 files) Semantic analysis
│ ├── static_analysis_*_impl.go 8 validation passes
│ ├── constant_folding_impl.go Compile-time evaluation
│ ├── lowering_impl.go Expression normalization
│ ├── type_coercion_impl.go Implicit conversions
│ └── quality_gate_impl.go Internal consistency verification
│
├── evaluation/ (20 files) Compile-time expression evaluator
│
├── intermediate/ (~14 files) Spectra IL
│ ├── intermediate.go TAC instruction definitions
│ ├── spectra_impl.go Human-readable text format
│ ├── symbol_table.go IL symbol table
│ └── activation_record_layout_impl.go Stack frame calculation
│
├── generator/ (16 files) AST → Spectra IL
│
├── cfg/ (~3 files) Control flow graph
│
├── emitter/ (~19 files) Spectra IL → x86_64 assembly
│ ├── arithmetic_impl.go Arithmetic instruction selection
│ ├── comparison_impl.go Comparison and branch instructions
│ ├── conversion_impl.go Runtime type conversions
│ ├── aggregate_impl.go Struct and array emission
│ ├── function_call_impl.go Call site emission and ABI marshaling
│ ├── prologue_epilogue_impl.go Frame setup and teardown
│ └── runtime_failure_impl.go Checked-arithmetic failure handlers
│
│ ├── x86_64/ (~13 files) x86_64 instruction model
│ │ ├── x86_64.go Instructions, registers, addressing
│ │ └── debug_info_impl.go DWARF v5 attribute generation
│ │
│ ├── elf/ (~6 files) ELF binary encoding
│ │ ├── elf.go Section layout and symbol tables
│ │ └── dwarf_impl.go DWARF v5 section encoding
│ │
│ └── optimizer/ (~20 files) Peephole optimizer
│ └── *_impl.go 17 independent optimization passes
│
├── compiler/ (~11 files) Compiler driver
│ ├── driver_impl.go Pipeline orchestration
│ ├── compilation_unit_impl.go Per-file compilation
│ ├── binary_unit_impl.go Binary generation and linking
│ └── concurrency_impl.go Parallel phase execution
│
├── interop/ (~7 files) C interoperability
│ ├── resolver_impl.go Contract symbol resolution
│ ├── default_library_contracts.json Embedded contracts (std, process,
│ │ limits, cstd, math)
│ ├── library_contracts.schema.json Contract JSON schema
│ └── standard/mica-stdlib.c C23 runtime implementation
│
├── platform/ (~5 files) Target platform abstraction
│ ├── platform.go OS, ISA, ABI definitions
│ └── abi_impl.go System V AMD64 and AAPCS64
│
├── errors/ (~3 files) Diagnostic system
├── debugging/ (~2 files) Debug information helpers
├── export/ (~2 files) IR export (JSON/text)
├── collection/ (~4 files) Utility data structures
│
├── tests/ Test harness
│ ├── cmd/mica-test/ Test runner CLI
│ ├── harness/ (35 files) Framework: discovery, execution,
│ │ CHECK matching, variant dispatch
│ └── cases/
│ ├── execution/ (266 tests)
│ ├── errors/ (200 tests)
│ ├── il/ (49 tests)
│ └── asm/ (23 tests)
│
├── examples/ Working example programs
│ ├── Playground/ General experimentation
│ ├── Cast/ Type casting and conversions
│ ├── ConstantFolding/ Compile-time constant evaluation
│ ├── MathPower/ Math library interop (pow, sin, cos)
│ ├── Nesting/ Nested procedures and lexical scoping
│ ├── ReadLnUsage/ Console input examples
│ ├── ShortCircuit/ Short-circuit boolean evaluation
│ ├── UtfSources/ UTF-32 and Unicode source examples
│ ├── Utilities/ Multi-file library example
│ ├── MicaCallsC/ Mica calling C libraries
│ ├── CCallsMica/ C calling Mica functions
│ ├── MicaCallsMica/ Multi-file Mica projects
│ └── MicaCallsLinux/ Direct Linux syscall contracts
│
├── scripts/
│ ├── build-compiler.sh Build script
│ └── bin/ Compiled binaries
│
└── .backlog/
├── current/ Active backlog items
├── incubation/ Research and future ideas
└── archive/ Completed and closed items
◈ Building the Compiler
# Build the compiler and standard library
scripts/build-compiler.sh
# Produces:
# scripts/bin/mica — compiler binary
# scripts/bin/mica-stdlib.a — standard library archive
Compiling a Mica Program
scripts/bin/mica \
--compile --link \
--source examples/Playground/Playground.mica \
--build build/playground \
--optimize debug \
--assembly intel \
--stdlib scripts/bin/mica-stdlib.a
Running the Full Test Suite
# All tests
go run ./tests/cmd/mica-test run
# All tests including stress
go run ./tests/cmd/mica-test run -stress
# Filtered
go run ./tests/cmd/mica-test run -filter Case
go run ./tests/cmd/mica-test run -filter Repeat
go run ./tests/cmd/mica-test run -category il
◈ Roadmap
Horizon 1 — Version 4.5 (March 2026) ✓ Released
The 4.5 release theme: finish the release-facing compiler story without rewriting the compiler.
| Work item | Status |
|---|---|
| for loops (to and downto) | ✓ Done |
| repeat ... until (post-test loop) | ✓ Done |
| Minimal ordinal case statement | ✓ Done |
| Public library API reset to PascalCase | ✓ Done |
| process library for program arguments | ✓ Done |
| Driver-side concurrent assembly persist/export | ✓ Done |
| Full and stress harness pass | ✓ Done |
4.5 closure evidence:
- scripts/build-compiler.sh passes
- go run ./tests/cmd/mica-test run passes
- go run ./tests/cmd/mica-test run -stress passes
- StressLOC100k: 6.687 s · StressLOC10k: 718 ms · StressLOC1k: 142 ms
What 4.5 does not include by design: with, heap features, extended case
label lists and ranges, SSA and optimizer overhaul, variant records,
conformant arrays, defer, new/dispose.
Horizon 2 — Version 4.6 (Q2–Q4 2026) — Active
Text runtime and string direction:
UTF-32-first string model as the canonical Mica string type; stringpart
(borrowed text ranges for tokenization/substring without allocation);
stringbuffer (owned growable UTF-32 text for append and formatting);
cstring (explicit narrow UTF-8 boundary for C APIs); stringpool (arena
storage for temporary text).
Standard library modernization: A modern, Mica-native library surface; compiler-emitted JSON contracts for Mica libraries so published and imported surfaces use the same artifact; curated external contract files for well-known C libraries; first POSIX and Linux API access through contracts.
Language ergonomics:
Variable initialization expressions (var x : int32 := 42;);
checked_bounds for subranges and array indices; panic and assert;
explicit heap allocation with new and dispose; deterministic cleanup
with defer dispose.
Optimizer infrastructure: SSA transformation; dominators; liveness, alias, and loop analysis; register allocation; broader global optimization passes.
Extended control flow:
case label lists and label ranges (deferred from 4.5); richer for forms.
Horizon 3 — 2027+ (AI Track)
This is the strategic direction that keeps Mica distinctive:
- Concurrency: Structured lexical workers built around nested procedures; coroutines and generators; async I/O after concurrency semantics are stable; atomics and a documented memory model
- Mathematical types: Fixed-shape vectors and matrices as built-in types; compile-time shape and dimension checking; mixed-precision numeric types; compiler-native automatic differentiation
- Hardware: CPU SIMD lowering for vector/matrix operations; GPU offload and accelerator lowering after CPU vector semantics are solid
- Platform: Linux AArch64 as the next architecture target (AAPCS64 already in the platform layer); macOS deferred (requires separate Mach-O toolchain)
- Ecosystem: AI library contracts for BLAS, cuBLAS, oneDNN; interactive REPL for numerical workflows
◈ For Students, Professors, and Enthusiasts
If you are interested in compiler construction, Mica is an unusually clear learning resource:
- Complete pipeline — scanner through ELF binary in one repository, one implementation language, zero external dependencies
- Real target — produces actual x86_64 ELF binaries, not bytecode for a VM
- Real debug information — DWARF v5, usable with GDB right now
- Clean, readable Go — the implementation language is approachable and consistently structured
- Rich test suite — tests that serve as both specification and proof
- Inspectable IL — Spectra is a human-readable intermediate language you can study and export for any program you write
The package structure maps cleanly to a compiler textbook:
| Package | Textbook chapter |
|---|---|
| scanner/ | Lexical analysis |
| parser/ | Syntax analysis |
| ast/ | Intermediate representations |
| typesystem/ | Type systems |
| analyzer/ | Semantic analysis |
| intermediate/ + generator/ | Code generation |
| emitter/ | Target code generation |
| emitter/optimizer/ | Optimization |
| emitter/elf/ | Object file formats |
The difference between Mica and a textbook toy: Mica compiles real programs, produces real ELF binaries, implements the full System V AMD64 ABI, generates DWARF v5 debug information, and has 538 tests that prove it all works. The problems textbooks wave away — nested function activation records, aggregate passing by value across function boundaries, ABI-correct structure layout, packed type semantics — are all solved here and all tested.
◈ Project History
| Date | Milestone |
|---|---|
| December 2023 | First commit |
| 2024 | Scanner, parser, AST, type system, initial code generation |
| 2025 | Semantic analysis hardening, full aggregate types (packed records/arrays, sets, enums, subranges), interop contracts, DWARF v5 |
| January 2026 | Version 4.0.0 |
| March 2026 | Version 4.5.0 — first public release, 538-case test harness, for/repeat/case, process library, PascalCase API |
2.5 years of consistent development, one language, one compiler, one goal: a clean, teachable, systems-capable compiler that tells the truth.
◈ License
The Mica compiler source code is available under the MCL-1.0 (Mica Compiler Non-Commercial License):
- Free for personal learning, private projects, academic research, and teaching
- Commercial use requires a separate written license agreement
- Contact: info@mica-dev.com
Mica is being built in the open because compilers deserve to be understood — not just used. If you have read this far, you are the kind of person this project was built for.