Version 4.5 · Released March 2026

Structured clarity. C control. Born for AI.


A note on accuracy

Mica is developed by a single engineer working with AI assistance. This portrait represents the best current description of the language and compiler — but it is a living document. Specific names, API spellings, library namespaces, and feature boundaries will change as implementation reveals what actually works. Inaccuracies and inconsistencies can and do occur.

What will not change is the direction and the vision. The platform philosophy, the commitment to explicit semantics, the structured-language foundation, and the AI trajectory are stable. Everything else is subject to revision as the compiler grows.

Mica 4.5 is the first public release. The compiler is verified by a test harness spanning 576 Mica programs, but as with any first release, unexpected limitations may still be encountered. The compiler, the VS Code extension, and the tutorials are under intensive development throughout 2026 — changes can and will occur. The primary goal for 2026 is a consistent, well-designed Mica core language with a complete standard library. Known gaps — including heap memory allocation, a string library, and further standard library coverage — will be addressed in the course of this work.


◈ What Is Mica?

Mica is a systems programming language designed for readability, explicit control, and a long-range trajectory toward AI-native compiler semantics. It brings clean, structured syntax together with the low-level control that systems programmers need — without hiding the machine behind abstractions you didn’t ask for.

The Mica compiler is written in pure Go with zero external dependencies. It compiles Mica source code all the way down to native Linux x86_64 ELF binaries with DWARF v5 debug information, using only the GNU assembler and linker as final tools. There is no LLVM. There is no GCC backend. Every phase of compilation — from lexical analysis through register-level peephole optimization — is implemented from scratch inside this single repository.

After two and a half years and 1,844 commits of quiet, disciplined development, Mica 4.5 is the first public release: a real compiler that produces real binaries you can run, inspect in GDB, and call from C.


◈ At a Glance

Metric                                                  Value
------------------------------------------------------  -----------------------------------------
Implementation language                                 Go 1.26, pure standard library
Compiler Go source files                                322
Compiler gross lines (Go, incl. comments & whitespace)  75,545
Test harness Go implementation files                    27
Test harness gross lines (Go)                           5,404
Mica test programs (.mica files)                        576
Mica test program gross lines                           151,958
Standard library (C23, mica-stdlib.c)                   778 lines
Assembly runtime routines                               4
Assembly runtime instructions                           ~31
Total project gross lines                               ≈ 234,000
Packages                                                25
Test cases                                              538 (execution, errors, IL, assembly)
Execution tests                                         266
Error tests                                             200
IL tests                                                49
ASM tests                                               23
Total commits                                           1,844
First commit                                            December 27, 2023
Current version                                         4.5.0
Target platform                                         Linux x86_64, System V AMD64 ABI
Debug format                                            DWARF v5
Binary format                                           ELF
External toolchain                                      GNU assembler (as) + GNU linker (ld)
License                                                 MCL-1.0 (non-commercial; commercial licensing available)

Stress test timings (StressLOC series):

Test                                  Time
------------------------------------  --------
StressLOC100k (100,000-line program)  6.687 s
StressLOC10k                          718 ms
StressLOC1k                           142 ms

Assembly runtime — Mica emits a small set of hand-written x86_64 routines directly into each compiled binary. The current four routines (~31 instructions total) handle static link traversal for nested procedures, byte-level string comparison via repe cmpsb, and the non-returning runtime failure path. They are deliberately minimal: each routine does exactly one well-defined job, carries precise register documentation, and stays out of the way of the C standard library. The set will grow as the language grows — heap allocation, bounds checking, and concurrency primitives are natural additions — but the design principle remains the same: keep the runtime small, keep it correct, and let the platform handle everything it already does well.


◈ Why Mica Exists

There is a space between languages that is surprisingly empty.

Some languages are easy to read but stay far from the machine. Others give you raw control but demand constant vigilance against misuse. Few bridge both worlds cleanly — and fewer still have a credible direction for what programming should look like in an AI-shaped future.

Mica occupies that space deliberately:

  • Readability: clean block structure, nested functions, records, arrays, sets, ordinal types, structured control flow. Code communicates intent, is teachable from day one, and can be analyzed without context.
  • Control: explicit pointers, explicit address/dereference, direct ABI compatibility, contract-based interop. Systems behavior is predictable; FFI cost is visible, never hidden.
  • Direction: compiler-driven optimization, mathematical notation, tensor-native semantics (planned). The language is shaped toward AI and numerical computing from the start, not retrofitted after the fact.

Mica is not trying to become an object-oriented language with classes and inheritance. Mica is not building a package-manager ecosystem. Abstraction in Mica comes from records, procedures, nested functions, and contract-defined interfaces — proven structures that stay readable at any scale.


◈ The Platform Vision

Mica does not build a private ecosystem. It opens the one that already exists.

The world already has decades of proven C libraries, POSIX interfaces, Linux kernel APIs, and battle-tested system runtimes. They are not going anywhere. Every HTTP client, every database driver, every numerical toolkit, every operating system interface — already written, already tested, already running on billions of machines. Mica’s goal is not to replace any of that. It is to make all of it immediately and safely accessible from structured, readable source code.

Most new languages ask developers to wait while an ecosystem is built from scratch. Mica takes the opposite position: the ecosystem already exists. It is the entire C ABI world — and Mica is designed to reach every part of it directly.

The JSON contract system is the mechanism. Every C library surface, every POSIX API, every Linux syscall that follows a stable ABI can be described in a contract file and reached from Mica source with full type checking, format-string validation, and compile-time safety. A Mica binary and a C library binary are the same kind of object. They link together directly under the System V AMD64 ABI. There is no adapter layer. There is no runtime bridge. There is no reimplementation required.

This principle shapes every architectural decision in Mica:

  • The standard library grows by surfacing what already exists, not by reimplementing it in a closed language-specific world.
  • WriteLn and ReadLn are not new I/O systems — they resolve directly to wprintf and wscanf (or printf/scanf on UTF-8 platforms). The compiler selects the right platform function; the developer writes one clean call.
  • The planned posix and linux contract packs will give Mica programs access to the full Linux OS API surface without leaving the language.
  • C interop is first-class, not an escape hatch. A contract file is how you reach any C library — whether it ships with Mica, was authored by a third party, or was written ten years before Mica existed.
  • Mica programs and C programs share one binary world. They call each other, link together, and cooperate under the same ABI with no performance tax at the boundary.

Two languages sharing a platform are not competitors. A Mica binary and a C binary can be linked into the same executable. A Mica library can be called from C. A C library can be called from Mica. The platform is the ecosystem — and Mica belongs to it without trying to displace it.

The measure of success is not how many libraries Mica ships. It is how quickly a developer can reach any library that already exists — including libraries written years from now — with the same structured, safe, readable source model that defines the rest of the language.

Mica augments the platform. It does not compete with it.


◈ A First Look at Mica

The best way to introduce a language is to show it. Every snippet below is real, compiled code from the 4.5 test suite.

The language reads like structured intent

type
    Color = (Red, Green, Blue);
var
    c : Color;
begin
    c := Green;
    case c of
        Red:   WriteLn("red");
        Green: WriteLn("green");
        Blue:  WriteLn("blue");
    end;
end.

No integer casts. No fall-through hazards. No magic numbers. The source says exactly what the program means — and the compiler enforces the domain. A Color variable cannot accidentally hold an integer, and an integer cannot silently become a Color.

Enumerations carry their domain into arrays

type
    Month   = (Jan, Feb, Mar, Apr, May, Jun,
               Jul, Aug, Sep, Oct, Nov, Dec);
    Calendar = array[Month] of int32;
var
    days : Calendar;
begin
    days[Jan] := 31;
    days[Feb] := 28;
    days[Mar] := 31;
end.

The array is indexed by Month values, not integers. The compiler enforces domain correctness at every index site. There is no invisible month - 1 translation, no risk of an off-by-one born from a zero-based convention, and no way to index a calendar with an unrelated integer.

Every pointer operation is spelled out

procedure Scale(p : pointer Point; factor : float64);
begin
    p.x := p.x * factor;
    p.y := p.y * factor;
end;

pointer Point in the signature is an unambiguous statement: Scale modifies the caller’s record. Not T*, not an implicit reference — a readable description of intent, visible at every call site. Adding a field to Point does not silently change the calling convention. Passing a Point by value requires writing p : Point, nothing more.

Notice that p.x works directly — no explicit dereference required. When the base of a field selection chain is a pointer to a record, the compiler lowers the dereference automatically during semantic analysis. The explicit form value p.x is also valid and produces identical machine code. More on this in the semantic analysis chapter.

Imports are platform calls, not a private ecosystem

imp
    WriteLn : std;
    Sin     : math;

WriteLn resolves to wprintf. Sin is libm. The names are clean and readable, but nothing is invented underneath — the compiled binary calls the exact same functions every C program on the system uses. There is no separate I/O runtime, no reimplemented math library, no translation layer. The import statement names a library and a contract; the platform provides the rest.


◈ What Makes This Compiler Different

Zero external dependencies. The Mica compiler is written in pure Go with no external libraries. go build ./... is all you need. Compiled programs are assembled and linked with as and ld and nothing else. This is an intentional discipline: the compiler should be understandable, buildable, and modifiable by any engineer with a Go toolchain — on any machine, without first assembling a dependency graph.

Every stage is inspectable. Every intermediate representation can be exported and examined: the token stream from the scanner, the AST from the parser, human-readable Spectra IL from the generator, x86_64 assembly in Intel or AT&T syntax, and the final ELF binary with DWARF v5 debug information. This makes Mica not just a compiler but a teaching instrument. You can watch a program transform step by step from source to machine code and read exactly what happened at each stage.

Harness-first development. The 538 test cases are not regression tests — they are the specification of what the compiler can do. No feature lands without harness coverage across execution, error, IL, and assembly tests, with variants for different compiler configurations. This discipline has kept the compiler honest across 1,844 commits and two and a half years of development.

ABI truthfulness. The type system knows exactly how every type is classified under the System V AMD64 ABI — INTEGER, SSE, or MEMORY. The function call emitter uses that classification directly. The DWARF debug information describes what was actually generated. When you step through a Mica program in GDB, the stack frames look exactly like what the ABI specification says they should — because they are built that way from the ground up, not adjusted afterward.


◈ What Mica Looks Like

All examples below are real, compiled programs from the 4.5 test suite.

Hello, Mica

program ConstExpression;

imp
    WriteLn : std;

const
    Sum        = 5 + 7 * 2;
    Difference = (20 - 3) * 2;
    Product    = (6 as int64) * 7;
    Ratio      = 9.0 / 2.0;
    Check      = (Sum = 19) and (Difference > 30);

var
    i32  : int32;
    i64  : int64;
    f64  : float64;
    flag : bool;

begin
    i32 := Sum;
    WriteLn("  Sum = %d", i32);

    i64 := Product;
    WriteLn("  Product = %lld", i64);

    f64 := Ratio;
    WriteLn("  Ratio = %.2lf", f64);

    flag := Check;
    WriteLn("  Check = %hhu", flag);
end.

This is a complete, compilable Mica program. The structure is clean and readable — program, const, var, begin … end. — alongside C-level type specificity (int32, int64, float64) and familiar format-string I/O. The standard library function name WriteLn follows Mica’s PascalCase naming convention for callable library symbols.

Constant expressions like 5 + 7 * 2 are evaluated entirely at compile time. The variable Sum never generates a runtime computation.

Recursive Functions

function Factorial(n : int32) : int64;
begin
    if n <= 1 then
        Factorial := 1
    else
        Factorial := (n as int64) * Factorial(n - 1);
end;

Functions return values by assigning to their own name. The as keyword performs explicit type conversion. This is real recursive code generating real call frames, traced correctly by DWARF and GDB.

Nested Functions with Lexical Scoping

function Outer(base : int32) : int32;
var
    offset : int32;

    function Inner(input : int32) : int32;
    begin
        Inner := input + offset;
    end;
begin
    offset := 10;
    Outer := Inner(base);
end;

Inner reads offset from its enclosing scope. The compiler implements this through a static link chain in activation records — a real frame-pointer chain that GDB can follow and that the test suite verifies at every nesting level.

The for Loop

program ForIntegralLoops;

imp
    WriteLn : std;

var
    i   : int32;
    sum : int32;

begin
    sum := 0;
    for i := 0 to 3 do
        sum := sum + i;

    WriteLn("ascending sum = %d", sum);

    for i := 3 downto 0 do
        WriteLn("i = %d", i);
end.

The for loop supports to (ascending) and downto (descending) directions. Loop bounds are evaluated exactly once before the first iteration — verified by a dedicated test ForBoundsEvaluatedOnce that counts function call invocations. The control variable is read-only inside the loop body: direct assignment, address-of, and outer-scope access are all compile-time errors.

The loop also handles terminal-bound safety correctly. When the control variable reaches the final bound and would need to overflow to continue, Mica exits before the step — making for i := MaxInt32 - 1 to MaxInt32 safe with exactly two iterations.

{ For loops work with enum control variables too }
type
    Color = (Red, Green, Blue);
var
    c : Color;
begin
    for c := Red to Blue do
    begin
        if c = Red   then WriteLn("color = Red");
        if c = Green then WriteLn("color = Green");
        if c = Blue  then WriteLn("color = Blue");
    end;
end.

Subrange and enum types are fully supported as control variable types. The loop domain is checked at compile time: boolean and set types are rejected with clear error messages.

The repeat ... until Loop

program RepeatSimple;

imp
    WriteLn : std;

var
    i : int32;

begin
    i := 1;

    repeat
        WriteLn("i = %d", i);
        i := i + 1;
    until i > 3;

    WriteLn("done, i = %d", i);
end.

repeat ... until is the post-test loop: the body always executes at least once, and execution continues until the condition becomes True. This is the natural companion to while — and its semantics are verified by a dedicated RepeatExecutesOnce test that confirms guaranteed first-iteration execution even when the condition is already true at entry.

{ Leaving early from a repeat loop }
repeat
    if i = 2 then
        leave;
    total := total + i;
    i := i + 1;
until i > 3;

leave exits the enclosing procedure, not just the loop. The test suite covers leave inside while, for, and repeat bodies.

The case Statement

program CaseEnumMatch;

imp
    WriteLn : std;

type
    Color = (Red, Green, Blue);

var
    color : Color;

begin
    color := Green;

    case color of
        Red:   WriteLn("red");
        Green: WriteLn("green");
        Blue:  WriteLn("blue");
    end;
end.

The case statement dispatches on any ordinal selector — integers, subranges, enumerations. Each arm is a constant label followed by a statement or begin...end block. An optional else branch handles unmatched values.

{ case with integer selector and else branch }
case selector of
    1: WriteLn("one");
    2: WriteLn("two")
else
    WriteLn("other")
end;

The 4.5 implementation covers single constant labels with optional else. The compiler enforces that labels must be ordinal, must be compile-time constants, must be compatible with the selector type, and must not repeat. The no-semicolon-before-else rule is enforced consistently across both if and case — tested by dedicated error cases in the harness.

Process Arguments

program ProcessArguments;

imp
    * : std;
    * : process;

begin
    if ProgramName() # Empty then
        WriteLn("program=present")
    else
        WriteLn("program=empty");

    WriteLn("count=%d", ArgCount());

    if Arg(1) = "alpha" then
        WriteLn("arg1=match");

    if Arg(2) = "Grüße" then
        WriteLn("arg2=match");
end.

The process library exposes program arguments through three clean functions: ProgramName(), ArgCount(), and Arg(n). The program declaration stays parameterless — no C-style argc/argv in the entry point — and all argument access goes through ordinary library calls. This is tested in both UTF-32 and UTF-8 execution modes, including Unicode argument values.

Records, Pointers, and Explicit Memory

type
    Point = record
        x : int32;
        y : int32;
    end;

var
    point : Point;
    ptr   : pointer Point;

begin
    point.x := 4;
    point.y := 9;
    ptr := address point;

    { Explicit form — always valid }
    value ptr.x := value ptr.x + point.y;

    { Auto-deref form — also valid for pointer-to-record field selection }
    ptr.x := ptr.x + point.y;

    WriteLn("x = %d", point.x);
    WriteLn("y = %d", point.y);
end.

This is where Mica’s memory model comes into focus:

  • pointer Point — a pointer type, declared in plain words
  • address point — takes the address of a variable (no & operator)
  • value ptr — dereferences a pointer to read or write the full pointed-to value (no * operator)
  • value ptr.x — explicit dereference-then-select, reading naturally left to right

Pragmatic auto-deref for pointer-to-record field selection. Mica is explicit about memory by principle — but not dogmatic about it. For pointer-to-record field selection chains specifically, writing value is optional. When the semantic analyzer encounters ptr.x and ptr is a pointer Point, it recognizes the pattern and the lowering pass automatically inserts a dereference node before code generation. The lowered form — a dereference of the pointer base inserted before the field selection — produces identical machine code to the explicit value ptr.x form: there is no runtime difference, no extra indirection, no overhead of any kind.

This is a deliberate pragmatic choice. Requiring value on every field access through a pointer adds ceremony without adding clarity — the field selector already names the target unambiguously. The value keyword carries its weight where it matters: full pointer dereference to read or write the complete pointed-to value, passing a dereferenced value as an argument, or any case where the dereference is not immediately followed by a named field.

The rule is precise and local: auto-deref applies only to pointer-to-record identifier bases in selection chains. Everything else remains explicit. Arrays do not decay to pointers. Aggregate types carry no implicit reference semantics. The auto-deref is typed, deterministic, and lowered during semantic analysis — not a runtime behavior.

Sets, Subranges, and Membership

type
    N    = 0..3;
    NSet = set of N;

function Build(lo : N, hi : N) : NSet;
begin
    Build := [lo..hi];
end;

function Contains(v : N, s : NSet) : bool;
begin
    Contains := v in s;
end;

Sets are first-class values. They can be passed to and returned from functions, constructed with range expressions at runtime, and queried with in. The set constructor [lo..hi] generates real machine code for dynamic bounds.

Short-Circuit Evaluation

{ The right side is NEVER evaluated when left side is False for and,
  or True for or. The compiler generates conditional jumps, not eager calls. }

result := False and IncrementTrue();
{ counter stays 0 — right side was skipped }

result := True or IncrementFalse();
{ counter stays 0 — right side was skipped }

Checked Arithmetic

program CheckedArithmeticPolicyDebugRelease;

imp
    WriteLn : std;

var
    a, b, c : int64;

begin
    a := MaxInt64;
    b := 1;
    c := a + b;
    WriteLn("signed_add_result=%lld", c);
end.

This single program is compiled under three optimization policies, with two distinct behaviors:

  • --optimize debug or release: hardware wrap semantics, overflow is silent
  • --optimize checked: compiler-inserted overflow detection terminates the program with a diagnostic including source file and line number

The harness verifies all three policy variants automatically: same source, different flags, independently validated expected output.

All Numeric Types

Mica provides the full set of fixed-width types that systems programmers expect:

Category           Types
-----------------  ---------------------------------
Signed integers    int8, int16, int32, int64
Unsigned integers  uint8, uint16, uint32, uint64
Floating-point     float32, float64
Boolean            bool
Character          unicode (32-bit code point)
String             string (UTF-32, descriptor-based)

Every type has a precise size, a precise ABI classification, and predictable behavior at every optimization level.


◈ The Ordinal Universe

One of the most powerful ideas in Mica’s type system is the concept of ordinal types and the rich type algebra that flows from them.

In most languages, “number” and “sequence” are the same thing. Arrays start at zero. Loops count upward. Enumerations are secretly integers. Sets are bit masks. The programmer’s domain model — months, colors, grades, directions — is expressed in raw integers and the meaning is carried only in the programmer’s head.

Mica offers a different way of thinking.

What Is an Ordinal?

An ordinal type is any type whose values form a finite, ordered, discrete sequence with a well-defined first element, last element, and successor function. Every value in an ordinal type can be counted, compared, and used as an index.

Mica’s ordinal types are:

Type            Domain               Example Values
--------------  -------------------  -------------------------------
bool            {False, True}        2-element binary ordinal
int8 … int64    Integer intervals    -128..127, 0..2147483647
uint8 … uint64  Unsigned intervals   0..255, 0..18446744073709551615
unicode         Unicode code points  'a', 'π', '漢'
Enumeration     Named constants      Red, Green, Blue
Subrange        Restricted interval  1..31, 0..9

Ordinals — Domain Knowledge Carried by the Type System

This is not just a programming convenience. Mica’s type system carries ordinal domain knowledge natively — ordinals know their bounds, and the compiler uses that knowledge structurally throughout the language.

Every ordinal type — integers, booleans, unicode, enumerations, and subranges — has computable domain bounds: the compiler knows its exact lower and upper values. Dedicated type system functions resolve the [lower, upper] interval of any ordinal type, determine whether a constant value lies within that domain, find the common ordinal type for binary operations, and establish the underlying arithmetic base type for ordinal calculations. Whether one ordinal value can be assigned to another ordinal type is governed by explicit domain-aware rules, not silent integer coercions.

That domain knowledge drives a surprising amount of the language:

  • Array sizes are computed from ordinal index domains, not raw constants
  • Set cardinalities are computed from the ordinal domain of the element type
  • for loop bounds are validated against the ordinal domain of the control variable
  • case label validation checks that each label is within the ordinal domain of the selector
  • Subrange assignment is checked against the declared interval

The type system doesn’t just permit ordinals — it knows their domains and uses that knowledge everywhere those types appear. The rest of this section shows what that enables.

Enumerations — Naming the Domain

type
    Color     = (Red, Green, Blue, Yellow);
    Direction = (North, East, South, West);
    Month     = (Jan, Feb, Mar, Apr, May, Jun,
                 Jul, Aug, Sep, Oct, Nov, Dec);

An enumeration is not a shorthand for integer constants. It is a new type with its own named domain. Red is not 0. It is Red. The compiler knows the type is Color, knows the domain has four elements, and can reject nonsense like assigning a Direction to a Color variable.

The ordinal ordering of enum values follows their declaration order. Red < Green < Blue < Yellow holds. You can use enum values as array indices, set elements, and for-loop bounds — all of which become semantically meaningful rather than numerically arbitrary.

Subranges — Restricting the Domain

type
    Day    = 1..31;
    Month  = 1..12;
    Digit  = 0..9;
    Grade  = 1..5;
    Slot   = 1..3;

A subrange declares that a variable holds values only within a specified interval. Day is not int32. It is a type whose domain is exactly {1, 2, ..., 31}. The compiler knows the bounds. Arrays indexed by subranges have a known and exact size. Sets over subranges have a known and exact cardinality.

This changes what you can express in a type signature. A function that accepts a Day is telling the truth about what values it accepts. A function that accepts int32 and happens to only use values 1..31 is carrying its contract only in comments.

Arrays with Ordinal Index Types — and Lower Bounds That Mean Something

This is where the ordinal model becomes concretely powerful.

In most languages, arrays start at zero. Always. That is a hardware convention that leaked into language design: memory addresses start at zero, so array indices start at zero. The problem is that most real-world domains do not start at zero. Months start at 1. Floors in a building are numbered. A temperature sensor covers −20 to 60. A population table runs from 1900 to 2100. A convolution kernel is centered at zero and runs from −2 to 2.

Every time the domain doesn’t start at zero, the programmer carries an invisible translation burden: daysInMonth[month - 1], population[year - 1900], kernel[offset + 2]. That subtraction is not part of the problem. It is an artifact of the storage model leaking into the logic. Forget it once and you read the wrong data — silently.

Mica arrays have a lower bound that belongs to the type, not to the programmer’s memory. The index type is an ordinal type, and the array spans exactly the domain of that type:

type
    Month        = (Jan, Feb, Mar, Apr, May, Jun,
                    Jul, Aug, Sep, Oct, Nov, Dec);
    DaysInMonth  = array[Jan..Dec] of int32;

    { Years 1900 to 2100 — the lower bound is part of the type }
    PopulationTable = array[1900..2100] of int64;

    { A convolution kernel centered at zero — negative lower bound }
    Kernel = array[-2..2] of float64;

    { Workdays 1..5, not 0..4 }
    WeekdayHours = array[1..5] of int32;

DaysInMonth[Jan] is the number of days in January. Not daysInMonth[0]. Not daysInMonth[MONTH_JAN - 1]. The index is the month, because the array’s domain is the month domain.

population[1970] is the population in 1970. Not population[1970 - 1900].

kernel[-1] is the kernel value at offset −1. Not kernel[-1 + 2].

The compiler knows both bounds. It computes the array’s size from upper - lower + 1 — no programmer arithmetic required, no opportunity to get it wrong. Constant index accesses outside the declared range are a compile-time error.

var
    pop : PopulationTable;
    k   : Kernel;
begin
    pop[1900] := 1_600_000_000;
    pop[2000] := 6_100_000_000;

    k[-2] := 0.0625;
    k[-1] := 0.25;
    k[0]  := 0.375;
    k[1]  := 0.25;
    k[2]  := 0.0625;
end.

This code reads exactly like the problem it models. A reader who knows the domain immediately understands the code. A reader who doesn’t know the domain can read the type declaration and immediately understands the valid range. There is nothing to decode, no constant to subtract, no comment needed to explain why the index starts where it does.

With enum-typed indices the benefit compounds further:

type
    Color        = (Red, Green, Blue, Yellow);
    ColorWeights = array[Red..Yellow] of float64;

var
    weights : ColorWeights;
begin
    weights[Red]    := 0.2126;
    weights[Green]  := 0.7152;
    weights[Blue]   := 0.0722;
    weights[Yellow] := 0.0;
end.

There is no index 4 that silently writes past the end. There is no integer 0 standing in for Red. The array has exactly as many elements as the domain has values — computed by the compiler from the type, not declared by the programmer as a separate constant that can drift out of sync.

Multi-dimensional arrays compose naturally, mixing index types freely:

type
    Grid = array[1..3, Red..Blue] of int32;

var
    g : Grid;
begin
    g[1, Red]   := 10;
    g[2, Green] := 20;
    g[3, Blue]  := 30;
end.

Sets — Membership, Not Bit Manipulation

type
    Color    = (Red, Green, Blue, Yellow);
    ColorSet = set of Color;

var
    palette  : ColorSet;
    warm     : ColorSet;
    combined : ColorSet;

begin
    palette  := [Red, Green, Blue];
    warm     := [Red, Yellow];

    if Red in palette then
        WriteLn("Red is in the palette");

    if not (Yellow in palette) then
        WriteLn("Yellow is not in the palette");
end.

Sets in Mica are mathematical sets over an ordinal domain. They are not bit flags. They are not integers with & and |. They are sets. Membership is tested with in. Construction uses set literal syntax [...] with optional range expressions [lo..hi].

Compare this to typical C:

/* C: the programmer manually maintains the bit-shift contract */
#define RED    (1 << 0)
#define GREEN  (1 << 1)
#define BLUE   (1 << 2)
#define YELLOW (1 << 3)

int palette = RED | GREEN | BLUE;
if (palette & RED) { ... }   /* is this testing bit 0? who knows */

In Mica, the set is typed. The domain is known at compile time. The element type is Color, not int. Red in palette reads like the question it is asking.

Sets also combine with ranges in constructors:

type
    Digit    = 0..9;
    DigitSet = set of Digit;

var
    evens : DigitSet;
    odds  : DigitSet;

begin
    evens := [0, 2, 4, 6, 8];
    odds  := [1, 3, 5, 7, 9];
    odds  := [1..9];          { range constructor }
end.

For Loops Over Ordinal Domains

The for statement’s natural home is the ordinal type:

{ Enumerate all colors — no off-by-one possible }
for c := Red to Yellow do
begin
    schedule[c] := ComputeValue(c);
end;

{ Iterate over a subrange domain }
for day := 1 to 31 do
    WriteLn("Day %d", day);

{ Count down through an enum }
for d := Blue downto Red do
    WriteLn("color index = %d", d as int32);

The loop domain is the domain of the ordinal type. For an enum, the loop visits every named value in declaration order. For a subrange, it visits every integer in the interval. The control variable is read-only inside the loop body — the compiler enforces this, because a for loop means iteration over a domain, not arbitrary mutation of a counter.

The Semantic Expressiveness This Enables

When ordinal types, arrays with ordinal indices, and sets work together, the code says what it means — not just how it computes:

type
    WorkDay    = (Mon, Tue, Wed, Thu, Fri);
    WorkDaySet = set of WorkDay;
    HourTable  = array[Mon..Fri] of int32;

var
    meetings  : WorkDaySet;
    workHours : HourTable;
    total     : int32;

begin
    meetings := [Mon, Wed, Fri];
    workHours[Mon] := 8;
    workHours[Tue] := 9;
    workHours[Wed] := 6;
    workHours[Thu] := 9;
    workHours[Fri] := 7;

    total := 0;
    for day := Mon to Fri do
    begin
        if day in meetings then
            workHours[day] := workHours[day] - 1;
        total := total + workHours[day];
    end;
end.

No magic integers. No enum-to-int casts for array indexing. No bit-mask arithmetic for set membership. The code reads like the domain model it describes. A compiler — or a reader — can verify it locally without knowing any context outside the type declarations.

Nested Ordinal Types — Records Containing Sets Containing Ordinals

The real power emerges when these concepts compose:

type
    Color      = (Red, Green, Blue, Yellow);
    Digit      = 0..9;
    Slot       = 1..3;

    DigitSet   = set of Digit;
    ColorSet   = set of Color;
    ScoreArray = array[1..3] of int32;
    ColorArray = array[Red..Yellow] of int32;

    Item = record
        id     : Digit;
        flags  : DigitSet;
        scores : ScoreArray;
    end;

    Inventory = record
        rows    : array[1..3] of Item;
        palette : ColorArray;
        enabled : ColorSet;
    end;

This data structure is fully verified by the compiler:

  • Item.id must be in 0..9 — the type enforces it
  • Item.flags is a set over 0..9 — membership uses in, not bit shifts
  • Item.scores has exactly 3 elements, indexed 1 to 3 — not 0 to 2
  • Inventory.palette is indexed by Color values, not by integers 0–3
  • Inventory.enabled is a set of Color; Red in inv.enabled just works

And all of this is passed by value, returned by value, nested at arbitrary depth — all with real machine code, DWARF annotations, and verified by the test suite.

Looking Forward — Ordinals in the AI Trajectory

The ordinal type system is infrastructure for Mica’s long-range direction.

When Mica eventually adds fixed-shape vector and matrix types, the shape dimensions will naturally be expressed as ordinal types or subranges. An array indexed by 0..3 and an array indexed by AxisX..AxisW are the same machine layout — but the latter carries meaning that enables compile-time shape checking:

{ Future Mica syntax — not implemented yet }
type
    Axis   = (X, Y, Z, W);
    Vec4   = vector[Axis] of float32;
    Mat4x4 = matrix[Axis, Axis] of float32;

The connection from today’s array[Red..Blue] of int32 to tomorrow’s matrix[Axis, Axis] of float32 is direct. The type system already knows how to represent, index, and validate ordinal-indexed aggregates. The AI trajectory builds on the same foundation, not a different one.

Sets over ordinal types become natural precursors to predicate types and domain restrictions in a type system that reasons about data shapes. Subranges become natural representations for index bounds in tensor operations. The for loop over an ordinal domain becomes the natural surface for shape-driven iteration over tensor dimensions.

Mica is not retrofitting these ideas into a language that was built around raw integers. It is growing them out of a foundation that was designed for them.


◈ A Language for the Platform — Not Above It

The goal is not to replace the platform. The goal is to meet it.

Operating systems, native libraries, and hardware interfaces represent decades of accumulated engineering. Linux, glibc, POSIX, BLAS, OpenSSL, SDL, Vulkan — these exist, they are stable, they are fast, and they are documented. The problem with many modern language ecosystems is that they build a second world on top of this first one: a second memory manager, a second I/O runtime, a second type system for FFI, a second dependency graph, a second security model. The developer ends up living in the second world and treating the first as a distant, hostile country they occasionally have to visit.

Mica makes the opposite bet. Every C library on the system is a Mica library, described by a JSON contract and callable directly — no wrappers, no glue layers, no marshaling overhead. The system allocator is the allocator. The ELF loader is the loader. The OS scheduler is the scheduler. GDB is the debugger, because Mica binaries carry accurate DWARF v5 debug information.

WriteLn and ReadLn are not new I/O systems

This is the vision made concrete. When a Mica developer writes:

WriteLn("Balance = %d EUR", balance);
ReadLn("%d", n);

there is no Mica I/O runtime underneath. The compiler resolves WriteLn to wprintf and ReadLn to wscanf — the same wide-character standard C library functions that every C program on the system has always used. On a UTF-8 platform the contract resolver selects printf and scanf instead. The developer writes one call. The compiler selects the right platform function based on the target encoding. No new I/O ecosystem. No new runtime. No new abstraction to learn, document, or maintain.

This is not a special case. It is the design rule. When you import * : math;, you are importing the same libm that every C program on your system uses. When you link a Mica library as a .so, it is a regular shared object that any C program can dlopen. The binary is a first-class citizen of the platform from the first line of compiled code.

Consequences that matter

  • The entire ecosystem is already available. Every C library reachable from the system is reachable from Mica, through a contract file, with compile-time type safety and format-string validation — today, not after an ecosystem matures.
  • Native tooling works without configuration. Profilers, debuggers, address sanitizers, linker scripts, and ELF inspection tools all work because the binary is a normal ELF binary with normal DWARF information.
  • Mica binaries and C binaries are equal citizens. They link together, call each other, and share the same ABI with no performance tax at the boundary.
  • There is no bootstrapping problem. There is no “Mica ecosystem” that must reach critical mass before useful programs can be written.

The contract system

A JSON contract describes a C library’s type surface, calling conventions, and ABI layout precisely enough that the Mica compiler can validate calls at compile time — including format strings, argument types, and cross-unit type fingerprints. The developer gets compile-time safety without runtime overhead and without writing a single line of binding code.

As the Mica standard library grows, every piece of it is designed to delegate to the native platform for what the platform already does well. Mica’s UTF-32 string model sits on top of wchar_t. The math library is libm with a clean contract. The planned posix and linux contract packs will be curated access layers over real POSIX syscalls and Linux kernel APIs — not reimplementations.

Mica augments the platform. It does not compete with it.


◈ The Compilation Pipeline

Mica implements a textbook-clean compilation pipeline with explicit phase boundaries. Every intermediate representation can be inspected, exported, and reasoned about independently.

┌───────────────┐
│  Source Code  │   UTF-8 encoded .mica files
└───────┬───────┘
        ▼
┌───────────────┐
│   Scanner     │   Lexical analysis → Token stream
└───────┬───────┘
        ▼
┌───────────────┐
│   Parser      │   Recursive descent → Abstract Syntax Tree
└───────┬───────┘
        ▼
┌───────────────┐
│   Analyzer    │   8 semantic passes → Enriched, validated AST
└───────┬───────┘
        ▼
┌───────────────┐
│   Generator   │   AST traversal → Spectra IL (three-address code)
└───────┬───────┘
        ▼
┌───────────────┐
│   Emitter     │   IL → x86_64 assembly instructions
└───────┬───────┘
        ▼
┌───────────────┐
│   Optimizer   │   17 peephole passes → Cleaned assembly
└───────┬───────┘
        ▼
┌───────────────┐
│   ELF/DWARF   │   Binary encoding + DWARF v5 debug information
└───────┬───────┘
        ▼
┌───────────────┐
│  GNU Binutils │   as + ld → Executable, static library, or shared object
└───────────────┘

◈ Phase 1 — The Scanner

Package: scanner/ · ~10 files

The scanner converts UTF-8 source bytes into a stream of tokens. It sets the tone for Mica’s philosophy: careful, position-tracked, and fully Unicode-aware.

Token Categories

  • Literals: Integer, FloatingPoint, String, Unicode character
  • Operators: +, -, *, /, =, #, <, <=, >, >=
  • Symbols: (, ), [, ], ,, :, ;, ., .., :=
  • Keywords: Over 45 reserved words, including (as of 4.5): program, library, function, procedure, record, array, set, pointer, address, value, if, then, else, while, do, for, to, downto, repeat, until, case, of, begin, end, leave, and, or, not, mod, odd, as, in, imp, const, var, type, packed

Numeric Literals

The scanner supports decimal, hexadecimal (0x), and binary (0b) integer literals with optional digit separators. Floating-point literals support scientific notation (1.0e20, 3.4028235e+38).

Comments

{ Block comments can span
  multiple lines }

// Line comments run to end of line
x := 5;  // Inline comments too

Position Tracking

Every token carries its source line and column. This propagates through the entire pipeline — from scanner through DWARF debug information — so error messages, runtime failures, and debugger breakpoints always point to exact source locations.


◈ Phase 2 — The Parser

Package: parser/ · ~32 files

The parser is a classic recursive descent implementation that transforms the token stream into an Abstract Syntax Tree. It handles the full Mica grammar: declarations, type expressions, all statement forms, and expressions with correct operator precedence.

Program Structure

Every Mica source file begins with either program (for executables) or library (for reusable modules), followed by declaration sections in order:

program/library → imp → const → type → var → procedures/functions → begin...end.

Operator Precedence

Level        Operators               Category
1 (highest)  not, -, +, odd          Unary
2            *, /, mod               Multiplicative
3            +, -                    Additive
4            =, #, <, <=, >, >=, in  Comparison / Membership
5            and                     Logical AND (short-circuit)
6 (lowest)   or                      Logical OR (short-circuit)
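A precedence table like this can drive a parser directly. The following Go sketch (illustrative only, not the Mica parser; it covers just the arithmetic subset of levels 2 and 3) shows the precedence-climbing mechanism:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Binding power per operator: a lower table level binds tighter.
var power = map[string]int{"*": 2, "/": 2, "mod": 2, "+": 1, "-": 1}

type parser struct {
	toks []string
	pos  int
}

func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

func (p *parser) next() string { t := p.peek(); p.pos++; return t }

// parseExpr consumes operators whose binding power is at least min.
func (p *parser) parseExpr(min int) int {
	n, _ := strconv.Atoi(p.next()) // primary: integer literal
	for p.peek() != "" && power[p.peek()] >= min {
		op := p.next()
		rhs := p.parseExpr(power[op] + 1) // +1 makes operators left-associative
		switch op {
		case "*":
			n *= rhs
		case "/":
			n /= rhs
		case "mod":
			n %= rhs
		case "+":
			n += rhs
		case "-":
			n -= rhs
		}
	}
	return n
}

func eval(src string) int {
	p := &parser{toks: strings.Fields(src)}
	return p.parseExpr(1)
}

func main() {
	fmt.Println(eval("5 + 7 * 2"))  // 19: * binds tighter than +
	fmt.Println(eval("10 - 4 - 3")) // 3: left-associative
}
```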

Statements (Complete 4.5 Surface)

Statement   Form                                    Notes
Assignment  x := expr                               Including value ptr.field := expr
Call        Proc(args)                              Procedure or discarded function call
Compound    begin ... end                           Sequence of statements
If          if cond then stmt [else stmt]           No semicolon before else
While       while cond do stmt                      Pre-test loop
For         for v := init to/downto final do stmt   Control variable read-only
Repeat      repeat stmts until cond                 Post-test, always executes once
Case        case sel of labels... [else stmt] end   Ordinal selector only
Leave       leave                                   Exits current procedure

Type Expressions

Form             Syntax                    Example
Record           record fields end         Point = record x, y : int32; end
Packed record    packed record fields end  No internal padding
Array            array[bounds] of T        array[1..10] of float64
Multi-dim array  array[b1, b2] of T        array[1..2, Red..Blue] of int32
Set              set of OrdinalType        set of Digit
Subrange         Low..High                 1..31
Enum             (Name1, Name2, ...)       (Red, Green, Blue)
Pointer          pointer BaseType          pointer Point
File             file of T                 file of int32

Nesting Depth

Functions and procedures can be nested up to 16 levels deep. The parser tracks nesting depth and propagates it through the AST for static-link generation.

Error Recovery

The parser implements token-level synchronization to continue after syntax errors. A single compilation can report multiple diagnostics rather than stopping at the first problem.
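Token-level synchronization is the classic panic-mode recovery technique. A hedged Go sketch (the synchronization token set here is hypothetical, not the parser's actual list): after a syntax error, skip ahead to a token that can safely begin or terminate a statement, then resume so later errors are still reported.

```go
package main

import "fmt"

// Tokens at which parsing can safely resume after an error.
var syncTokens = map[string]bool{
	";": true, "end": true, "if": true, "while": true,
	"for": true, "repeat": true, "case": true, "begin": true,
}

// synchronize returns the index of the next safe resume point at or
// after pos, or len(toks) if none remains.
func synchronize(toks []string, pos int) int {
	for pos < len(toks) && !syncTokens[toks[pos]] {
		pos++
	}
	return pos
}

func main() {
	toks := []string{"x", ":=", "@", "@", ";", "y", ":=", "2"}
	// Error detected at the stray "@" (index 2): skip to the ";".
	fmt.Println(synchronize(toks, 2)) // 4
}
```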


◈ Phase 3 — The Abstract Syntax Tree

Package: ast/ · 67 files (the largest single package)

The AST is the central data structure of the compiler. It represents every syntactic construct as a typed node in a tree, and it serves as the shared currency between parsing, analysis, code generation, and export.

Node Taxonomy

The AST defines 26+ node kinds organized into five categories:

Declarations — the things a program defines: Import, Constant, Parameter, Variable, Signature, Function, DataType, RecordField

Expressions — the things a program computes: Arithmetic, UnaryArithmetic, Comparison, Logical, UnaryLogical, Memory (address-of, dereference), Conversion (casts), Selection (field / index)

Type Expressions — the shapes of data: Record, Array, Set, Subrange, Enum, File, SetConstructor

Statements — the things a program does: Assignment, Call, Leave, If, While, For, Repeat, Case, Compound

Uses — references to declared things: IdentifierUse, LiteralUse

The For, Repeat, and Case statement node kinds each carry their semantics as typed fields: direction for For, condition for Repeat, selector and arm list for Case.

The Visitor Pattern

The AST is traversed using the visitor pattern with double dispatch. This separates the structure of the tree (defined once in ast/) from the operations on it (defined in analyzer/, generator/, and other packages).

Traversal orders supported: PreOrder, InOrder, PostOrder, LevelOrder.

The Global Registry

For multi-file compilation, the AST package maintains a global registry that coordinates declarations across compilation units. Imports are validated through type fingerprinting — a deterministic digest of the type’s structure that detects ABI mismatches before linking.
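The fingerprinting idea can be sketched as follows. The actual digest algorithm and format are the compiler's own; this Go sketch only illustrates the principle: serialize a type's structure into a canonical string, hash it, and compare digests across compilation units.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

type Field struct{ Name, Type string }

// fingerprint produces a deterministic digest of a record's structure.
// Two units agree on a type only if the digests match.
func fingerprint(record string, fields []Field) string {
	canon := "record " + record + "{"
	for _, f := range fields {
		canon += f.Name + ":" + f.Type + ";" // field order matters: layout is part of the ABI
	}
	canon += "}"
	sum := sha256.Sum256([]byte(canon))
	return fmt.Sprintf("%x", sum[:8]) // shortened digest for display
}

func main() {
	a := fingerprint("Point", []Field{{"x", "int32"}, {"y", "int32"}})
	b := fingerprint("Point", []Field{{"x", "int32"}, {"y", "int64"}})
	fmt.Println(a == b) // false: changing a field changes the fingerprint
}
```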

Symbol Annotations

Each declaration node carries metadata that enriches through the pipeline:

  • FlatName — internal name for intermediate code (e.g., f2.1)
  • GlobalName — external symbol name for library exports
  • Value — compile-time constant value (when applicable)
  • PassingMode — how the parameter travels at the ABI level
  • NestingDepth — scope level for static-link generation

◈ Phase 4 — The Type System

Package: typesystem/ · 37 files

The type system is the backbone of Mica’s static guarantees. It describes every data type, tracks its properties, and determines what operations are legal.

Type Kinds

Kind                  Examples                        Notes
Signed integers       int8, int16, int32, int64       1/2/4/8 bytes
Unsigned integers     uint8, uint16, uint32, uint64   1/2/4/8 bytes
Floating-point        float32, float64                IEEE 754
Boolean               bool                            1 byte
Unicode               unicode                         4 bytes, 32-bit code point
String                string                          Descriptor: pointer + length, UTF-32
Record                user-defined                    Ordinary or packed layout
Array                 user-defined                    Fixed-size, multi-dimensional, custom index bounds
Set                   user-defined                    Over any ordinal domain
Subrange              user-defined                    Restricted ordinal interval
Enumeration           user-defined                    Named ordinal constants
Pointer               pointer T                       Explicit pointer to any type
File                  file of T                       Typed file I/O
Function / Procedure  callable types                  Parameter types, return type

Type Capabilities

Rather than a hard-coded switch statement, Mica uses a capability bit-mask system. Each type advertises what it can participate in:

Capability       Meaning
Numeric          Supports +, -, *, /
Ordered          Supports <, <=, >, >=
Equality         Supports =, #
Logical          Supports and, or, not
Integral         Integer-specific operations
Fractional       Float-specific operations
Dereferenceable  Can use value to dereference
Addressable      Can use address to take address
Convertible      Can use as to cast
Negatable        Supports unary -
Callable         Can be called as a function
Selectable       Supports .field access
Indexable        Supports [index] access
Ordinal          Has a discrete ordered domain (required for for, case, set)

This design keeps the analyzer clean: “is this type valid as a for-loop control variable?” is a single capability query, not a sprawling type switch.

ABI Awareness

The type system is ABI-aware from the ground up. Every type knows:

  • Its size in bytes on the target platform
  • Its alignment requirement
  • How it is passed to functions (integer register, SSE register, hidden pointer)
  • How it is returned from functions
  • Its System V AMD64 classification: INTEGER, SSE, or MEMORY

This means function signatures tell the truth. Adding a field to a record does not silently change how it is passed. An array does not decay to a pointer. The performance model is visible in the source.
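The System V classification logic can be sketched in deliberately simplified form. The real algorithm classifies each eightbyte of an aggregate separately; this Go sketch collapses it to the common cases and is not the type system's actual code:

```go
package main

import "fmt"

type Class string

const (
	INTEGER Class = "INTEGER"
	SSE     Class = "SSE"
	MEMORY  Class = "MEMORY"
)

type FieldKind int

const (
	Int FieldKind = iota
	Float
)

// classify applies a simplified System V rule: aggregates larger than
// 16 bytes go to memory; otherwise any integer member forces INTEGER,
// and an all-float aggregate travels in SSE registers.
func classify(size int, kinds []FieldKind) Class {
	if size > 16 {
		return MEMORY // passed via stack memory / hidden pointer
	}
	for _, k := range kinds {
		if k == Int {
			return INTEGER
		}
	}
	return SSE
}

func main() {
	// record x, y : int32 → 8 bytes, integer registers
	fmt.Println(classify(8, []FieldKind{Int, Int})) // INTEGER
	// record x, y : float64 → 16 bytes, SSE registers
	fmt.Println(classify(16, []FieldKind{Float, Float})) // SSE
	// a 24-byte record → passed in memory
	fmt.Println(classify(24, []FieldKind{Int, Int, Int})) // MEMORY
}
```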


◈ Phase 5 — Semantic Analysis

Package: analyzer/ · 22 files · Package: evaluation/ · 20 files

The analyzer validates that a syntactically correct program is also semantically correct — names resolve, types match, constants fold, and every operation and statement form obeys its rules.

Eight Coordinated Passes

Pass 1 ─── StaticShallow ─────────── Register top-level declarations
Pass 2 ─── StaticImportAttach ────── Resolve imports across files
Pass 3 ─── StaticResolveDeferred ─── Resolve forward references
Pass 4 ─── StaticDeep ───────────── Full semantic validation
Pass 5 ─── Lowering ─────────────── Normalize expression forms
Pass 6 ─── TypeCoercion ─────────── Insert implicit conversions
Pass 7 ─── ConstantFolding ──────── Evaluate compile-time constants
Pass 8 ─── StaticWarn ───────────── Report unused declarations

Pass 1 — StaticShallow registers all top-level names without analyzing bodies, allowing later passes to resolve references regardless of declaration order.

Pass 2 — StaticImportAttach connects imported declarations from other compilation units, matching them to their exported symbols.

Pass 3 — StaticResolveDeferred handles forward-reference dependencies, such as mutually referential types.

Pass 4 — StaticDeep is the main validation pass. It resolves every identifier use to its declaration, checks every operation against its operand types, validates call argument counts and types, checks array index types, validates set membership, detects uninitialized pointers, enforces for-loop control-variable rules (read-only in body, correct ordinal domain, no address-of, no outer-scope reuse), validates case labels (ordinal, constant, compatible with selector, no duplicates), and checks until conditions are boolean. This pass produces over 100 distinct error codes with precise diagnostic messages.

Pass 5 — Lowering normalizes expressions into canonical forms the code generator expects.

Pass 6 — TypeCoercion inserts explicit conversion nodes where implicit widening is allowed, making coercions visible and explicit in the tree.

Pass 7 — ConstantFolding evaluates constant expressions at compile time. 5 + 7 * 2 becomes the integer 19 before any code is generated. The evaluation/ package implements the full set of compile-time arithmetic, comparison, logical, and cast operations — including detecting division by zero, infinity, and NaN in constant expressions.

Pass 8 — StaticWarn detects unused variables, parameters, and imports.
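The folding in Pass 7 can be sketched over a toy expression tree. This is an illustration in Go, not the evaluation/ package itself: fold 5 + 7 * 2 into the literal 19 before any code would be generated, and reject division by zero at compile time.

```go
package main

import "fmt"

// Expr is a toy constant-expression node.
type Expr struct {
	Op          string // "lit", "+", "*", "/"
	Val         int64  // for "lit"
	Left, Right *Expr
}

// fold evaluates a constant expression bottom-up.
func fold(e *Expr) (int64, error) {
	if e.Op == "lit" {
		return e.Val, nil
	}
	l, err := fold(e.Left)
	if err != nil {
		return 0, err
	}
	r, err := fold(e.Right)
	if err != nil {
		return 0, err
	}
	switch e.Op {
	case "+":
		return l + r, nil
	case "*":
		return l * r, nil
	case "/":
		if r == 0 {
			return 0, fmt.Errorf("division by zero in constant expression")
		}
		return l / r, nil
	}
	return 0, fmt.Errorf("unknown operator %q", e.Op)
}

func main() {
	// 5 + 7 * 2
	e := &Expr{Op: "+",
		Left:  &Expr{Op: "lit", Val: 5},
		Right: &Expr{Op: "*", Left: &Expr{Op: "lit", Val: 7}, Right: &Expr{Op: "lit", Val: 2}}}
	v, _ := fold(e)
	fmt.Println(v) // 19
}
```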

Quality Gates

The analyzer includes a quality gate system — internal consistency checks that verify the compiler’s own transformations are correct. These are compiler self-tests that run during development, not user-facing guards.

For-Statement Analysis

The analyzer enforces the full for contract in Pass 4:

  • Control variable must be a local variable (not a parameter, constant, return variable, or outer-scope variable)
  • Control variable type must be integral, enum, or subrange (not boolean, set, record, or array)
  • Inside the loop body, the control variable is marked active and any assignment or address-of operation is rejected
  • Nested for loops cannot reuse an active outer control variable

Case-Statement Analysis

  • Selector must have an ordinal type
  • Each label must be a compile-time constant of an ordinal type compatible with the selector
  • Duplicate labels are rejected
  • No semicolon is permitted immediately before else

◈ Phase 6 — Spectra: The Intermediate Language

Package: intermediate/ · 14 files · Package: generator/ · 16 files

Between the high-level AST and the low-level x86_64 assembly sits Spectra — Mica’s three-address code intermediate language. Spectra is where the program stops being a tree and becomes a flat sequence of operations that map almost directly to machine instructions.

What Three-Address Code Looks Like

Each Spectra instruction is a quadruple: an operation with up to three addresses (two inputs and one output). Here is the IL for a simple loop:

    v1.1:int32 = literal 0:int32        { i := 0 }
    store v1.1:int32, i
    m1.2:int32 = literal 3:int32        { final bound, evaluated once }
    store m1.2:int32, _final

.loop_test:
    m1.3:int32 = load i
    m1.4:int32 = load _final
    jumpGreater m1.3:int32, m1.4:int32, .loop_exit

    { loop body here }

    m1.5:int32 = load i
    m1.6:int32 = literal 1:int32
    m1.7:int32 = add m1.5:int32, m1.6:int32
    store m1.7:int32, i
    jump .loop_test

.loop_exit:

Every value has a name (like m1.3), a type annotation (like :int32), and a clear origin. The m prefix denotes a temporary, v denotes a variable, p denotes a parameter.

The Instruction Set

Spectra defines 37 operations across clean categories:

Arithmetic    add, subtract, multiply, divide, modulo
Unary         negate, odd
Comparison    equal, notEqual, less, lessEqual, greater, greaterEqual
Logical       and, or, not
Conversion    cast
Memory        literal, load, store
Structure     loadField, storeField
Array         loadElement, storeElement
Set           clearSet, includeSetValue, includeSetRange, containsSetValue
Control flow  jump, jumpEqual, jumpNotEqual, jumpLess, jumpLessEqual, jumpGreater, jumpGreaterEqual, branchTarget
Functions     argument, call, loadReturn, storeReturn, prologue, epilogue

Address Design

A key architectural decision is the separation of identity from storage. An address name like m2.3 is a lookup key into the symbol table — not a memory location. The actual storage (which register, which stack slot) is decided later during activation record layout and emission.

Addresses carry memory modifiers:

  • Value (default) — direct access to the value
  • Address — address-of
  • Indirect — dereference through a pointer

And projections for structured access:

    m1.1:int32 = loadField v1.1.point.x:int32      { record field }
    m1.2:int64 = loadElement v2.1[]:int64, idx      { array element }

For-Loop IL

The generator produces a specific pattern for for loops that reflects the guarantee of exactly-once bound evaluation and exit-before-step terminal safety:

    { Evaluate and save both bounds before the loop }
    m_init  = <initial expression>
    m_final = <final expression>       { evaluated exactly once, stored }
    store m_init, control_var

.test:
    m_cur = load control_var
    { ascending: jumpGreater cur, final, .exit }
    { descending: jumpLess   cur, final, .exit }

    { body }

    { Step — only if we haven't hit the bound yet }
    { ascending: check overflow before increment }
    m_next = add/subtract m_cur, 1
    store m_next, control_var
    jump .test

.exit:

The IL tests ILForLoopTo and ILForLoopDownto verify this pattern with CHECK directives against the exact generated intermediate representation.

The Expression Stack — Compile-Time Slot Management

One of the more interesting internal mechanisms in the code generator is the expression stack — and the important thing to understand about it is that it exists entirely at compile time. It leaves no trace in the generated code.

When the generator traverses an expression tree — say a + b * c — it produces a sequence of Spectra IL instructions, each yielding a temporary result. In a naive implementation, those temporaries might be pushed and popped on the CPU stack at runtime. Mica takes a fundamentally different approach: every temporary result is assigned a pre-calculated, fixed slot in the activation record during code generation, before any assembly is emitted.

How it works:

When a sub-expression produces a result, the generator finds the lowest-numbered slot that is not currently occupied by any live temporary — across both the expression stack and any held scopes — and assigns that name to the result (for example m1.1, m1.2, m1.3). The temporary is recorded as live.

When a consumer needs that result, the temporary is popped from the compile-time stack in LIFO order and its slot number becomes available for reuse by the next temporary.

Slot reuse is what keeps activation records compact. If the temporary in slot m1.1 is consumed before the next temporary is created, the next temporary reuses slot 1. At any point, the number of occupied slots equals the maximum depth of simultaneously live temporaries — not the total number of temporaries ever produced.

For expressions that cannot be consumed in strict LIFO order — short-circuit logical operators, set constructors, call argument sequences — the generator opens a held scope. While a held scope is open, its temporaries remain marked as live and their slots are protected from reuse even after being consumed by the generating code. When the scope closes, the held temporaries return to the caller for further use.

The outcome: by the time the activation record layout is calculated, every temporary already has a deterministic name pointing to a fixed offset below RBP. The emitter generates mov instructions to load and store values at those offsets. There are no runtime push/pop sequences for expression evaluation. The CPU stack pointer moves exactly once per function call — in the prologue — and never again.

This is a deliberate architectural choice about where complexity belongs. The expression stack ensures that complexity lives in the compiler, not in the generated code. The output is simple, predictable, and debugger-friendly.
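The slot discipline described above can be sketched in Go (an illustration of the mechanism, not the generator's actual code): each new temporary takes the lowest free slot, and consuming temporaries in LIFO order frees slots for reuse.

```go
package main

import "fmt"

// slots is a compile-time slot allocator for expression temporaries.
type slots struct {
	live  map[int]bool
	stack []int // compile-time expression stack of slot numbers
	max   int   // high-water mark = slots needed in the activation record
}

func newSlots() *slots { return &slots{live: map[int]bool{}} }

// push allocates the lowest-numbered free slot for a new temporary.
func (s *slots) push() int {
	n := 1
	for s.live[n] {
		n++
	}
	s.live[n] = true
	s.stack = append(s.stack, n)
	if n > s.max {
		s.max = n
	}
	return n
}

// pop consumes the most recent temporary and frees its slot.
func (s *slots) pop() int {
	n := s.stack[len(s.stack)-1]
	s.stack = s.stack[:len(s.stack)-1]
	delete(s.live, n)
	return n
}

func main() {
	s := newSlots()
	// a + b * c: load a (slot 1), load b (slot 2), load c (slot 3);
	// multiply consumes slots 3 and 2 and its result reuses slot 2;
	// add consumes slots 2 and 1 and its result reuses slot 1.
	s.push()         // a
	s.push()         // b
	s.push()         // c
	s.pop(); s.pop() // consume c, b
	s.push()         // b*c result reuses slot 2
	s.pop(); s.pop() // consume b*c, a
	s.push()         // final result reuses slot 1
	fmt.Println(s.max) // 3: peak simultaneous liveness, not total temporaries
}
```

The high-water mark is exactly the temporary storage the activation record must reserve.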

Activation Record Layout

Before any assembly is emitted, the compiler calculates a complete activation record layout for every function:

                      CALLER'S FRAME
    ┌──────────────────────────────────────────┐
    │  [rbp+16]  7th argument (stack-passed)   │
    │  [rbp+8]   Return address                │
    │  [rbp+0]   Saved RBP                     │ ◄── RBP
    ├──────────────────────────────────────────┤
    │  [rbp-8]   Static link                   │
    │  [rbp-16]  Param 1 home (rdi copy)       │
    │  [rbp-24]  Param 2 home (rsi copy)       │
    │  ...                                     │
    │  [rbp-N]   Local variable 1              │
    │  [rbp-N-8] Local variable 2              │
    │  ...                                     │
    │  [rbp-M]   Temporary storage section     │
    │  ...                                     │ ◄── RSP (16-byte aligned)
    └──────────────────────────────────────────┘
                      CALLEE'S FRAME

Every offset is fixed at compile time. Same-named temporaries share a single stack slot sized to the largest type that occupies it. All section boundaries are 16-byte aligned per the ABI.

Nested Function Access

For nested functions accessing enclosing-scope variables, Spectra tracks the scope depth of each access. A variable reference v3.2^1 means “variable v3.2 from one scope level up.” The emitter translates this into a chain of static-link dereferences at runtime.


◈ Phase 7 — The Emitter

Package: emitter/ · 58 files (including sub-packages)

The emitter is where Spectra IL becomes real x86_64 machine instructions. It is the largest subsystem in the compiler, because x86_64 is an intricate architecture with intricate calling conventions.

Instruction Selection

The emitter translates each IL operation into one or more x86_64 instructions. This is not a simple 1:1 mapping: integer addition, unsigned addition, and floating-point addition generate different instruction sequences. Signed division requires cqo + idiv. Comparisons produce cmp followed by setCC. Function calls must marshal arguments into the ABI-specified registers in the correct order. The case statement generates a sequence of comparisons and conditional jumps.

Supported x86_64 Instructions

Base x86_64: mov, lea, push, pop, add, sub, imul, idiv, div, neg, and, or, xor, shr, shl, sar, cmp, test, jmp, je, jne, jl, jle, jg, jge, jb, jbe, ja, jae, jo, jno, sete, setne, setl, setle, setg, setge, call, ret, nop, cqo, ud2, and more

SSE2 (floating-point): movsd, movss, addsd, addss, subsd, subss, mulsd, mulss, divsd, divss, ucomisd, ucomiss, cvtsi2sd, cvtsi2ss, cvttsd2si, cvttss2si, xorpd, xorps, and more

Register Allocation

The emitter uses all 16 general-purpose registers and all 16 SSE registers, following the System V AMD64 calling convention:

  • Integer arguments: rdi, rsi, rdx, rcx, r8, r9
  • Float arguments: xmm0–xmm7
  • Return values: rax/rdx (integers) or xmm0/xmm1 (floats)
  • Caller-saved: rax, rcx, rdx, rsi, rdi, r8–r11
  • Callee-saved: rbx, r12–r15, rbp

Checked Arithmetic

When --optimize checked is selected, the emitter inserts overflow detection after every signed arithmetic operation. For addition, this means a jo (jump-on-overflow) instruction branching to a runtime failure handler. The handler prints a diagnostic with source file, line number, and operation description — then terminates.

This is compiler-generated code, not a library flag. Every arithmetic site in the program gets the check woven in at the instruction level.
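The condition that jo detects can be expressed in Go (an illustration of the predicate, not emitter code): signed addition overflows exactly when both operands share a sign and the wrapped result's sign differs.

```go
package main

import "fmt"

// checkedAdd32 mimics a hardware add followed by an overflow test.
func checkedAdd32(a, b int32) (int32, bool) {
	sum := a + b // wraps on overflow, like the machine instruction
	overflow := (a >= 0) == (b >= 0) && (sum >= 0) != (a >= 0)
	return sum, overflow
}

func main() {
	if _, ov := checkedAdd32(2147483647, 1); ov {
		// The runtime handler would report something of this shape
		// (the exact message format is illustrative):
		fmt.Println("overflow: int32 addition")
	}
}
```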

Assembly Syntax

The emitter supports both Intel and AT&T syntax from the same internal representation:

; Intel syntax (--assembly intel)
mov     rax, [rbp-8]
add     rax, rcx

; AT&T syntax (--assembly att)
movq    -8(%rbp), %rax
addq    %rcx, %rax

◈ Phase 8 — The Peephole Optimizer

Package: emitter/optimizer/ · 20 files

After the emitter produces assembly, the optimizer runs 17 peephole passes that clean up inefficiencies without changing program semantics. These are local, conservative transformations — each one provably safe and independently tested.

For a source-level walkthrough of this stage, read Peephole Optimization in the Mica Compiler. This chapter keeps the pipeline view; the dedicated optimizer article covers the pass groups, rewrite patterns, and safety rules in more detail.

Pass                          What it eliminates
Adjacent store/load           store → load pairs on the same location
Redundant load                Load where the value is already in a register
Load/store forwarding         Load replaced by a previously stored value
Dead store                    Store overwritten before being read
Copy propagation              Unnecessary register-to-register moves
Literal propagation           Register references replaced by immediate constants
Arithmetic temp folding       Temporaries folded into arithmetic operands
Boolean temp folding          Temporaries folded into conditional set instructions
Compare temp folding          Temporaries folded into comparison operands
Compare cleanup               Redundant comparison instructions
Push/pop elimination          Balanced push/pop pairs
Stack adjustment folding      Adjacent sub rsp/add rsp combinations
Call argument forwarding      Intermediate moves in argument setup
Call return forwarding        Intermediate moves for return values
SSE argument forwarding       Floating-point argument passing
String descriptor forwarding  String parameter passing
Windowed store/load           Broader-window store/load pattern matching

The optimizer tracks statistics — instructions before and after each pass, count of each pattern matched — so improvement is measurable and verifiable.
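To make the flavor of these passes concrete, here is a sketch of the first one in Go (illustrative, not the optimizer's code): a load that immediately follows a store to the same location through the same register is redundant and can be dropped.

```go
package main

import (
	"fmt"
	"strings"
)

// adjacentStoreLoad removes "mov [loc], reg" / "mov reg, [loc]" pairs:
// after the store, the register still holds the value, so the load is dead.
func adjacentStoreLoad(asm []string) []string {
	var out []string
	for i := 0; i < len(asm); i++ {
		out = append(out, asm[i])
		if i+1 < len(asm) {
			st := strings.SplitN(strings.TrimPrefix(asm[i], "mov "), ", ", 2)
			ld := strings.SplitN(strings.TrimPrefix(asm[i+1], "mov "), ", ", 2)
			if len(st) == 2 && len(ld) == 2 && st[0] == ld[1] && st[1] == ld[0] &&
				strings.HasPrefix(st[0], "[") {
				i++ // skip the redundant load
			}
		}
	}
	return out
}

func main() {
	in := []string{
		"mov [rbp-8], rax",
		"mov rax, [rbp-8]", // redundant: rax still holds the value
		"add rax, rcx",
	}
	fmt.Println(len(adjacentStoreLoad(in))) // 2
}
```

The real passes work on structured instruction objects rather than strings, which is what makes them provably safe; the string form here only shows the pattern being matched.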


◈ Phase 9 — ELF and DWARF

Package: emitter/elf/ · ~6 files · Package: emitter/x86_64/ · ~13 files

ELF Binary Generation

The compiler produces standard ELF binaries with correct section layout:

  • .text — executable code
  • .data — initialized data
  • .rodata — read-only data (string literals, float constants)
  • .bss — zero-initialized data

Output artifacts:

  • Executables — standalone programs
  • Static libraries (.a archives) — for linking into other programs
  • Shared objects (.so) — dynamically loadable libraries

DWARF v5 Debug Information

Most hobby compilers stop at binary output and omit debug information entirely. Mica generates full DWARF v5 debug information — the current standard — across four dedicated ELF sections:

  • .debug_info — type descriptions, variable locations, function boundaries
  • .debug_abbrev — abbreviation tables for compact encoding
  • .debug_str — name string table (deduplicated identifier strings)
  • .debug_line — source line mapping (statement→address table)

A Language in the Registry

Each compilation unit’s DIE carries a custom language identifier, DW_LANG_Mica, registered in the DWARF user-defined range. GDB recognizes the source language as Mica — not C, not Pascal, not an unknown value — because the compiler emits its own identity in every binary it produces.

Type System Fidelity

The DWARF generator traverses the full Mica type graph recursively, deduplicating entries by name and emitting a dedicated DIE for every type kind the language supports:

  • Integer, Float, Boolean, Char — DW_TAG_base_type with encoding (signed, float, boolean, UTF)
  • Pointer — DW_TAG_pointer_type → value type DIE
  • Record — DW_TAG_structure_type with DW_TAG_member children
  • Array — DW_TAG_array_type with DW_TAG_subrange_type children
  • Set — DW_TAG_set_type with element type reference
  • Enum — DW_TAG_enumeration_type with DW_TAG_enumerator children
  • Subrange — DW_TAG_subrange_type with base type and closed bounds
  • File — DW_TAG_base_type (opaque handle, no encoding)

Mica arrays carry their declared lower and upper bounds directly in the DW_TAG_subrange_type children via DW_AT_lower_bound and DW_AT_upper_bound, encoded in SLEB128. An array declared array[1900..2100] appears in GDB with exactly those bounds — not zero-based. The debugger sees the same domain the programmer declared.

Enum DIEs include every named literal as a DW_TAG_enumerator child with its SLEB128 ordinal value, so GDB prints Red instead of 0 when you inspect an enum variable.

Variable Location Expressions

Every variable, parameter, and constant DIE carries a DW_AT_location expression that describes where the value lives at runtime. The expression uses DW_OP_fbreg with a SLEB128-encoded CFA-relative offset. The CFA (Canonical Frame Address) is defined as RSP + 16 at function entry — the standard System V AMD64 definition — so stack unwinding and variable location work together correctly.

Because all activation record offsets are fixed at compile time (see § Expression Stack), every location expression is a compile-time constant. There are no dynamic location expressions, no location lists, no live-range splits. The DWARF is simple, correct, and consistent with the rest of the compiler’s design.

Source Coordinates

Every DIE for a type declaration, record field, variable, parameter, or function carries DW_AT_decl_file, DW_AT_decl_line, and DW_AT_decl_column. These are derived from the token stream index preserved through every compilation phase. A breakpoint set in GDB on a source line lands on the correct instruction because the line table was built from the same source positions the scanner assigned at Phase 1.

Subprogram Descriptors

Functions are emitted as one of three DW_TAG_subprogram variants: procedure (no return type), function (with DW_AT_type), or entry point (the main equivalent). Each carries DW_AT_frame_base = DW_OP_call_frame_cfa so GDB can unwind the call stack correctly. The function-name return variable — the mechanism Mica uses for function results — is emitted as an artificial variable (DW_AT_artificial = 1), clearly distinguished from user-declared locals.

In Practice

Load a Mica binary in GDB. Set a breakpoint by source line. Step through statements. Inspect variables by their Mica names — enum values print as names, array indices display with their declared bounds, record fields are accessible by name. Step into nested functions and watch the static-link chain resolve. Everything works correctly because the DWARF information describes what the compiler actually generated — no approximations, no placeholder encodings.


◈ The Compiler Driver

Package: compiler/ · ~11 files

The driver orchestrates the entire pipeline: parses CLI options, manages compilation phases, handles multi-file builds, and invokes the GNU toolchain.

CLI Interface

scripts/bin/mica \
    --compile --link \
    --source program.mica \
    --build build/output \
    --optimize debug \
    --assembly intel \
    --stdlib scripts/bin/mica-stdlib.a

  • --compile / -c — Run compilation phases
  • --link / -l — Link into final binary
  • --source / -s — Input source files (comma-separated)
  • --build / -b — Build output directory
  • --optimize / -o — Policy: debug, release, checked
  • --assembly / -asm — Syntax: intel or att
  • --stdlib / -sl — Standard library archive
  • --platform / -pf — Target platform components
  • --external / -ext — Link profile: static or shared
  • --export / -exp — Export intermediate representations
  • --no-pie / -np — Disable position-independent executable

Phased Concurrency

The compiler driver uses a phased parallel execution model. Within each phase, all compilation units run concurrently. Between phases, the driver inserts a barrier — ensuring, for example, that all files complete parsing before semantic analysis begins, because cross-file import resolution requires all declarations to be registered first.

The phases and their execution mode in 4.5:

  • parse — Parallel — Each source file is parsed independently
  • analyze (StaticShallow) — Parallel — Each unit analyzed independently in this pass
  • analyze (StaticImportAttach / Deep / Warn) — Sequential — Requires shared global registry state
  • publish — Sequential — Mutates shared global registry
  • generate — Sequential — Requires stable shared registry
  • controlFlow — Parallel — CFG built per unit independently
  • emit — Parallel — Assembly generated per unit independently
  • persistAssemblyCode — Parallel — Write .s files concurrently
  • export — Parallel — Export IL/AST/binary artifacts concurrently

The persist and export phases complete the parallel picture at the end of the pipeline: writing assembly files to disk and exporting intermediate representations are independent per-compilation-unit operations with no reason to run sequentially.

Lock-free by design. The parallel() helper at the core of this system spawns one goroutine per compilation unit and waits for all of them with a single sync.WaitGroup. It needs no mutex, no channel, no lock for data access, because each goroutine writes exclusively to its own pre-allocated index slot in a result slice. Reading from shared immutable data (the global registry, type registry, interop contracts) is safe without locks because those structures are read-only during parallel execution. Worker panics are captured and re-thrown in deterministic input order on the caller goroutine after all workers complete.

The result is a pipeline that scales linearly with the number of source files in the parallel phases, and has no synchronization overhead beyond the WaitGroup itself.

Multi-File Compilation

scripts/bin/mica --compile --link \
    --source main.mica,utilities.mica \
    --build build/app \
    --stdlib scripts/bin/mica-stdlib.a

Each file is a separate compilation unit with its own namespace. Files communicate through imp declarations and contract-based resolution.


◈ C Interoperability and Contracts

Package: interop/ · ~7 files

Mica calls C libraries and can be called from C. This interoperability is built on a disciplined JSON contract system, not ad-hoc FFI.

JSON-Based Library Contracts

Every external library is described by a JSON contract that specifies its types, functions, calling conventions, and ABI classification. The compiler ships with embedded contracts for:

  • std — Mica standard I/O: WriteLn, ReadLn, Write, Empty
  • process — Program arguments: ProgramName, ArgCount, Arg
  • limits — Numeric bounds: MaxInt8, MinInt32, MaxFloat64, etc.
  • cstd — C standard library functions
  • math — Mathematical functions: Sin, Cos, Sqrt, Pow, etc.

The limits Contract

Instead of injecting numeric bounds as hardcoded constants from Go code, Mica 4.5 exposes them through the limits embedded contract:

imp
    * : limits;

var
    x : int64;

begin
    x := MaxInt64;
    WriteLn("%lld", x);
end.

This makes the constant surface consistent and data-driven from the same JSON contract model used for all other library imports.

Type Fingerprinting

When two compilation units share a type definition, Mica computes a deterministic fingerprint (digest) for the type. Fingerprint mismatches — caused by field changes, type layout differences, or packing differences — are caught at compile time with a clear error, not at runtime with a crash.

Format String Analysis

When you call WriteLn with format specifiers like %d or %lld, the compiler validates the format string against the actual argument types at compile time. A mismatch is a compiler error, not a runtime surprise.

Real-World Interop Examples

  • MicaCallsC — Mica calling C printf, sin, cos, pow
  • CCallsMica — C code calling Mica-compiled functions
  • MicaCallsMica — Multi-file Mica projects with cross-unit imports
  • MicaCallsLinux — Direct Linux system calls via contracts

◈ The Standard Library

Mica’s standard library is a C23 implementation in interop/standard/mica-stdlib.c. It provides:

I/O functions — WriteLn, ReadLn, Write: delegate to the appropriate UTF-32 or UTF-8 C library calls transparently based on the target encoding.

Process functions — ProgramName, ArgCount, Arg: access program arguments without exposing C-style argc/argv in the language entry point.

String runtime — UTF-32 string handling and descriptor-based string management (pointer + length + capacity).

Runtime support — Static link creation and traversal for nested functions, runtime failure reporting with source context for checked arithmetic.

The library compiles to mica-stdlib.a and is linked into every Mica binary.

Automatic UTF-8 / UTF-32 Argument Passing

One of the subtler things Mica does for you — and one that languages frequently get wrong — is to handle the string-encoding boundary between the language and the platform completely automatically.

The platform is a mix. The Linux kernel and most C APIs use UTF-8 narrow strings. But Mica itself works in UTF-32 as its primary string encoding. These two worlds have to talk to each other at function boundaries, and doing it wrong means either silent data corruption or a layer of explicit conversion code in every program.

Mica’s contract system solves this at the compiler level.

When the type registry initializes, it is told the target platform’s string encoding (UTF-32 for the default Linux target). The contract resolver uses this encoding flag to select the correct external symbol for every library function that handles strings:

  • WriteLn — resolves to wprintf (UTF-32) or printf (UTF-8)
  • ReadLn — resolves to wscanf (UTF-32) or scanf (UTF-8)
  • Write — resolves to fwprintf (UTF-32) or fprintf (UTF-8)

This selection happens inside the compiler during contract resolution. Your Mica source always says WriteLn. The compiled binary calls the right external symbol for the platform’s encoding. You never write wprintf or printf yourself.

{ This Mica code is identical for UTF-32 and UTF-8 targets.
  The compiler selects the right symbol automatically. }
program HelloEncoding;

imp
    WriteLn : std;

begin
    WriteLn("Grüße aus Mica — 日本語 — Ελληνικά");
end.

The process library follows the same pattern. ProcessArguments and ProcessArgumentsUtf8 are separate tests precisely to verify that both encoding paths work correctly for Unicode argument values. Your application code is the same in both cases; only the target encoding contract changes.

This design means that porting Mica to a UTF-8-first target — or supporting a narrow-string execution mode for embedded work — requires no changes to Mica source code. The encoding boundary is in the contracts, not in the programs.

The PascalCase API

The public Mica callable surface follows a consistent naming rule:

  • lowercase — Keywords, library namespace names (std, math, process)
  • PascalCase — All callable library API symbols (WriteLn, ArgCount, Sin)

This rule makes the source surface self-consistent and readable. Old lowercase spellings are not present in the shipped Mica source artifacts.

String Encoding

Mica uses UTF-32 as its default string encoding. String literals are stored as zero-terminated UTF-32 sequences in read-only data. The contract system resolves the correct encoding-specific external names transparently — wprintf for UTF-32, printf for UTF-8 — so you never see this detail in Mica source code.


◈ The Test Harness

Package: tests/ · 538 test cases · 27 Go implementation files

The test harness is one of the most impressive parts of the project. It is a first-class development tool that has driven every feature from day one. No feature lands without harness coverage.

Test Categories

  • execution — 266 tests — Full pipeline: compile → link → run → check stdout and exit code
  • errors — 200 tests — Compiler error messages and diagnostics
  • il — 49 tests — Spectra intermediate language output
  • asm — 23 tests — Generated x86_64 assembly instructions

  Total: 538 tests

Manifest-Driven Testing

Every test is defined by a test.json manifest:

{
  "name": "ForIntegralLoops",
  "category": "execution",
  "sources": ["ForIntegralLoops.mica"],
  "expect": {
    "stdout": "expected/stdout.txt",
    "exit_code": 0
  }
}

Adding a test is: write the Mica source, write the expected output, create the manifest. Done.

Test Variants

A single test can run under multiple compiler configurations:

{
  "name": "CheckedArithmeticPolicyDebugRelease",
  "category": "execution",
  "compiler": { "optimize": ["debug"] },
  "expect": { "stdout": "expected/stdout.release.txt", "exit_code": 0 },
  "variants": [
    {
      "name": "checked",
      "compiler": { "optimize": ["checked"] },
      "expect": { "stdout": "expected/stdout.checked.txt", "exit_code": 1 }
    },
    {
      "name": "release",
      "compiler": { "optimize": ["release"] },
      "expect": { "stdout": "expected/stdout.release.txt", "exit_code": 0 }
    }
  ]
}

Same source, different flags, independently verified. This is how the harness proves that checked arithmetic detects overflow in checked mode while silently wrapping in debug and release modes.

CHECK Directive Matching

For IL and assembly tests where output contains variable content (register numbers, memory offsets), the harness uses CHECK directives — regex-based pattern matching:

CHECK: (?m)^\s+mov\s+dword ptr \[rbp-\d+\], 1\b
CHECK: (?m)^\s+add\s+r10, r11\b
CHECK-NOT: (?m)^\s+call\s+rt\.runtime_failure\b

CHECK: lines must match. CHECK-NOT: lines must not match. Similar to LLVM’s FileCheck, adapted for Mica’s needs.

Stress Testing

Stress tests ("tags": ["stress"]) are excluded by default, enabled with -stress:

  • StressLOC1k — Compile a 1,000-line generated program
  • StressLOC10k — 10,000 lines
  • StressLOC100k — 100,000 lines (compiles in 6.687 s)
  • StructABI1MBStress — 1 MB aggregate passing through the ABI

Test Runner CLI

# All tests
go run ./tests/cmd/mica-test run

# With stress tests
go run ./tests/cmd/mica-test run -stress

# Filter by name
go run ./tests/cmd/mica-test run -filter For

# Filter by category
go run ./tests/cmd/mica-test run -category il

# Update expected output files
go run ./tests/cmd/mica-test update

# List tests without running
go run ./tests/cmd/mica-test list

# CPU profiling
go run ./tests/cmd/mica-test run -cpuprofile profile.out

What the Tests Cover

Control flow: ForIntegralLoops, ForSubrangeLoop, ForEnumLoop, ForBoundsEvaluatedOnce, ForTerminalBoundSafety, ForLeave, RepeatSimple, RepeatExecutesOnce, RepeatLeave, CaseIntegerElse, CaseSubrangeSelector, CaseEnumMatch, CaseImplicitOrdinalPromotion

Process library: ProcessArguments, ProcessArgumentsUtf8

For-loop error enforcement: ForBodyAssignmentRejected, ForBodyAddressTakeRejected, ForControlVariableParameterRejected, ForNestedReuseRejected, ForControlVariableBooleanRejected, ForControlVariableConstantRejected, ForMissingToOrDownto, ForMissingDo, and more

Case error enforcement: CaseSelectorNonOrdinal, CaseDuplicateLabel, CaseLabelMustBeConstant, CaseLabelIncompatibleWithSelector, CaseSemicolonBeforeElseCompound, CaseLabelRangeUnsupported, and more

Arithmetic and types: ArithmeticAdd/Subtract/Multiply/Divide across all 10 numeric types, Booleans, Characters, Strings, StringFormatFloat, StringFormatInt

Functions: FunctionRecursive, FunctionNested, FunctionMultipleReturnPaths, AllTypesFunction

Aggregates: StructFieldReadWrite, StructNestedField, StructPassReturnByValue, ArrayPassReturnByValueMatrix, ArrayMultidimensional, PackedRecordFieldReadWrite, PackedArrayIndexing, SetPassReturnMembership, SetConstructorRange, PascalFoundationComplexCombinations

Memory: PointerArithmetic, PointerAssignment, PointerDereferenceChain

Safety: CheckedArithmeticPolicyDebugRelease, CheckedCastFloatToIntPolicies

Multi-file: CrossCuImportFunction, CrossCuImportDataType, CrossCuImportDataTypeFingerprintMismatch

Error diagnostics: MissingEnd, DuplicateIdentifier, ParamCountMismatch, UninitializedPointer, ConstDivByZero, ConstNaN, ConstInf, InteropContractLayoutMismatch, and ~170 others


◈ Package Architecture

Here is the complete package layout with primary responsibilities:

mica-compiler/
│
├── main.go                               Entry point and CLI
│
├── scanner/        (~10 files)           Lexical analysis
│   ├── scanner_impl.go                   Core scanning logic
│   ├── identifier_impl.go                Identifier tokenization
│   ├── literal_impl.go                   String/char literals
│   ├── number_impl.go                    Numeric literals
│   └── position_impl.go                  Source position tracking
│
├── token/          (~5 files)            Token type definitions
│
├── parser/         (~32 files)           Recursive descent parser
│
├── ast/            (67 files)            Abstract syntax tree
│   ├── ast.go                            Node interface and kinds
│   ├── block_impl.go                     Block/scope structure
│   ├── declaration.go                    Declaration nodes
│   ├── expression.go                     Expression nodes
│   ├── statement.go                      Statement nodes (inc. For, Repeat, Case)
│   ├── global_registry.go                Cross-unit coordination
│   └── visitor.go                        Traversal infrastructure
│
├── typesystem/     (37 files)            Type system and ABI
│   ├── primitive.go                      Built-in primitive types
│   ├── structure.go                      Record types
│   ├── array.go                          Array types
│   ├── set.go                            Set types
│   ├── subrange.go                       Subrange types
│   ├── enum.go                           Enumeration types
│   ├── pointer.go                        Pointer types
│   └── type_registry_impl.go             ABI classification
│
├── symbols/        (~3 files)            Symbol table
│
├── analyzer/       (22 files)            Semantic analysis
│   ├── static_analysis_*_impl.go         8 validation passes
│   ├── constant_folding_impl.go          Compile-time evaluation
│   ├── lowering_impl.go                  Expression normalization
│   ├── type_coercion_impl.go             Implicit conversions
│   └── quality_gate_impl.go              Internal consistency verification
│
├── evaluation/     (20 files)            Compile-time expression evaluator
│
├── intermediate/   (~14 files)           Spectra IL
│   ├── intermediate.go                   TAC instruction definitions
│   ├── spectra_impl.go                   Human-readable text format
│   ├── symbol_table.go                   IL symbol table
│   └── activation_record_layout_impl.go  Stack frame calculation
│
├── generator/      (16 files)            AST → Spectra IL
│
├── cfg/            (~3 files)            Control flow graph
│
├── emitter/        (~19 files)           Spectra IL → x86_64 assembly
│   ├── arithmetic_impl.go                Arithmetic instruction selection
│   ├── comparison_impl.go                Comparison and branch instructions
│   ├── conversion_impl.go                Runtime type conversions
│   ├── aggregate_impl.go                 Struct and array emission
│   ├── function_call_impl.go             Call site emission and ABI marshaling
│   ├── prologue_epilogue_impl.go         Frame setup and teardown
│   └── runtime_failure_impl.go           Checked-arithmetic failure handlers
│
│   ├── x86_64/     (~13 files)           x86_64 instruction model
│   │   ├── x86_64.go                     Instructions, registers, addressing
│   │   └── debug_info_impl.go            DWARF v5 attribute generation
│   │
│   ├── elf/        (~6 files)            ELF binary encoding
│   │   ├── elf.go                        Section layout and symbol tables
│   │   └── dwarf_impl.go                 DWARF v5 section encoding
│   │
│   └── optimizer/  (~20 files)           Peephole optimizer
│       └── *_impl.go                     17 independent optimization passes
│
├── compiler/       (~11 files)           Compiler driver
│   ├── driver_impl.go                    Pipeline orchestration
│   ├── compilation_unit_impl.go          Per-file compilation
│   ├── binary_unit_impl.go               Binary generation and linking
│   └── concurrency_impl.go               Parallel phase execution
│
├── interop/        (~7 files)            C interoperability
│   ├── resolver_impl.go                  Contract symbol resolution
│   ├── default_library_contracts.json    Embedded contracts (std, process,
│   │                                     limits, cstd, math)
│   ├── library_contracts.schema.json     Contract JSON schema
│   └── standard/mica-stdlib.c            C23 runtime implementation
│
├── platform/       (~5 files)            Target platform abstraction
│   ├── platform.go                       OS, ISA, ABI definitions
│   └── abi_impl.go                       System V AMD64 and AAPCS64
│
├── errors/         (~3 files)            Diagnostic system
├── debugging/      (~2 files)            Debug information helpers
├── export/         (~2 files)            IR export (JSON/text)
├── collection/     (~4 files)            Utility data structures
│
├── tests/                                Test harness
│   ├── cmd/mica-test/                    Test runner CLI
│   ├── harness/    (35 files)            Framework: discovery, execution,
│   │                                     CHECK matching, variant dispatch
│   └── cases/
│       ├── execution/   (266 tests)
│       ├── errors/      (200 tests)
│       ├── il/          (49 tests)
│       └── asm/         (23 tests)
│
├── examples/                             Working example programs
│   ├── Playground/                       General experimentation
│   ├── Cast/                             Type casting and conversions
│   ├── ConstantFolding/                  Compile-time constant evaluation
│   ├── MathPower/                        Math library interop (pow, sin, cos)
│   ├── Nesting/                          Nested procedures and lexical scoping
│   ├── ReadLnUsage/                      Console input examples
│   ├── ShortCircuit/                     Short-circuit boolean evaluation
│   ├── UtfSources/                       UTF-32 and Unicode source examples
│   ├── Utilities/                        Multi-file library example
│   ├── MicaCallsC/                       Mica calling C libraries
│   ├── CCallsMica/                       C calling Mica functions
│   ├── MicaCallsMica/                    Multi-file Mica projects
│   └── MicaCallsLinux/                   Direct Linux syscall contracts
│
├── scripts/
│   ├── build-compiler.sh                 Build script
│   └── bin/                              Compiled binaries
│
└── .backlog/
    ├── current/                          Active backlog items
    ├── incubation/                       Research and future ideas
    └── archive/                          Completed and closed items

◈ Building the Compiler

# Build the compiler and standard library
scripts/build-compiler.sh

# Produces:
#   scripts/bin/mica          — compiler binary
#   scripts/bin/mica-stdlib.a — standard library archive

Compiling a Mica Program

scripts/bin/mica \
    --compile --link \
    --source examples/Playground/Playground.mica \
    --build build/playground \
    --optimize debug \
    --assembly intel \
    --stdlib scripts/bin/mica-stdlib.a

Running the Full Test Suite

# All tests
go run ./tests/cmd/mica-test run

# All tests including stress
go run ./tests/cmd/mica-test run -stress

# Filtered
go run ./tests/cmd/mica-test run -filter Case
go run ./tests/cmd/mica-test run -filter Repeat
go run ./tests/cmd/mica-test run -category il

◈ Roadmap

Horizon 1 — Version 4.5 (March 2026) ✓ Released

The 4.5 release theme: finish the release-facing compiler story without rewriting the compiler.

  • for loops (to and downto) — ✓ Done
  • repeat ... until (post-test loop) — ✓ Done
  • Minimal ordinal case statement — ✓ Done
  • Public library API reset to PascalCase — ✓ Done
  • process library for program arguments — ✓ Done
  • Driver-side concurrent assembly persist/export — ✓ Done
  • Full and stress harness pass — ✓ Done

4.5 closure evidence:

  1. scripts/build-compiler.sh passes
  2. go run ./tests/cmd/mica-test run passes
  3. go run ./tests/cmd/mica-test run -stress passes
  4. StressLOC100k: 6.687 s · StressLOC10k: 718 ms · StressLOC1k: 142 ms

What 4.5 does not include by design: with, heap features, extended case label lists and ranges, SSA and optimizer overhaul, variant records, conformant arrays, defer, new/dispose.

Horizon 2 — Version 4.6 (Q2–Q4 2026) — Active

Text runtime and string direction: UTF-32-first string model as the canonical Mica string type; stringpart (borrowed text ranges for tokenization/substring without allocation); stringbuffer (owned growable UTF-32 text for append and formatting); cstring (explicit narrow UTF-8 boundary for C APIs); stringpool (arena storage for temporary text).

Standard library modernization: A modern, Mica-native library surface; compiler-emitted JSON contracts for Mica libraries so published and imported surfaces use the same artifact; curated external contract files for well-known C libraries; first POSIX and Linux API access through contracts.

Language ergonomics: Variable initialization expressions (var x : int32 := 42;); checked_bounds for subranges and array indices; panic and assert; explicit heap allocation with new and dispose; deterministic cleanup with defer dispose.

Optimizer infrastructure: SSA transformation; dominators; liveness, alias, and loop analysis; register allocation; broader global optimization passes.

Extended control flow: case label lists and label ranges (deferred from 4.5); richer for forms.

Horizon 3 — 2027+ (AI Track)

This is the strategic direction that keeps Mica distinctive:

  • Concurrency: Structured lexical workers built around nested procedures; coroutines and generators; async I/O after concurrency semantics are stable; atomics and a documented memory model
  • Mathematical types: Fixed-shape vectors and matrices as built-in types; compile-time shape and dimension checking; mixed-precision numeric types; compiler-native automatic differentiation
  • Hardware: CPU SIMD lowering for vector/matrix operations; GPU offload and accelerator lowering after CPU vector semantics are solid
  • Platform: Linux AArch64 as the next architecture target (AAPCS64 already in the platform layer); macOS deferred (requires separate Mach-O toolchain)
  • Ecosystem: AI library contracts for BLAS, cuBLAS, oneDNN; interactive REPL for numerical workflows

◈ For Students, Professors, and Enthusiasts

If you are interested in compiler construction, Mica is an unusually clear learning resource:

  • Complete pipeline — scanner through ELF binary in one repository, one implementation language, zero external dependencies
  • Real target — produces actual x86_64 ELF binaries, not bytecode for a VM
  • Real debug information — DWARF v5, usable with GDB right now
  • Clean, readable Go — the implementation language is approachable and consistently structured
  • Rich test suite — tests that serve as both specification and proof
  • Inspectable IL — Spectra is a human-readable intermediate language you can study and export for any program you write

The package structure maps cleanly to a compiler textbook:

  • scanner/ — Lexical analysis
  • parser/ — Syntax analysis
  • ast/ — Intermediate representations
  • typesystem/ — Type systems
  • analyzer/ — Semantic analysis
  • intermediate/ + generator/ — Code generation
  • emitter/ — Target code generation
  • emitter/optimizer/ — Optimization
  • emitter/elf/ — Object file formats

The difference between Mica and a textbook toy: Mica compiles real programs, produces real ELF binaries, implements the full System V AMD64 ABI, generates DWARF v5 debug information, and has hundreds of harness tests that prove it all works. The problems textbooks wave away — nested function activation records, aggregate passing by value across function boundaries, ABI-correct structure layout, packed type semantics — are all solved here and all tested.


◈ Project History

  • December 2023 — First commit
  • 2024 — Scanner, parser, AST, type system, initial code generation
  • 2025 — Semantic analysis hardening, full aggregate types (packed records/arrays, sets, enums, subranges), interop contracts, DWARF v5
  • January 2026 — Version 4.0.0
  • March 2026 — Version 4.5.0: first public release, many tests, for/repeat/case, process library, PascalCase API

2.5 years of consistent development, one language, one compiler, one goal: a clean, teachable, systems-capable compiler that tells the truth.


◈ License

The Mica compiler source code is available under the MCL-1.0 (Mica Compiler Non-Commercial License):

  • Free for personal learning, private projects, academic research, and teaching
  • Commercial use requires a separate written license agreement
  • Contact: info@mica-dev.com

Mica is being built in the open because compilers deserve to be understood — not just used. If you have read this far, you are the kind of person this project was built for.