Memory & GC

Achronyme manages heap memory through typed arenas with a bitmap mark-and-sweep garbage collector. All heap objects are allocated in type-specific pools and reclaimed when no longer reachable.

The Heap contains 10 typed arenas, one per heap object type:

| Arena     | Type                   | Allocation Size                             |
|-----------|------------------------|---------------------------------------------|
| strings   | String                 | capacity() bytes                            |
| lists     | Vec&lt;Value&gt;             | capacity() × 8 bytes                        |
| maps      | HashMap&lt;String, Value&gt; | capacity × (string + value + 8) bytes       |
| functions | Function               | chunk.len() × 4 + constants.len() × 8 bytes |
| closures  | Closure                | function handle + upvalue handles           |
| upvalues  | Upvalue                | location + linked list pointer              |
| iterators | IteratorObj            | source value + state                        |
| fields    | FieldElement           | 32 bytes (BN254 Montgomery form)            |
| proofs    | ProofObject            | 3 JSON strings (proof + public + vkey)      |
| bigints   | BigInt                 | 32–64 bytes                                 |

Each Arena<T> uses a slot-based allocator with free list recycling and bitmap mark bits:

Arena<T> {
    data: Vec<T>,           // grows monotonically
    free_indices: Vec<u32>, // reusable slot stack (LIFO)
    free_set: HashSet<u32>, // O(1) membership check
    mark_bits: Vec<u64>,    // 1 bit per slot, packed into 64-bit words
}

Allocation (alloc): reuses a freed slot if available, otherwise appends to data.

Deallocation (mark_free): adds the slot index to the free list. The slot is zeroed to release memory (e.g., String::new(), Vec::new()).

Invariant: free_indices and free_set must always stay in sync.
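The alloc/mark_free paths can be sketched as follows. This is an illustrative Rust model, not the actual memory/src/arena.rs code; the mark bitmap is omitted for brevity:

```rust
// Illustrative model of the slot allocator described above.
use std::collections::HashSet;

pub struct Arena<T: Default> {
    pub data: Vec<T>,           // grows monotonically
    pub free_indices: Vec<u32>, // reusable slot stack (LIFO)
    pub free_set: HashSet<u32>, // O(1) membership check
}

impl<T: Default> Arena<T> {
    pub fn new() -> Self {
        Arena { data: Vec::new(), free_indices: Vec::new(), free_set: HashSet::new() }
    }

    // alloc: reuse a freed slot if available, otherwise append to data.
    pub fn alloc(&mut self, value: T) -> u32 {
        if let Some(idx) = self.free_indices.pop() {
            self.free_set.remove(&idx); // keep free_indices and free_set in sync
            self.data[idx as usize] = value;
            idx
        } else {
            self.data.push(value);
            (self.data.len() - 1) as u32
        }
    }

    // mark_free: zero the slot (T::default() drops the owned buffer), recycle it.
    pub fn mark_free(&mut self, idx: u32) {
        if self.free_set.insert(idx) { // no-op if the slot is already free
            self.free_indices.push(idx);
            self.data[idx as usize] = T::default();
        }
    }
}
```

Because free_indices is a LIFO stack, the most recently freed slot is reused first, which keeps hot slots cache-resident.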

Each arena maintains its own Vec<u64> bitmap for GC marking. Each u64 word covers 64 slots:

  • set_mark(idx) — Sets bit idx % 64 in word idx / 64. Returns true if the object was previously unmarked.
  • is_marked(idx) — Checks whether the bit is set. Returns false for out-of-range indices.
  • clear_marks() — Resets all bits to zero in O(N/64) time via memset, preserving capacity for reuse.

This replaces the previous HashSet<u32> approach, reducing memory from ~56 bytes per marked object to 1 bit per slot, with cache-friendly sequential access during sweep.
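The three bitmap operations can be sketched like this (illustrative Rust following the semantics listed above, not the exact arena.rs code):

```rust
// Packed mark bitmap: 1 bit per slot, 64 slots per u64 word.
pub struct MarkBitmap {
    words: Vec<u64>,
}

impl MarkBitmap {
    pub fn with_slots(n: usize) -> Self {
        MarkBitmap { words: vec![0; (n + 63) / 64] }
    }

    // Sets bit idx % 64 in word idx / 64; true if previously unmarked.
    pub fn set_mark(&mut self, idx: usize) -> bool {
        let (word, bit) = (idx / 64, idx % 64);
        if word >= self.words.len() {
            self.words.resize(word + 1, 0);
        }
        let was_unmarked = self.words[word] & (1u64 << bit) == 0;
        self.words[word] |= 1u64 << bit;
        was_unmarked
    }

    // Out-of-range indices are reported as unmarked.
    pub fn is_marked(&self, idx: usize) -> bool {
        self.words
            .get(idx / 64)
            .map_or(false, |w| w & (1u64 << (idx % 64)) != 0)
    }

    // O(N/64): zero every word in place, keeping capacity for the next cycle.
    pub fn clear_marks(&mut self) {
        for w in self.words.iter_mut() {
            *w = 0;
        }
    }
}
```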

The GC is a stop-the-world mark-and-sweep collector with bitmap marking.

The heap tracks live allocation size in bytes_allocated. When it exceeds next_gc_threshold (starting at 1 MB), the GC is triggered:

if heap.request_gc || stress_mode {
    collect_garbage();
}

This check runs at the start of every interpreter cycle (before executing the next instruction). GC is not triggered while the GC lock is held (see GC Lock below).

Breadth-first traversal from roots using a worklist:

  1. Collect roots:

    • Active stack values (per-frame, bounded by max_slots)
    • All global variables
    • All call frame closures
    • All function prototypes (stay alive for closure creation)
  2. Traverse children (using bitmap set_mark to track visited objects):

    • List: all elements are added to worklist
    • Function: all constants are added to worklist
    • Closure: marks function handle + all upvalues; closed upvalues add their captured value to worklist
    • Map: all values are added (keys are Rust-owned, freed when HashMap drops)
    • Iterator: source value is added
    • Field, Proof, BigInt: leaf types, no children
  3. Root open upvalues: After the main trace, all open upvalues are explicitly marked via the linked list traversal. This prevents premature collection of stack variables captured by closures.

If set_mark returns false (already marked), the object is skipped, preventing infinite loops on cycles.
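The traversal above can be sketched over a simplified object graph. `Obj` is a hypothetical stand-in for the real heap types (List, Function, Closure, …), and a bool slice stands in for the bitmap:

```rust
use std::collections::VecDeque;

// Simplified heap object: just a list of child handles.
struct Obj {
    children: Vec<u32>,
}

// Worklist-based marking from the roots. Already-marked objects are skipped,
// which is what terminates traversal on cyclic graphs.
fn mark(roots: &[u32], objects: &[Obj], marked: &mut [bool]) {
    let mut worklist: VecDeque<u32> = roots.iter().copied().collect();
    while let Some(idx) = worklist.pop_front() {
        let i = idx as usize;
        if marked[i] {
            continue; // set_mark would return false here
        }
        marked[i] = true;
        worklist.extend(objects[i].children.iter().copied()); // trace children
    }
}
```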

Linear scan over each arena:

  1. For each slot in data:
    • If not marked and not already free: zero the slot and add to free list
    • If marked: leave in place
  2. Call clear_marks() on each arena’s bitmap
  3. Recount bytes_allocated from scratch (eliminates tracking drift from in-place mutations like Vec::push)
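A sketch of the sweep over a single string arena, reusing the simplified free-list fields from earlier and a bool slice in place of the bitmap (illustrative only; the real sweep in memory/src/heap.rs walks every typed arena):

```rust
use std::collections::HashSet;

// Sweep one arena: unmarked, not-yet-free slots are zeroed and recycled;
// live bytes are recounted from scratch.
fn sweep(
    data: &mut [String],
    marked: &[bool],
    free_indices: &mut Vec<u32>,
    free_set: &mut HashSet<u32>,
) -> usize {
    let mut live_bytes = 0;
    for (i, slot) in data.iter_mut().enumerate() {
        if marked[i] {
            live_bytes += slot.capacity();    // recount from live slots only
        } else if free_set.insert(i as u32) { // skip slots that are already free
            *slot = String::new();            // zero the slot, releasing its buffer
            free_indices.push(i as u32);
        }
    }
    live_bytes
}
```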

After sweeping, the threshold is recalculated with hysteresis:

grow = bytes_allocated × 2
hysteresis = previous_threshold × 1.5
next_gc_threshold = max(grow, hysteresis, 1 MB)

This prevents GC thrashing when the heap size hovers near the old threshold.
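The recalculation is a direct max of the three terms (sketched here with integer arithmetic and the 1 MB floor):

```rust
const MIN_THRESHOLD: usize = 1024 * 1024; // 1 MB floor

fn next_gc_threshold(bytes_allocated: usize, previous_threshold: usize) -> usize {
    let grow = bytes_allocated * 2;
    let hysteresis = previous_threshold + previous_threshold / 2; // × 1.5
    grow.max(hysteresis).max(MIN_THRESHOLD)
}
```

When the heap shrinks sharply after a collection, the hysteresis term keeps the threshold from collapsing back toward the floor, so the next cycle isn't triggered immediately.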

Native functions that perform multiple heap allocations can be interrupted by a GC cycle between allocations. If intermediate values aren’t yet rooted (not on the stack or in a global), the GC could free them prematurely.

The GC lock prevents this:

heap.lock_gc() // increment lock depth
// ... multiple allocations are safe here ...
heap.unlock_gc() // decrement lock depth; re-checks threshold at depth 0

The lock is reentrant — it uses a depth counter (gc_lock_depth), so nested lock/unlock pairs work correctly. Only the outermost unlock_gc() re-checks the GC threshold, allowing deferred collection.

A debug_assert in collect_garbage() ensures the GC is never triggered while the lock is held.
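The depth-counter mechanics can be sketched as follows. `GcLock` and its `pending` flag (modeling heap.request_gc) are illustrative names, not the real heap API:

```rust
// Reentrant GC lock: a collection requested while the lock is held
// is deferred to the outermost unlock.
struct GcLock {
    depth: u32,    // gc_lock_depth in the text
    pending: bool, // models heap.request_gc
}

impl GcLock {
    fn lock_gc(&mut self) {
        self.depth += 1;
    }

    // Returns true when a deferred collection should run now.
    fn unlock_gc(&mut self) -> bool {
        assert!(self.depth > 0, "unbalanced unlock_gc");
        self.depth -= 1;
        if self.depth == 0 && self.pending {
            self.pending = false; // outermost unlock: re-check the threshold
            true
        } else {
            false
        }
    }

    fn request_gc(&mut self) {
        self.pending = true;
    }
}
```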

The GC lock is currently used in:

  • Stdlib natives that allocate multiple objects (e.g., split, keys, chars)
  • Closure opcode — protects upvalue capture loop (multiple alloc_upvalue calls)
  • GetIter opcode — protects map iteration (key string allocations)

A configurable hard cap prevents runaway programs from consuming all host memory:

ach run program.ach --max-heap 256M

When bytes_allocated exceeds max_heap_bytes, the VM raises a HeapLimitExceeded runtime error with a clear message instead of an OS-level OOM kill.

Accepted formats: "256M", "1G", "512K", or raw byte count. Default: unlimited.
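One plausible parser for these formats looks like this; the actual CLI code (and its handling of edge cases) may differ:

```rust
// Parse "256M" / "1G" / "512K" / raw byte count into a byte limit.
fn parse_heap_limit(s: &str) -> Option<usize> {
    let s = s.trim();
    let (digits, multiplier) = match s.chars().last()? {
        'K' | 'k' => (&s[..s.len() - 1], 1024),
        'M' | 'm' => (&s[..s.len() - 1], 1024 * 1024),
        'G' | 'g' => (&s[..s.len() - 1], 1024 * 1024 * 1024),
        _ => (s, 1), // no suffix: raw byte count
    };
    digits.parse::<usize>().ok().map(|n| n * multiplier)
}
```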

Run with --gc-stats to print GC metrics to stderr after execution:

ach run program.ach --gc-stats

Output:

-- GC Stats --
Collections: 12
Freed (total): 48320 bytes
Peak heap: 131072 bytes
GC time: 0.847 ms
Heap now: 24576 bytes

The gc_stats() native function returns a map with live metrics, accessible from within a program:

let stats = gc_stats()
print(stats["collections"]) // number of GC cycles so far
print(stats["bytes_freed"]) // cumulative bytes reclaimed
print(stats["peak_bytes"]) // maximum heap size reached
print(stats["gc_time_ms"]) // cumulative milliseconds spent in GC
print(stats["bytes_allocated"]) // current live heap size

Run with --stress-gc to force garbage collection on every interpreter cycle:

ach run program.ach --stress-gc

This triggers collect_garbage() before every instruction, exposing:

  • Dangling pointers: objects freed while still referenced
  • Rooting bugs: roots not properly tracked (e.g., open upvalues)
  • Timing-dependent bugs: issues that only appear under specific GC timing

Stress mode is used extensively in the test suite (vm.stress_mode = true) to verify GC correctness.

Closure {
    function: u32,      // handle to Function
    upvalues: Vec<u32>, // handles to Upvalue arena
}
Upvalue {
    location: Open(stack_index) | Closed(Value),
    next_open: Option<u32>, // linked list for open upvalues
}

Open upvalues point to a stack slot — the variable is still in scope. When the enclosing function returns, open upvalues are closed: the value is copied from the stack into the upvalue (Closed(value)), and the upvalue is removed from the open list.
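The Open → Closed transition can be sketched like this. Value is simplified to i64, and the open-upvalue linked list is replaced by a plain slice scan for clarity:

```rust
// Simplified upvalue: either points at a live stack slot or owns a value.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Location {
    Open(usize), // index of a live stack slot
    Closed(i64), // value copied out of the stack
}

struct Upvalue {
    location: Location,
}

// When a frame returns, every open upvalue pointing at or above its base
// slot captures the current stack value and becomes Closed.
fn close_upvalues(upvalues: &mut [Upvalue], stack: &[i64], frame_base: usize) {
    for uv in upvalues.iter_mut() {
        if let Location::Open(slot) = uv.location {
            if slot >= frame_base {
                uv.location = Location::Closed(stack[slot]);
            }
        }
    }
}
```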

ProofObject {
    proof_json: String,  // Groth16 proof data
    public_json: String, // public inputs
    vkey_json: String,   // verification key
}

Created by prove {} blocks. Accessible via proof_json(), proof_public(), proof_vkey().

| Component             | File                  |
|-----------------------|-----------------------|
| Arena & bitmap        | memory/src/arena.rs   |
| Heap, trace & sweep   | memory/src/heap.rs    |
| Value encoding        | memory/src/value.rs   |
| FieldElement          | memory/src/field.rs   |
| GC trait & mark_roots | vm/src/machine/gc.rs  |
| VM (GC trigger)       | vm/src/machine/vm.rs  |
| GC stats native       | vm/src/stdlib/core.rs |