VSAVM

Macro-unit (macro token)

This wiki entry defines a term used across VSAVM and explains why it matters in the architecture.

The diagram has a transparent background and highlights the operational meaning of the term inside VSAVM.

Related wiki pages: VM, event stream, VSA, bounded closure, consistency contract.

Definition

A macro-unit is a reversible sequence of tokens (in the current training harness: bytes) that is promoted because it improves compression under MDL and is useful for continuation prediction (DS011).
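As a rough illustration of the MDL criterion, the sketch below estimates whether promoting a sequence shortens the total description. It is not the DS011 scoring rule: the function names are hypothetical, and it assumes every symbol (raw byte or macro-unit id) costs the same, so description length is simply counted in symbols.

```javascript
// Simplified MDL-style promotion test (illustrative; not the DS011 scoring rule).
// Assumption: a raw byte and a macro-unit id cost the same number of bits,
// so description length can be measured by counting symbols.
function promotionGain(sequenceLength, occurrences) {
  // Symbols saved in the data: each occurrence shrinks from n symbols to 1.
  const dataSavings = occurrences * (sequenceLength - 1);
  // Symbols spent in the model: the sequence itself plus one entry marker.
  const dictionaryCost = sequenceLength + 1;
  return dataSavings - dictionaryCost;
}

// Promote only when the data savings outweigh the dictionary cost.
function shouldPromote(sequenceLength, occurrences) {
  return promotionGain(sequenceLength, occurrences) > 0;
}
```

Under these assumptions, a 4-byte sequence seen 10 times is worth promoting, while a 2-byte sequence seen twice is not, because storing it costs more than it saves.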

Role in VSAVM

Macro-units provide a “larger than token” unit for the DS011 outer loop.

Mechanics and implications

Reversibility is mandatory. If expansion is ambiguous, scoring becomes inconsistent and the system cannot maintain traceability. VSAVM treats deterministic expansion as a hard constraint.
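The reversibility constraint can be pictured as a round-trip check. The sketch below is a minimal example under stated assumptions, not the VSAVM implementation: the id layout (ids below 256 are raw bytes, ids from 256 upward are macro-units) and the names expand and isReversible are hypothetical.

```javascript
// Illustrative id layout (an assumption, not the VSAVM format):
// ids below MACRO_BASE are raw bytes, ids at or above it are macro-units.
const MACRO_BASE = 256;

// expansions[i] holds the byte sequence for macro id MACRO_BASE + i.
// Each id maps to exactly one sequence, so expansion is deterministic.
function expand(ids, expansions) {
  const out = [];
  for (const id of ids) {
    if (id < MACRO_BASE) {
      out.push(id);
    } else {
      const bytes = expansions[id - MACRO_BASE];
      if (bytes === undefined) throw new Error(`unknown macro id ${id}`);
      out.push(...bytes);
    }
  }
  return out;
}

// Round-trip check: an encoding is acceptable only if expanding it
// restores the original bytes exactly.
function isReversible(originalBytes, ids, expansions) {
  const restored = expand(ids, expansions);
  return restored.length === originalBytes.length &&
    restored.every((b, i) => b === originalBytes[i]);
}
```

A check of this kind could run on every encoded sample during evaluation, so that any lossy or ambiguous encoding is caught before it affects scoring.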

Macro-units are not the same thing as structural separators.

Implementation notes (current code)

The concrete macro-unit model is implemented in src/training/outer-loop/macro-unit-model.mjs. It supports streaming training (trainStream), bounded n-gram orders, pruning, and a trie for fast encoding/decoding.
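To show how a trie can speed up encoding, here is a small self-contained sketch. It is not the code in macro-unit-model.mjs: the ByteTrie class, the encodeGreedy function, and the greedy longest-match policy are assumptions chosen for illustration.

```javascript
// Hypothetical trie sketch; not the code in macro-unit-model.mjs.
class ByteTrie {
  constructor() {
    this.root = { children: new Map(), id: null };
  }

  // Register a byte sequence under a macro-unit id.
  insert(bytes, id) {
    let node = this.root;
    for (const b of bytes) {
      if (!node.children.has(b)) {
        node.children.set(b, { children: new Map(), id: null });
      }
      node = node.children.get(b);
    }
    node.id = id;
  }

  // Return { id, length } for the longest macro-unit starting at `start`,
  // or null when no registered sequence matches there.
  longestMatch(bytes, start) {
    let node = this.root;
    let best = null;
    for (let i = start; i < bytes.length; i++) {
      node = node.children.get(bytes[i]);
      if (!node) break;
      if (node.id !== null) best = { id: node.id, length: i - start + 1 };
    }
    return best;
  }
}

// Greedy encoding: take the longest macro-unit at each position,
// otherwise emit the raw byte unchanged.
function encodeGreedy(bytes, trie) {
  const ids = [];
  let i = 0;
  while (i < bytes.length) {
    const match = trie.longestMatch(bytes, i);
    if (match) {
      ids.push(match.id);
      i += match.length;
    } else {
      ids.push(bytes[i]);
      i += 1;
    }
  }
  return ids;
}
```

Each lookup walks at most as many trie nodes as the longest registered sequence, which is why bounding the n-gram order also bounds the per-position encoding cost.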

In eval_tinyLLM, trained macro-unit models are cached under eval_tinyLLM/cache/models/vsavm/<datasetId>/<modelId>/ so multiple dataset sizes and multiple model variants can coexist.
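The cache layout can be expressed as a small path builder. The helper name modelCacheDir and its validation rules are hypothetical; only the directory pattern comes from the layout described above.

```javascript
// Hypothetical helper; the name and validation rules are illustrative,
// only the directory pattern follows the documented cache layout.
function modelCacheDir(datasetId, modelId) {
  for (const part of [datasetId, modelId]) {
    // Reject empty ids, dot segments, and path separators so that each
    // (datasetId, modelId) pair maps to exactly one directory.
    const invalid =
      typeof part !== "string" ||
      part === "" ||
      part === "." ||
      part === ".." ||
      /[\\\/]/.test(part);
    if (invalid) {
      throw new Error(`invalid cache id: ${JSON.stringify(part)}`);
    }
  }
  return `eval_tinyLLM/cache/models/vsavm/${datasetId}/${modelId}/`;
}
```

Keying the directory on both ids is what lets models trained on different dataset sizes, or different variants trained on the same dataset, sit side by side without overwriting each other.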

Further reading

Macro-units relate to tokenization and compression. VSAVM’s emphasis is on reversibility and auditability, and on keeping scope boundaries structural (not domain-labelled).

Figure (macro-token diagram): Macro-units compress recurring patterns while preserving deterministic expansion for evaluation and continuation.

References

Tokenization (Wikipedia)
Data compression (Wikipedia)
Minimum description length (Wikipedia)