The VM core and retrieval interaction

This page is a theory note. It expands the topic in short chapters and defines terminology without duplicating the formal specification documents.

The diagram has a transparent background and is intended to be read together with the caption and the sections below.

Related wiki pages: VM, event stream, VSA, bounded closure, consistency contract, macro-program.

Related specs: DS001, DS002.

Overview

The VM is the system's semantic core. It stores facts, rules, contexts, and traces, and it executes programs to construct state. Retrieval exists to reduce search cost by proposing candidates, but it does not decide what is true. The VM presents a conversational interface that resembles that of a large language model but operates on fundamentally different principles: it executes explicit programs inside a virtual machine rather than relying on latent numerical state distributed across parameters.

VM state and memory model

The virtual machine maintains its complete operational state through four interconnected memory structures:

Canonical fact store: Primary repository for all knowledge, using normalized representation with unique canonical identifiers. Facts are organized with typed slots (subject, predicate, object, temporal qualifiers, certainty levels, source attribution) and support multiple access patterns through primary, secondary, temporal, and source indices.
Rule and macro-program memory: Contains executable knowledge in compiled form. Rules follow conditional structure with premises, conclusions, and constraints. Macro-programs represent consolidated reasoning patterns that accept parameters and maintain local state.
Binding environment: Manages temporary variables and intermediate results during execution via a stack-based system with lexical scoping and type checking.
Execution log: A complete trace of all operations, kept to support debugging, explanation, auditing, and rollback.
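
The four structures can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the names Fact, VMState, assert_fact, and lookup are hypothetical, the canonical identifier is simply the normalized triple, and the temporal index is omitted for brevity. The formal layout belongs to the specification documents.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """One fact with the typed slots named in the text (hypothetical layout)."""
    subject: str
    predicate: str
    obj: str
    valid_at: Optional[str] = None   # temporal qualifier, encoding assumed
    certainty: float = 1.0
    source: Optional[str] = None

    @property
    def canonical_id(self) -> str:
        # Assumption: the identifier is the normalized core triple.
        parts = (self.subject, self.predicate, self.obj)
        return "|".join(p.strip().lower() for p in parts)


@dataclass
class VMState:
    facts: dict = field(default_factory=dict)                               # primary index
    by_predicate: dict = field(default_factory=lambda: defaultdict(set))    # secondary index
    by_source: dict = field(default_factory=lambda: defaultdict(set))       # source index
    rules: dict = field(default_factory=dict)                               # rule and macro-program memory
    bindings: list = field(default_factory=lambda: [{}])                    # stack of scopes
    log: list = field(default_factory=list)                                 # execution log

    def assert_fact(self, fact: Fact) -> str:
        """Store a fact under its canonical id, update indices, and log the step."""
        fid = fact.canonical_id
        self.facts[fid] = fact
        self.by_predicate[fact.predicate.strip().lower()].add(fid)
        if fact.source is not None:
            self.by_source[fact.source].add(fid)
        self.log.append(("ASSERT", fid))
        return fid

    def lookup(self, name: str):
        """Resolve a variable by searching the innermost scope first."""
        for scope in reversed(self.bindings):
            if name in scope:
                return scope[name]
        raise KeyError(name)


state = VMState()
fid = state.assert_fact(Fact("Socrates", "is_a", "Human", source="doc-1"))
state.assert_fact(Fact(" socrates ", "IS_A", "human", source="doc-2"))
```

In this sketch the two assertions differ only in spelling and source, so they normalize to the same canonical identifier and occupy one slot in the primary index, while the source index and the log still record both.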

Instruction set architecture

A small, typed instruction set rules out ill-formed instruction combinations and limits branching blow-up. The key instruction categories are:

Term construction: MAKE_TERM, CANONICALIZE, BIND_SLOTS for creating symbolic structures with strict type discipline.
Fact manipulation: ASSERT (with consistency checks), DENY (context-sensitive negation), QUERY (pattern-based search with variable bindings).
Logical reasoning: MATCH (unification), APPLY_RULE, CLOSURE (bounded transitive closure).
Control flow: BRANCH (deterministic or exploratory), CALL (sync/async invocation), RETURN.
Context management: PUSH_CONTEXT, POP_CONTEXT, MERGE_CONTEXT, ISOLATE_CONTEXT for maintaining multiple consistent theories.
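
One way to see how typing prunes ill-formed combinations is to give each opcode an operand signature and reject instructions that do not fit it. The sketch below covers only a few of the opcodes listed above. The signatures, and the names Op, SIGNATURES, and is_well_typed, are assumptions for illustration and not the instruction encoding defined by the specs.

```python
from enum import Enum, auto


class Op(Enum):
    MAKE_TERM = auto()
    ASSERT = auto()
    DENY = auto()
    QUERY = auto()
    MATCH = auto()
    CLOSURE = auto()


# Hypothetical operand signatures: opcode -> expected operand types.
SIGNATURES = {
    Op.MAKE_TERM: (str, tuple),   # functor name, argument tuple
    Op.ASSERT: (tuple,),          # a term
    Op.DENY: (tuple,),            # a term
    Op.QUERY: (tuple,),           # a pattern term
    Op.MATCH: (tuple, tuple),     # pattern, candidate
    Op.CLOSURE: (str, int),       # relation name, step bound
}


def is_well_typed(op: Op, *operands) -> bool:
    """Return True only if the operands fit the opcode's signature."""
    expected = SIGNATURES[op]
    if len(operands) != len(expected):
        return False
    for value, kind in zip(operands, expected):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, kind):
            return False
    return True
```

A search procedure that enumerates candidate programs can call such a check before expanding a branch, so ill-typed instructions never enter the search space.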

Execution modes

The VM operates in two primary modes:

Interpretation mode: Handles conversion of external input into internal symbolic representations. Coordinates event stream parsing, term construction, entity resolution, relationship extraction, and assertion processing.
Reasoning mode: Performs logical inference and consistency checking through cycles of rule application and conflict detection. Uses both syntactic matching and VSA semantic similarity for rule selection.

State transitions are deterministic, enabling both forward execution and backward analysis for explanation generation.
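
The link between determinism, bounded rule application, and explanation can be illustrated with a small sketch. It assumes a single transitivity rule over pairs and a bound expressed in rounds. The function names bounded_closure and explain are hypothetical, and the VM's actual CLOSURE semantics are defined elsewhere.

```python
def bounded_closure(edges, max_rounds):
    """Transitive closure of `edges`, stopping after `max_rounds` rounds.

    Returns (facts, trace). Iterating in sorted order makes the trace
    reproducible: the same input always yields the same sequence of steps.
    """
    facts = set(edges)
    trace = []
    for round_no in range(1, max_rounds + 1):
        derived = set()
        for a, b in sorted(facts):
            for c, d in sorted(facts):
                if b == c and (a, d) not in facts and (a, d) not in derived:
                    derived.add((a, d))
                    # Record the round, both premises, and the conclusion.
                    trace.append((round_no, (a, b), (c, d), (a, d)))
        if not derived:  # fixpoint reached before the bound
            break
        facts |= derived
    return facts, trace


def explain(trace, fact):
    """Backward analysis: return the premises recorded for a derived fact."""
    for _, left, right, conclusion in trace:
        if conclusion == fact:
            return left, right
    return None  # the fact was given, not derived


edges = {("a", "b"), ("b", "c"), ("c", "d")}
facts, trace = bounded_closure(edges, max_rounds=5)
```

Because each step records its premises, a question such as "why does (a, d) hold?" is answered by reading the trace backward rather than by re-running inference.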

Macro-instruction system

The macro-instruction system bridges primitive symbolic operations and complex reasoning patterns:

Consolidation: Recurring instruction patterns are identified through execution log analysis and evaluated for frequency, compression benefit, and generalizability.
Parameter abstraction: Essential structure is preserved while incidental details are parameterized for reuse across contexts.
Execution optimization: Macro-programs are stored in precompiled form after dead code elimination, common subexpression elimination, and instruction reordering.
Schema-to-program compilation: Linguistic patterns map to macro-instructions, enabling immediate invocation when similar patterns are encountered.
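
The consolidation step can be sketched as mining an execution log for recurring opcode windows. This is a deliberately naive illustration, not the system's algorithm: it uses fixed-length windows, counts overlapping occurrences, and scores candidates with an assumed formula. The windows hold opcodes only, so operands are implicitly treated as the parameters to abstract. The name propose_macros is hypothetical.

```python
from collections import Counter


def propose_macros(log, length=3, min_count=2):
    """Rank recurring opcode windows of a fixed length as macro candidates.

    Illustrative score: count * (length - 1) - length, that is, the
    instructions saved when every occurrence becomes a single call,
    minus the cost of storing the macro body once.
    """
    windows = Counter(
        tuple(log[i:i + length]) for i in range(len(log) - length + 1)
    )
    scored = [
        (pattern, count, count * (length - 1) - length)
        for pattern, count in windows.items()
        if count >= min_count
    ]
    # Sort by score, then by pattern, so ties resolve deterministically.
    return sorted(scored, key=lambda item: (-item[2], item[0]))


log = [
    "MAKE_TERM", "CANONICALIZE", "ASSERT", "QUERY",
    "MAKE_TERM", "CANONICALIZE", "ASSERT", "MATCH",
    "MAKE_TERM", "CANONICALIZE", "ASSERT",
]
candidates = propose_macros(log)
```

Here the only window that recurs is MAKE_TERM, CANONICALIZE, ASSERT, which appears three times. A fuller treatment would also weigh generalizability, which this sketch ignores.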

How retrieval interacts

VSA provides similarity-driven shortlists of schemas and macro-programs. These shortlists are inputs to search and compilation, not verdicts on truth. Every retrieved candidate must be validated by execution and closure, so that the correctness contract holds under noise and paraphrase variation. Hypervectors enable rapid approximate nearest neighbor search; the VM then validates each candidate through precise symbolic matching.
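
The division of labor between proposing and validating can be shown with a toy sketch. It rests on several assumptions: random bipolar hypervectors, bundling by elementwise addition, an exhaustive cosine scan standing in for approximate nearest neighbor search, and a validation step reduced to checking that a schema's required facts are present. All names and the schema layout are hypothetical.

```python
import random

DIM = 2048  # hypervector dimensionality (illustrative choice)


def symbol_vector(symbol):
    """Deterministic random bipolar hypervector for a symbol."""
    rng = random.Random(symbol)  # seeding with a str is reproducible
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def encode(tokens):
    """Bundle token vectors by elementwise addition."""
    total = [0] * DIM
    for token in tokens:
        for i, component in enumerate(symbol_vector(token)):
            total[i] += component
    return total


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm else 0.0


def shortlist(query_tokens, schemas, k=2):
    """Propose the k most similar schema names; this step decides nothing."""
    q = encode(query_tokens)
    ranked = sorted(
        schemas, key=lambda name: -cosine(q, encode(schemas[name]["keywords"]))
    )
    return ranked[:k]


def validate(candidates, schemas, facts):
    """Keep only candidates whose required facts are all present."""
    return [name for name in candidates if schemas[name]["requires"] <= facts]


schemas = {
    "ownership": {"keywords": ["owns", "has", "possesses"],
                  "requires": {("alice", "owns", "car")}},
    "kinship": {"keywords": ["parent", "child", "sibling"],
                "requires": {("alice", "parent_of", "bob")}},
    "location": {"keywords": ["in", "at", "near"],
                 "requires": {("car", "in", "garage")}},
}
facts = {("alice", "parent_of", "bob")}
query = ["alice", "owns", "has"]
```

In this setup the ownership schema ranks first by similarity because it shares two tokens with the query, yet it fails validation because the fact it requires is not in the store. Similarity affects only which candidates are examined, never which are accepted.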

Unified event stream representation

All input modalities converge into a canonical event stream with three essential components per event:

Type identifier: text_token, visual_element, temporal_marker, structural_separator, etc.
Discrete payload: Standardized data format for VM processing.
Structural context path: Hierarchical positioning (document, chapter, section, paragraph, sentence, span).

Reversibility is fundamental: every event must be expandable back to its constituents for source tracing and verification.
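
A minimal sketch of an event record with the three components, plus one possible way to obtain reversibility, is shown below. Storing character offsets into the source is an assumption made for illustration, as are the names Event, tokenize, and expand. The text does not say how reversibility is implemented.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Event:
    type_id: str                    # e.g. "text_token", "structural_separator"
    payload: str                    # discrete payload
    context_path: Tuple[int, ...]   # document, chapter, section, paragraph, sentence, span
    source_span: Tuple[int, int]    # character offsets (assumed reversibility mechanism)


def tokenize(text, sentence_path):
    """Emit one text_token event per whitespace-delimited token."""
    events, pos = [], 0
    for index, token in enumerate(text.split()):
        start = text.index(token, pos)   # locate the token at or after pos
        pos = start + len(token)
        events.append(
            Event("text_token", token, sentence_path + (index,), (start, pos))
        )
    return events


def expand(event, text):
    """Recover the exact source characters behind an event."""
    start, end = event.source_span
    return text[start:end]


text = "The  VM stores facts."
events = tokenize(text, (0, 1, 2, 0, 3))
```

Each event carries its full hierarchical path, and expand maps it back to the original characters even when spacing in the source is irregular.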

Engineering benefits

The explicit VM core makes it possible to unit test rules, regression test closure behavior, and audit decisions. Retrieval can be swapped or improved without changing semantics, because semantics are enforced by the VM and the consistency contract rather than by similarity ranking. Memory management uses reference counting, garbage collection, and caching with locality optimization for related facts and rules.

vm-core diagram: A compact VM core remains the authority; retrieval accelerates candidate selection without changing semantics. The four memory structures maintain complete operational state.
