Reinforcement Learning (RL)
This wiki entry defines a term used across VSAVM and explains why it matters in the architecture.
The diagram has a transparent background and highlights the operational meaning of the term inside VSAVM.
Related wiki pages: VM, event stream, VSA, bounded closure, consistency contract.
Definition
In reinforcement learning, an agent learns preferences over actions from feedback signals such as rewards and penalties.
Role in VSAVM
VSAVM uses RL as a shaping signal when multiple plausible candidates exist. The goal is to select interpretations and response modes that remain stable under bounded closure, not to optimize token-by-token behavior.
Mechanics and implications
The action space is coarse: choose a schema, choose a macro program, choose a response mode. Closure-derived contradictions provide negative signals that discourage unstable choices. RL complements, but does not replace, explicit closure gating.
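The shaping described above can be sketched as a small epsilon-greedy bandit. This is only an illustration under stated assumptions, not VSAVM's actual implementation: the class name ShapingBandit, the helper closure_reward, the action labels, and the reward scale are all hypothetical. The sketch keeps closure gating separate from learning: the caller passes in only the candidates that already passed gating, and the bandit ranks among those.

```python
import random
from typing import Dict, Iterable, List, Optional


class ShapingBandit:
    """Epsilon-greedy preference tracker over coarse actions (hypothetical sketch)."""

    def __init__(self, actions: List[str], epsilon: float = 0.1,
                 rng: Optional[random.Random] = None) -> None:
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = rng if rng is not None else random.Random(0)
        # Running mean reward and visit count per action.
        self.values: Dict[str, float] = {a: 0.0 for a in self.actions}
        self.counts: Dict[str, int] = {a: 0 for a in self.actions}

    def select(self, allowed: Iterable[str]) -> str:
        """Pick an action among candidates that already passed closure gating."""
        allowed_set = set(allowed)
        candidates = [a for a in self.actions if a in allowed_set]
        if not candidates:
            raise ValueError("no candidate passed closure gating")
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(candidates)
        return max(candidates, key=lambda a: self.values[a])

    def update(self, action: str, reward: float) -> None:
        """Incremental sample-average update of the action's value estimate."""
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def closure_reward(num_contradictions: int) -> float:
    """Assumed reward scale: +1 for a stable choice, -1 per contradiction found."""
    return 1.0 if num_contradictions == 0 else -float(num_contradictions)
```

A caller would invoke select with the gated candidate set, run bounded closure on the chosen schema, macro program, or response mode, and then call update with closure_reward applied to the number of contradictions found. Because gating is applied before selection, a high learned preference can never override an explicit closure rejection, which matches the point that RL complements rather than replaces gating.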
Further reading
RL is a broad area. VSAVM’s practical use is closer to bandit-like shaping than to full on-policy token-level control.
References
Reinforcement learning (Wikipedia)
Sutton & Barto (book)
Multi-armed bandit (Wikipedia)