Multimodal

This wiki entry defines the term "multimodal" as it is used across VSAVM and explains why it matters in the architecture.

[Diagram: the operational meaning of the term inside VSAVM.]

Related wiki pages: VM, event stream, VSA, bounded closure, consistency contract.

Definition

Multimodal processing integrates multiple input or output modalities such as text, audio, and images.

Role in VSAVM

VSAVM is multimodal by representation: all modalities become event streams. This allows one VM and one correctness contract to operate uniformly across modalities.
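
As a rough illustration of "multimodal by representation", the sketch below feeds events from two different modalities through the same toy consumer. The `Event` fields and the `run_vm` function are hypothetical names chosen for this example; they are not taken from the VSAVM codebase, and the loop is only a stand-in for the real VM.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; field names are assumptions for illustration."""
    modality: str  # provenance only ("text", "audio", "image", ...)
    kind: str      # e.g. "token" or "separator"
    payload: str


def run_vm(events: Iterable[Event]) -> list[tuple[str, str]]:
    """Toy stand-in for the VM: it never branches on ev.modality."""
    facts = []
    for ev in events:
        # Only the discrete content of the event is consumed.
        facts.append((ev.kind, ev.payload))
    return facts


text_stream = [Event("text", "token", "cat"), Event("text", "separator", "end")]
audio_stream = [Event("audio", "token", "cat"), Event("audio", "separator", "end")]

# The same consumer yields the same facts regardless of the source modality.
same = run_vm(text_stream) == run_vm(audio_stream)
print(same)
```

Because the consumer reads only `kind` and `payload`, any check applied to its output applies equally to every modality, which is the point of the single-contract design described above.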

Mechanics and implications

Audio becomes transcript events with timing; images and video become symbolic descriptors or discrete tokens. Structural separators define scope even in temporal streams. The VM remains modality-agnostic because it consumes discrete events and canonical facts.
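
The conversions described above can be sketched as follows. This is a minimal example under stated assumptions: the `(start_s, end_s, text)` transcript segments and the list of image descriptors are assumed to come from upstream tools that are not shown, and `Event`, `audio_to_events`, `image_to_events`, and `split_scopes` are hypothetical names, not VSAVM APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; field names are assumptions for illustration."""
    kind: str                        # "token" or "separator"
    payload: str
    start_s: Optional[float] = None  # timing is set only for audio-derived events
    end_s: Optional[float] = None


SEPARATOR = Event("separator", "scope_end")


def audio_to_events(segments):
    """Turn (start_s, end_s, text) transcript segments into timed token events."""
    events = []
    for start_s, end_s, text in segments:
        for word in text.split():
            # Each word inherits its segment's timing (no word-level alignment here).
            events.append(Event("token", word, start_s, end_s))
        events.append(SEPARATOR)  # a segment boundary closes a scope
    return events


def image_to_events(descriptors):
    """Turn symbolic descriptors for one image into token events plus a separator."""
    return [Event("token", d) for d in descriptors] + [SEPARATOR]


def split_scopes(events):
    """Group token payloads into scopes delimited by separator events."""
    scopes, current = [], []
    for ev in events:
        if ev.kind == "separator":
            scopes.append(current)
            current = []
        else:
            current.append(ev.payload)
    if current:  # keep any trailing tokens that lack a closing separator
        scopes.append(current)
    return scopes


stream = audio_to_events([(0.0, 1.2, "open the door"), (1.2, 2.0, "now")])
stream += image_to_events(["door", "closed"])
print(split_scopes(stream))
# → [['open', 'the', 'door'], ['now'], ['door', 'closed']]
```

Note that `split_scopes` never inspects timing or modality: scope comes only from the separators, so the same grouping logic works for temporal and non-temporal streams.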

References

Multimodal learning (Wikipedia)
Event stream processing (Wikipedia)
Computer vision (Wikipedia)
