Core Question: How can you represent millions of distinct concepts without a central registry or coordination? The answer lies in the remarkable properties of high-dimensional vector spaces.
The Problem: Concept Collision
In traditional systems, representing concepts requires either:
Explicit IDs: A database assigning unique numbers (requires coordination)
The Challenge: How do you create a new concept "quantum-entangled-cat-paradox" without checking if it already exists somewhere?
The Solution: Random Vectors in High Dimensions
The Mathematics
Expected Dot Product
If v₁, v₂ are random vectors in ℝd with components from N(0, 1/d), then:
E[⟨v₁, v₂⟩] = 0Var[⟨v₁, v₂⟩] = 1/d
What This Means
0.044
Standard deviation of dot product (d = 512)
<0.003%
Probability of correlation > 0.2
~90°
Expected angle between random vectors
Key Insight: In 512 dimensions, two randomly generated vectors are almost certainly orthogonal. You can create millions of concepts by just generating random vectors—no coordination needed, no collisions possible.
Concentration of Measure
Another remarkable property: in high dimensions, almost all the "volume" of a hypersphere is concentrated in a thin shell near the surface.
Practical Implications for Spock
1. Concept Generation
// Generate a new concept - guaranteed unique
const catConcept = vectorSpace.createRandomVector(512);
const dogConcept = vectorSpace.createRandomVector(512);
// They're almost certainly orthogonal
const similarity = vectorSpace.cosineSimilarity(catConcept, dogConcept);
// → approximately 0 (±0.044)
2. Robustness to Noise
Information is distributed across all 512 dimensions. Noise on a few dimensions doesn't destroy the signal:
// Original vector
const original = [0.1, -0.3, 0.2, 0.4, ...]; // 512 components// Add 5% noise to each component
const noisy = original.map(x => x + (Math.random() - 0.5) * 0.1);
// Still highly similar!
cosineSimilarity(original, noisy); // → ~0.99
3. Gaussian vs Bipolar Generation
Method
Distribution
Use Case
Gaussian
N(0, 1/√d) per component
Dense representations, smooth similarity
Bipolar
+1 or -1 per component
Classic VSA, efficient hardware
Spock uses Gaussian by default but supports both via configuration:
// In config
{ vectorGeneration: 'gaussian' } // or 'bipolar'
Why 512 Dimensions?
Enough capacity: Can store ~log₂(512) ≈ 9 superimposed items reliably