AGISystem2 generates deterministic hypervectors from atom names. This enables reproducibility and distributed computation. Most strategies use a hash/PRNG pipeline; the lossless EXACT strategy is different (session-local appearance-index dictionary) and is called out explicitly below.
Privacy-Preserving Implications: Deterministic generation is foundational for privacy-preserving HDC, federated learning, and partial homomorphic computation. The secret lies not in the algorithm, but in the seed.
1. The Generation Pipeline
Most HDC strategies (Dense-Binary, SPHDC, Metric-Affine, EMA) follow the same core pipeline:
1. Name Scoping: Combine theory ID with atom name for namespace isolation
2. Hashing: Reduce the scoped name to a 32-bit seed with DJB2
3. PRNG: Initialize xorshift128+ with seed
4. Vector Fill: Generate vector content using PRNG (strategy-specific)
Pipeline:
name → scope(theoryId, name) → DJB2(scoped) → PRNG(seed) → Vector
EXACT differs: the lossless EXACT strategy does not use hashing/PRNG for atom IDs. It assigns atoms an appearance index inside a Session and encodes them as one-hot bits (bitset-polynomial representation). Determinism depends on consistent load order within a session (see EXACT and DS25).
EXACT pipeline (conceptual):
name → sessionDictionary.getOrCreate(name) → appearanceIndex → one-hot bit → vector terms
2. Hash Function: DJB2
The DJB2 hash converts any string into a 32-bit unsigned integer:
function djb2(str) {
  let hash = 5381;
  for (let i = 0; i < str.length; i++) {
    // hash * 33 + next UTF-16 code unit
    hash = ((hash << 5) + hash) + str.charCodeAt(i);
    // Reduce to an unsigned 32-bit integer after every step
    hash = hash >>> 0;
  }
  return hash;
}
Properties:
- Deterministic: Same input → same output, always
- Fast: O(n) where n is string length
- Good distribution: Different names produce well-distributed seeds
- Not cryptographic: Fast but not secure against adversarial inputs
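As a quick check of the hash and the scoping step, the sketch below hashes the same atom name under two theory IDs. The `scope` helper is an illustrative name, not taken from the AGISystem2 source; it only mirrors the `theoryId + ':' + name` concatenation used by the strategies on this page.

```javascript
// DJB2 as defined above, repeated so the snippet runs on its own.
function djb2(str) {
  let hash = 5381;
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) + hash) + str.charCodeAt(i);
    hash = hash >>> 0; // keep the value in unsigned 32-bit range
  }
  return hash;
}

// Hypothetical helper: mirrors the theoryId + ':' + name scoping convention.
function scope(theoryId, name) {
  return theoryId + ':' + name;
}

const seedDefault = djb2(scope('default', 'Dog'));
const seedPhysics = djb2(scope('physics', 'Dog'));

console.log(djb2(''));   // 5381 (the initial value, no characters mixed in)
console.log(djb2('a'));  // 177670 (5381 * 33 + 97)
console.log(seedDefault !== seedPhysics); // true: different scopes, different seeds
```

Because the output is only 32 bits wide, distinct scoped names can still collide by chance once a vocabulary grows large.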
3. PRNG: xorshift128+
The seeded PRNG ensures identical random sequences from identical seeds:
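class PRNG {
  constructor(seed) {
    this.s0 = BigInt(seed) | 1n;
    // Multiply in BigInt: as a Number, seed * 0x6C078965 can exceed 2^53 and lose precision.
    this.s1 = BigInt.asUintN(64, BigInt(seed) * 0x6C078965n) | 1n;
  }
  random() {
    let s1 = this.s0;
    const s0 = this.s1;
    this.s0 = s0;
    // BigInt is unbounded, so the left shift is truncated to 64 bits explicitly.
    s1 ^= BigInt.asUintN(64, s1 << 23n);
    s1 ^= s1 >> 17n;
    s1 ^= s0;
    s1 ^= s0 >> 26n;
    this.s1 = s1;
    // Divide by 2^32 so the result lies in [0, 1).
    return Number(BigInt.asUintN(32, s0 + s1)) / 0x100000000;
  }
}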
Properties:
- Period: 2^128 - 1 (astronomically long)
- Quality: Passes BigCrush statistical tests
- Speed: Very fast (simple bitwise operations)
- Deterministic: Same seed → same sequence, guaranteed
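The strategy code in Section 4 calls `prng.randomUint32()` and `prng.randomByte()`, which are not shown in the class above. The sketch below is one way such helpers could be layered on a 64-bit xorshift128+ step. The helper bodies, the `next64` method, and the choice to take the upper bits are assumptions for illustration, not the AGISystem2 implementation.

```javascript
const MASK64 = (1n << 64n) - 1n;

class PRNG {
  constructor(seed) {
    // Seeding follows the class above, but multiplies in BigInt to avoid
    // floating-point rounding for large seeds.
    this.s0 = (BigInt(seed) | 1n) & MASK64;
    this.s1 = ((BigInt(seed) * 0x6C078965n) | 1n) & MASK64;
  }

  // One xorshift128+ step, returning the raw 64-bit output as a BigInt.
  next64() {
    let s1 = this.s0;
    const s0 = this.s1;
    this.s0 = s0;
    s1 ^= (s1 << 23n) & MASK64; // truncate the left shift to 64 bits
    s1 ^= s1 >> 17n;
    s1 ^= s0;
    s1 ^= s0 >> 26n;
    this.s1 = s1;
    return (s0 + s1) & MASK64;
  }

  // Assumed helper bodies: each draws one 64-bit value and keeps its upper bits.
  randomUint32() { return Number(this.next64() >> 32n); }
  randomByte() { return Number(this.next64() >> 56n); }
  random() { return this.randomUint32() / 0x100000000; } // in [0, 1)
}

const a = new PRNG(12345);
const b = new PRNG(12345);
const seqA = Array.from({ length: 5 }, () => a.randomUint32());
const seqB = Array.from({ length: 5 }, () => b.randomUint32());
console.log(seqA.every((v, i) => v === seqB[i])); // true: same seed, same sequence
```

Masking to 64 bits matters in JavaScript because BigInt values are unbounded; without it the state would grow on every call.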
4. Strategy-Specific Vector Fill
4.1 Dense-Binary: ASCII Stamping
Dense-Binary creates vectors using a recognizable ASCII pattern combined with PRNG variation:
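function createFromName(name, geometry, theoryId = 'default') {
  // Scope the name so the same atom gets a different seed in each theory.
  const scopedName = theoryId + ':' + name;
  const seed = djb2(scopedName);
  const prng = new PRNG(seed);
  // ASCII stamp: character codes of the unscoped name, packed into words.
  const ascii = name.split('').map(c => c.charCodeAt(0) & 0xFF);
  const baseStamp = packAsciiToWords(ascii);
  // Assumes 32-bit words and an 8-word stamp, matching nextWords(8).
  const vector = new Uint32Array(geometry / 32);
  for (let offset = 0; offset < vector.length; offset += 8) {
    const noise = prng.nextWords(8);
    for (let j = 0; j < 8; j++) {
      vector[offset + j] = baseStamp[j] ^ noise[j];
    }
  }
  return vector;
}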
Key characteristics:
- Theory-scoped: "default:Dog" ≠ "physics:Dog"
- Debuggable: ASCII pattern survives in the vector
- High-dimensional: Default 32,768 bits (4 KB)
4.2 Sparse Polynomial (SPHDC): Random Exponents
SPHDC generates k random 64-bit integers as exponents:
function createFromName(name, geometry = 4) {
  // No theoryId here: the seed depends on the atom name only.
  const seed = djb2(name);
  const prng = new PRNG(seed);
  // Draw random 64-bit exponents until k = geometry distinct values are collected.
  const exponents = new Set();
  while (exponents.size < geometry) {
    const high = prng.randomUint32();
    const low = prng.randomUint32();
    exponents.add((BigInt(high) << 32n) | BigInt(low));
  }
  return new SPVector(exponents, geometry);
}
Key characteristics:
- Compact: Only k × 8 bytes (default 32 bytes for k=4)
- Uniform: Pure random generation, no ASCII pattern
- Fast: Only 2k random numbers needed
Implementation Note: The current SPHDC implementation does not use theory scoping (theoryId parameter is ignored). This means the same atom name produces identical vectors across all theories. This is a known limitation for namespace isolation in SPHDC.
4.3 Metric-Affine / EMA: Byte Channels
Metric-Affine generates a deterministic byte sequence from the (scoped) name. In AGISystem2 terms, geometry for these strategies is simply D = the number of byte channels (vector length in bytes). EMA shares the same atom initialization logic and the same geometry knob, but improves large KB superpositions via chunked bundling (bounded depth), not by auto-growing D during a session.
function createFromName(name, bytes, theoryId = 'default') {
  const scoped = theoryId + ':' + name;
  const seed = djb2(scoped);
  const prng = new PRNG(seed);
  // One PRNG byte per channel; `bytes` is the geometry D.
  const buf = new Uint8Array(bytes);
  for (let i = 0; i < bytes; i++) {
    buf[i] = prng.randomByte();
  }
  return buf;
}
4.4 EXACT: Appearance-Index Dictionary (Lossless)
EXACT assigns each newly seen atom an appearance index inside the current Session and encodes that index as a one-hot bit. Composite vectors become sparse polynomials over bitset mono-terms, and UNBIND is a quotient-like operation (unbind differs from bind).
EXACT determinism:
same load order + same DSL + same Session boundary → same appearance indices → same vectors
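A minimal sketch of the appearance-index idea, assuming a hypothetical `SessionDictionary` class (the class and its methods are illustrative, not the AGISystem2 API). Each new atom name receives the next free index, and the one-hot encoding is modelled here as a BigInt with a single bit set.

```javascript
// Hypothetical session-local dictionary: name -> appearance index.
class SessionDictionary {
  constructor() {
    this.indexByName = new Map();
  }

  // Return the existing index, or allocate the next one on first sight.
  getOrCreate(name) {
    if (!this.indexByName.has(name)) {
      this.indexByName.set(name, this.indexByName.size);
    }
    return this.indexByName.get(name);
  }

  // One-hot encoding of the atom: a BigInt with only bit `index` set.
  oneHot(name) {
    return 1n << BigInt(this.getOrCreate(name));
  }
}

// Same load order in two sessions gives the same encodings.
const sessionA = new SessionDictionary();
const sessionB = new SessionDictionary();
for (const name of ['isA', 'Dog', 'Animal']) {
  sessionA.getOrCreate(name);
  sessionB.getOrCreate(name);
}
console.log(sessionA.oneHot('Dog') === sessionB.oneHot('Dog')); // true (both 2n)

// A different load order changes the indices, and therefore the vectors.
const sessionC = new SessionDictionary();
for (const name of ['Dog', 'isA', 'Animal']) sessionC.getOrCreate(name);
console.log(sessionC.oneHot('Dog')); // 1n
```

This is why EXACT vectors are only comparable across sessions that loaded the same content in the same order.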
5. Comparison: Initialization Policies
| Strategy | Atom ID policy | Theory scoping | Size model | Notes |
| --- | --- | --- | --- | --- |
| Dense-Binary | Hash + PRNG (ASCII stamp + variation) | Yes (theoryId:name) | Fixed bits | Good baseline; XOR binding with cancellation |
| SPHDC | Hash + PRNG (k random exponents) | Currently no (name only) | Fixed k | Compact; set similarity (Jaccard); statistical reversibility |
| Metric-Affine | Hash + PRNG (byte channels) | Yes (theoryId:name) | Fixed bytes | XOR on bytes (cancellable binding) + continuous bundling |
| EMA | Hash + PRNG (byte channels) | Yes (theoryId:name) | Elastic bytes | Chunked bundling to stabilize superposition at scale |
| EXACT | Session-local appearance index dictionary | By Session load order | Elastic bits | Lossless IDs; UNBIND is quotient-like; decoding is witness/residual-driven |
6. Privacy-Preserving Applications
Deterministic vector generation is foundational for privacy-preserving HDC:
6.1 Secret Seed Architecture
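Key Insight: If the seed derivation includes a secret master key, the atom vectors effectively become secret keys: without the master key, an adversary cannot simply regenerate them from atom names. With the DJB2 pipeline the seed is still only 32 bits wide, so this gives obfuscation rather than cryptographic protection (see Section 8).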
Secret Seed Generation:
scopedName = masterSecret + ":" + theoryId + ":" + name
seed = DJB2(scopedName)
vector = PRNG(seed) → [...bits or exponents...]
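The derivation above can be sketched as follows. `secretSeed` is an illustrative name and the secret strings are placeholders; the DJB2 function is the one from Section 2. The snippet shows the mechanics only and should not be read as a cryptographic construction.

```javascript
// DJB2 from Section 2, repeated so the snippet runs on its own.
function djb2(str) {
  let hash = 5381;
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) + hash) + str.charCodeAt(i);
    hash = hash >>> 0;
  }
  return hash;
}

// Illustrative helper: prefix the scoped name with the master secret before hashing.
function secretSeed(masterSecret, theoryId, name) {
  const scopedName = masterSecret + ':' + theoryId + ':' + name;
  return djb2(scopedName);
}

const seedAlice = secretSeed('alice-secret', 'default', 'Dog');
const seedAgain = secretSeed('alice-secret', 'default', 'Dog');
const seedOther = secretSeed('other-secret', 'default', 'Dog');

console.log(seedAlice === seedAgain); // true: same secret, same seed
console.log(seedAlice !== seedOther); // true: a different secret changes the seed
```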
6.2 Federated Learning
Multiple parties can share a master seed and contribute knowledge without revealing their individual facts:
- Setup: All parties agree on master seed S
- Encoding: Each party encodes local facts using deterministic vectors
- Aggregation: Coordinator bundles all KB vectors
- Query: Any party can query using shared encoding
The coordinator sees only bundled vectors, not individual facts.
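To make the aggregation step concrete, here is a sketch using bitwise majority voting, one common HDC bundling rule. Whether AGISystem2 bundles this way depends on the strategy, so the `bundle` and `similarity` helpers and the tiny 8-bit vectors are assumptions chosen for readability.

```javascript
// Hypothetical majority-vote bundle over equal-length arrays of 0/1 values.
function bundle(vectors) {
  const n = vectors.length;
  return vectors[0].map((_, i) => {
    const ones = vectors.reduce((sum, v) => sum + v[i], 0);
    return ones * 2 > n ? 1 : 0; // bit is set if more than half the inputs set it
  });
}

// Fraction of positions where two bit arrays agree.
function similarity(a, b) {
  const matches = a.reduce((count, bit, i) => count + (bit === b[i] ? 1 : 0), 0);
  return matches / a.length;
}

// Each party submits one KB vector (toy 8-bit examples).
const partyA = [1, 0, 1, 1, 0, 0, 1, 0];
const partyB = [1, 1, 0, 1, 0, 1, 0, 0];
const partyC = [0, 0, 1, 1, 1, 0, 0, 0];

const aggregated = bundle([partyA, partyB, partyC]);
console.log(aggregated);                     // [1, 0, 1, 1, 0, 0, 0, 0]
console.log(similarity(aggregated, partyA)); // 0.875
```

The aggregate stays similar to each contribution, which is what lets any party query it with the shared encoding.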
6.3 Partial Homomorphic Properties
HDC operations exhibit homomorphic-like behavior:
- Bundling is additive: bundle(E(A), E(B)) = E(A ∪ B)
- Binding is multiplicative: bind(E(A), E(B)) = E(A BIND B)
- Unbinding recovers: unbind(bind(E(A), E(B)), E(B)) ≈ E(A)
This enables computation on encoded data without decoding. Note that EXACT has different privacy properties because it keeps a session-local dictionary for atom IDs. See Privacy-Preserving HDC for detailed analysis of security properties and limitations.
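For XOR-based binding the unbinding property holds exactly, because XOR is its own inverse. The sketch below uses plain `Uint8Array` vectors and hypothetical `bind`/`unbind` helpers to illustrate this; strategies such as SPHDC or EXACT define these operations differently.

```javascript
// Hypothetical XOR binding over equal-length byte vectors.
function bind(a, b) {
  if (a.length !== b.length) throw new Error('length mismatch');
  return a.map((byte, i) => byte ^ b[i]);
}

// For XOR, unbinding is the same operation as binding.
const unbind = bind;

const role = Uint8Array.from([0x12, 0xAB, 0x00, 0xFF]);
const filler = Uint8Array.from([0x34, 0x0F, 0x7E, 0x01]);

const bound = bind(role, filler);
const recovered = unbind(bound, filler);

console.log(Array.from(bound));     // [38, 164, 126, 254]
console.log(Array.from(recovered)); // [18, 171, 0, 255], identical to role
```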
7. Core Theory Atoms
When AGISystem2 loads Kernel theory packs (from config/Packs/Kernel/*.sys2), atoms like isA, __TransitiveRelation, etc. get vectors through the strategy’s initialization policy. In hash/PRNG strategies this is createFromName; in EXACT it is appearance-index allocation.
Example: Loading "isA"
1. Parser encounters: @isA:isA __TransitiveRelation
2. Executor resolves "isA" → vocabulary.getOrCreate("isA")
3. First time: createFromName("isA", 32768, "default")
4. Vector cached in vocabulary.atoms["isA"]
5. Same vector used for all subsequent "isA" references
This ensures:
- Consistency: Same atom → same vector throughout session
- Reproducibility: Same session config → same vectors
- Efficiency: Vectors computed once, cached for reuse
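The caching behaviour in steps 2 to 5 can be sketched with a small map-backed vocabulary. The `Vocabulary` class, its constructor argument, and the `created` counter are assumptions made for illustration; only the `getOrCreate` name and the idea of an `atoms` cache come from the example above.

```javascript
// Illustrative vocabulary: builds each atom vector once and caches it by name.
class Vocabulary {
  constructor(createVector) {
    this.createVector = createVector; // strategy-specific factory (assumed)
    this.atoms = new Map();
    this.created = 0; // counts factory calls, for demonstration only
  }

  getOrCreate(name) {
    if (!this.atoms.has(name)) {
      this.atoms.set(name, this.createVector(name));
      this.created += 1;
    }
    return this.atoms.get(name);
  }
}

// Stand-in factory: a real one would call createFromName(name, 32768, 'default').
const vocabulary = new Vocabulary(name => Uint8Array.from(name, c => c.charCodeAt(0)));

const first = vocabulary.getOrCreate('isA');
const second = vocabulary.getOrCreate('isA');
console.log(first === second);   // true: the cached vector object is reused
console.log(vocabulary.created); // 1
```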
8. Security Considerations
8.1 What DJB2 Provides
- Good distribution (few accidental collisions for typical names; the 32-bit output is not collision-resistant in the cryptographic sense)
- Fast computation
- Deterministic mapping
8.2 What DJB2 Does NOT Provide
- Cryptographic security (not preimage-resistant)
- Protection against dictionary attacks
- Forward secrecy
8.3 For Stronger Security
For applications requiring cryptographic guarantees:
- Replace DJB2 with SHA-256 or BLAKE3
- Use cryptographic PRNG (e.g., ChaCha20)
- Implement proper key derivation (HKDF)
See Privacy-Preserving HDC: Threat Model for detailed security analysis.
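One possible shape for a stronger seed derivation, using Node.js's built-in `crypto` module: HKDF with SHA-256 derives per-atom seed material from a master secret, with the scoped name passed as the `info` parameter. The function name `deriveAtomSeed`, the salt string, and the 32-byte output length are assumptions for illustration; a complete design would also replace xorshift128+ with a cryptographic generator.

```javascript
const crypto = require('node:crypto');

// Illustrative: derive 32 bytes of per-atom seed material with HKDF-SHA-256.
function deriveAtomSeed(masterSecret, theoryId, name) {
  const ikm = Buffer.from(masterSecret, 'utf8');
  const salt = Buffer.from('agisystem2-atom-seed', 'utf8'); // assumed fixed, public salt
  const info = Buffer.from(theoryId + ':' + name, 'utf8');  // scoped atom name
  const okm = crypto.hkdfSync('sha256', ikm, salt, info, 32);
  return Buffer.from(okm); // hkdfSync returns an ArrayBuffer
}

const seedA = deriveAtomSeed('master-secret', 'default', 'Dog');
const seedB = deriveAtomSeed('master-secret', 'default', 'Dog');
const seedC = deriveAtomSeed('master-secret', 'physics', 'Dog');

console.log(seedA.length);        // 32
console.log(seedA.equals(seedB)); // true: deterministic for the same inputs
console.log(seedA.equals(seedC)); // false: theory scoping changes the seed
```

Unlike the 32-bit DJB2 seed, 256 bits of derived material cannot be enumerated by brute force, which is the property the secret-seed architecture in Section 6.1 depends on.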
Related Pages