# BSP Optimization Journey
Last Updated: 2026-01-16
## Overview
This page documents the optimization experiments conducted on BSP, tracking what worked, what didn't, and the measurable impact of each change.
## Initial State (Baseline)
- BPC: 4.93
- vs Gzip: -104%
- Fixed universe (100k), no compression machine
## Current State (2026-01-16)
- BPC: 2.20
- vs Gzip: +8.6%
- Adaptive universe + compression machine + vocab fix
## Optimization Timeline

### DS-020: Adaptive Universe (SUCCESS)
Date: 2026-01-16 (early)
Problem: A fixed universe size of 100,000 meant every surprise cost log₂(100k) ≈ 16.6 bits, even when only 1,000 tokens had been seen.
Solution: Dynamic universe sizing based on observed vocabulary: `effectiveUniverse = max(1000, vocabSize × 2)`
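A minimal sketch of the sizing rule and the resulting surprise cost (the helper names are illustrative, not the actual BSPEngine API):

```js
// Adaptive universe sizing (sketch; names are illustrative, not BSPEngine's API).
// A surprise costs log2(effectiveUniverse) bits instead of a flat log2(100000).
function effectiveUniverse(vocabSize) {
  return Math.max(1000, vocabSize * 2);
}

function surpriseCostBits(vocabSize) {
  return Math.log2(effectiveUniverse(vocabSize));
}

console.log(surpriseCostBits(1000).toFixed(1)); // "11.0" bits, down from ~16.6
```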
Results:
- 1k lines: 16.6 → 11.1 bits/surprise (33% reduction)
- 5k lines: 16.6 → 13.1 bits/surprise (21% reduction)
- BPC improved but still had scaling issues
Spec: DS-020
### DS-021: Compression Machine (SUCCESS)
Date: 2026-01-16 (mid)
Problem: Group-based compression alone couldn't exploit structural patterns like repetition and copying.
Solution: Procedural encoding layer with operators:
- COPY: LZ77-style back-references (cost: ~14 bits vs 66 bits for 6 tokens; see the cost sketch after this list)
- REPEAT: Run-length encoding for patterns
- TEMPLATE: Structure ready, learning pending
- LITERAL: Fallback direct encoding
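As a back-of-envelope illustration of the COPY vs LITERAL gap above, here is an assumed cost model (the 256-token context appears elsewhere on this page; the 16-token max match length and 2-bit opcode tag are assumptions chosen to reproduce the ~14-bit figure, not the exact CompressionMachine arithmetic):

```js
// Rough cost comparison for encoding a 6-token span (assumed cost model).
const LITERAL_BITS = 11; // ~log2(effectiveVocab) per literal token
const CONTEXT = 256;     // positions a back-reference can address
const MAX_MATCH = 16;    // assumed longest encodable match length

const literalCost = 6 * LITERAL_BITS; // 66 bits
const copyCost =
  Math.log2(CONTEXT) +   // offset: 8 bits
  Math.log2(MAX_MATCH) + // length: 4 bits
  2;                     // opcode tag: log2(4 operators) = 2 bits
console.log(literalCost, copyCost); // 66 14
```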
Results:
- Program win rate: 85% on 5k lines
- COPY operations: 3,637 uses, 26 bits savings each
- Combined BPC: 2.98 → 2.79 (6.3% improvement)
- But a vocabulary explosion issue remained (addressed in the next fix)
Spec: DS-021
### Vocabulary Decoupling Fix (BREAKTHROUGH)
Date: 2026-01-16 (late)
Problem: N-gram tokenization (1-3 grams) caused a vocabulary explosion (a tokenizer sketch follows this list):
- Input: "the cat sat" → 6 tokens: [the, cat, sat, the_cat, cat_sat, the_cat_sat]
- 5k lines → 4,483 n-gram tokens vs ~1,200 actual words
- CompressionMachine used full n-gram vocab for cost calculation
- Penalty: ~2.2 bits/token overpayment
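A minimal sketch of 1-3 gram tokenization and the blow-up it causes (illustrative only, not the engine's actual tokenizer):

```js
// 1-3 gram tokenization (illustrative sketch, not the engine's actual tokenizer).
// Each position emits up to three tokens, so the vocabulary grows much faster
// than the underlying word count.
function ngramTokens(line, maxN = 3) {
  const words = line.split(/\s+/).filter(Boolean);
  const tokens = [];
  for (let n = 1; n <= maxN; n++) {
    for (let i = 0; i + n <= words.length; i++) {
      tokens.push(words.slice(i, i + n).join('_'));
    }
  }
  return tokens;
}

console.log(ngramTokens('the cat sat'));
// [ 'the', 'cat', 'sat', 'the_cat', 'cat_sat', 'the_cat_sat' ]
```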
Solution: Decoupled vocabularies:
- `BSPEngine.vocabTracker`: all tokens (including n-grams), for universe sizing
- `CompressionMachine.wordVocab`: only unigrams, for cost calculation
- Updated all operators to use `effectiveVocab` consistently
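A sketch of the split (the field names follow this page; the wiring and the reuse of DS-020's `max(1000, n × 2)` sizing rule are assumptions):

```js
// Decoupled vocabularies (sketch; wiring is illustrative).
const vocabTracker = new Set(); // all tokens, n-grams included (universe sizing)
const wordVocab = new Set();    // unigrams only (compression costs)

function observe(token) {
  vocabTracker.add(token);
  if (!token.includes('_')) wordVocab.add(token); // n-grams join words with '_'
}

// Literal cost now uses the ~1,200-word vocab, not the 4,483 n-gram vocab:
function literalCostBits() {
  const effectiveVocab = Math.max(1000, wordVocab.size * 2);
  return Math.log2(effectiveVocab); // ≈ 11.2 bits at ~1,200 words
}
```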
Results:
| Training | Before Fix | After Fix |
|---|---|---|
| 1k lines | 2.27 BPC | 2.04 BPC |
| 5k lines | 2.79 BPC | 2.20 BPC |
| Scaling | Degraded with more data ❌ | Scales correctly ✅ |
- 1k lines: 10.1% improvement (2.27 → 2.04)
- 5k lines: 21.1% improvement (2.79 → 2.20)
- Program win rate: 48% → 85% as data scales
- Both configurations now beat Gzip (2.41 BPC)
Files Modified:
- `CompressionMachine.mjs`: `_tryRepeatEncoding`, `_tryTemplateEncoding`, `_matchTemplate`
## Performance Metrics

### Compression Quality (5k lines training)
| Metric | Value | Notes |
|---|---|---|
| BPC (Combined) | 2.20 | Beats Gzip (2.41) by 8.6% |
| BPC (Groups Only) | 2.98 | 26% worse than combined |
| Shannon Entropy | 4.38 | Character-level baseline |
| Program Win Rate | 85.0% | Machine dominates |
| COPY Operations | 3,637 | 26 bits savings/op |
### System Performance
| Metric | Value | Status |
|---|---|---|
| Throughput | 338 lines/sec | ⚠️ Needs a faster COPY search (suffix array failed; see Rolling Hash Map plan) |
| Groups Learned | 1,144 | ✅ Linear scaling (~0.23/line) |
| Vocab Size | 4,483 (n-grams) | ✅ For semantic grouping |
| Word Vocab | ~1,200 (unigrams) | ✅ For compression costs |
### BLiMP Grammatical Competence
| Task | Accuracy |
|---|---|
| Anaphor Agreement | 46.0% |
| Determiner-Noun 1 | 17.2% |
| Determiner-Noun 2 | 41.3% |
| Average | 34.8% |
## Key Insights

### What Works
- LZ77-style COPY: Dominates with 85% win rate on narrative text
- Adaptive sizing: MDL principle reduces cost by 21-33%
- Vocabulary decoupling: Critical for correct cost calculation
- Hybrid architecture: Groups for semantics, programs for structure
### What Doesn't Work Yet
- REPEAT operations: Only 1 use in 5k lines (needs fuzzy matching)
- Template learning: tested and disabled; templates don't generalize across documents (see below)
- Group-only compression: 26% worse than combined (high surprise count)
### Scaling Behavior
- From 1k to 5k training lines, BPC rises slightly (2.04 → 2.20) but stays below Gzip (2.41)
- Program win rate increases dramatically (48% → 85%)
- More training data → more context → better COPY matches
- Throughput decreases due to O(N×M) COPY search
## Future Optimizations

### Priority 1: Template Learning (TESTED - FAILED)
Date Tested: 2026-01-16
Expected Impact: 2.20 → ~1.80 BPC (18% improvement)
Actual Results:
- Templates learned: 15
- Templates used: 0 ❌
- BPC: 2.21 (+0.01, worse)
- Throughput: 221 l/s (-35%) ❌
Why It Failed:
- Train/test mismatch: templates learned from training data don't generalize
- Too specific: exact phrase matching only
- Pure overhead: 28,905 failed match attempts
Decision: Feature disabled. See EXPERIMENT_TEMPLATE_LEARNING.md
Lesson: COPY works because it matches recent context (same document). Templates tried to match training data (different documents).
### Suffix Array for COPY (TESTED - PARTIAL)
Date Tested: 2026-01-16
Expected Impact: 305 → 500+ lines/sec
Actual Results:
- Quick test (1k): 458 l/s (+50%) ✅
- Full test (5k): 232 l/s (-24%) ❌
- BPC: No change
Why It Failed: With a small context (256 tokens) and frequent rebuilds, the O(N log N) build cost exceeded the O(N×M) search savings
Decision: Disabled by default. See EXPERIMENT_SUFFIX_ARRAY.md
Lesson: Simple O(N) can beat complex O(log N) for small N
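A back-of-envelope check of that lesson (assumed constants, purely illustrative):

```js
// Why the naive scan wins at N = 256 (illustrative arithmetic only).
const N = 256;                     // context size in tokens
const buildOps = N * Math.log2(N); // one suffix-array rebuild: ~2,048 ops
const scanOps = (M) => N * M;      // one naive scan for a length-M match
// For matches shorter than log2(N) = 8 tokens, a single naive scan already
// costs less than one rebuild, and rebuilds happened frequently.
console.log(buildOps, scanOps(6)); // 2048 1536
```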
### Priority 1 (NEW): Rolling Hash Map (PLANNED)
Expected Impact: 301 → 400+ lines/sec (+33% throughput)
Why this approach: O(1) average-case lookup with only a cheap linear build, avoiding the suffix array's rebuild cost
Solution (sketched after this list):
- Hash first 3 tokens at each context position
- Lookup candidates in O(1) average case
- Extend matches linearly
Complexity: Build O(N), Query O(1), Memory O(N)
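A sketch of that index (helper names are hypothetical; for simplicity it keys a Map on the joined trigram rather than a true rolling hash, which gives the same O(1) average-case lookup):

```js
// Planned trigram candidate index (sketch; helper names are hypothetical).
// Key: 3 consecutive tokens; value: context positions where that trigram starts.
function buildIndex(context) {
  const index = new Map();
  for (let i = 0; i + 3 <= context.length; i++) {
    const key = context.slice(i, i + 3).join('\u0000');
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(i);
  }
  return index; // O(N) build, O(1) average-case lookup
}

function copyCandidates(index, tokens, pos) {
  const key = tokens.slice(pos, pos + 3).join('\u0000');
  return index.get(key) ?? []; // extend each candidate match linearly from here
}
```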
### Priority 3: Frequency-Weighted Coding (PLANNED)
Expected Impact: 2.20 → ~1.90 BPC (14% improvement)
Concept: High-frequency words should cost less than rare words
Current: All words cost log₂(vocab) ≈ 11 bits
Proposed: Huffman-style encoding (see the check after this list):
- Top 100 words (50% of text): ~7 bits
- Common words (30% of text): ~10 bits
- Rare words (20% of text): ~14 bits
- Average: ~9 bits (18% reduction)
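The quoted average is just the weighted sum of those tiers; a quick check:

```js
// Expected bits/word under the proposed tiers (checking the figures above).
const tiers = [
  { share: 0.5, bits: 7 },  // top 100 words
  { share: 0.3, bits: 10 }, // common words
  { share: 0.2, bits: 14 }, // rare words
];
const avg = tiers.reduce((sum, t) => sum + t.share * t.bits, 0);
console.log(avg); // ≈ 9.3 bits, vs a flat ~11 bits (~15-18% reduction)
```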
## Theoretical Limits
Shannon Entropy: 4.38 BPC (character-level)
Current: 2.20 BPC (50% of entropy)
Estimated Ceiling: ~1.50 BPC with all optimizations
- Template learning (currently disabled; would need to generalize, see the failed experiment above): -0.30 BPC → 1.90
- Frequency coding: -0.20 BPC → 1.70
- Better grouping: -0.20 BPC → 1.50
## Resources
- DS-020: Adaptive Universe Spec
- DS-021: Compression Machine Spec
- Optimization Plan (Markdown)
- Optimization Session Summary (Markdown)
- Compression Insights (Markdown)
- Session Summary: Vocab Fix
## Quick Commands
```bash
# Quick benchmark (1k lines, ~17s)
node evals/runLM_Comp.mjs --quick --retrain

# Full benchmark (5k lines, ~1.3min)
node evals/runLM_Comp.mjs --retrain

# View latest results
cat evals/lm_comparative/results/latest.json | jq
```