Goal: show why multi-column consensus is not just “more of the same.” We ingest a clean stream, then we query with noisy windows. Each column sees a different noisy version of the same window. Consensus should recover the correct location more reliably than any single view or a naive list scan.
Plain-language version: many small witnesses see the story with different errors. Voting should beat a single witness and should beat a naive scan that only accepts exact matches.
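The voting idea can be sketched as a plain majority vote over per-column answers. This is a minimal illustration, not the VSABrains implementation: the function name `consensusLocation`, the "one candidate location per column" input shape, and the tie-breaking rule are all assumptions made for the example.

```javascript
// Hypothetical helper: each column proposes one candidate location
// (or null if it found nothing); the location with the most votes wins.
function consensusLocation(columnVotes) {
  const counts = new Map();
  for (const loc of columnVotes) {
    if (loc === null || loc === undefined) continue; // column abstained
    counts.set(loc, (counts.get(loc) ?? 0) + 1);
  }
  let best = null;
  let bestCount = 0;
  for (const [loc, count] of counts) {
    // Strict ">" keeps the earliest-seen location on ties (an arbitrary choice).
    if (count > bestCount) {
      best = loc;
      bestCount = count;
    }
  }
  return { location: best, votes: bestCount };
}

// Five columns; two are misled by noise, three agree on position 42.
const result = consensusLocation([42, 17, 42, 42, 305]);
console.log(result); // { location: 42, votes: 3 }
```

A single witness that happened to be column 1 or column 4 would have answered 17 or 305; the vote recovers 42 because the errors of different columns rarely coincide.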
Noise is the fraction of tokens in the query window that are randomly replaced. Higher noise simulates sloppy input or uncertain observations.
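One way to picture this noise model is the sketch below. It assumes tokens are integers in `[0, vocabSize)` and that each token is replaced independently with probability `noise`, so the expected replaced fraction equals the noise rate. Both assumptions, along with the names `addNoise` and `mulberry32`, are illustrative and not taken from the experiment code.

```javascript
// Small seeded PRNG (mulberry32-style) so that runs are repeatable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Replace each token with probability `noise` (assumed per-token model).
// The replacement is drawn uniformly from the other vocabSize - 1 tokens,
// so a "replaced" token never equals the original.
function addNoise(window, noise, vocabSize, rand) {
  return window.map((token) => {
    if (rand() >= noise) return token; // keep this token unchanged
    let replacement = Math.floor(rand() * (vocabSize - 1));
    if (replacement >= token) replacement += 1; // skip over the original
    return replacement;
  });
}

// Each column gets its own noisy view of the same clean window.
const cleanWindow = [3, 1, 4, 1, 5, 9];
const views = [0, 1, 2].map((col) =>
  addNoise(cleanWindow, 0.25, 16, mulberry32(1000 + col))
);
console.log(views);
```

Seeding each column differently is what gives the columns different errors over the same window, which is the property the vote relies on.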
Baselines: “Single Column” means using just column 0. “Best Column” is the best-performing individual column without voting (an optimistic upper bound on what a single column can achieve). “Naive List” linearly scans the full token list and only accepts an exact window match.
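The Naive List baseline can be sketched as a sliding-window scan. This is an assumed reconstruction for illustration: the function name `naiveListScan`, the return shape, and the choice to count every token comparison (no early exit inside a window) are not taken from the source. The `matchThreshold` parameter mirrors the idea of requiring all tokens in the window to match.

```javascript
// Slide the query over the stream and accept the first position where at
// least `matchThreshold` tokens agree. With the default threshold (the full
// window length) only an exact match is accepted.
function naiveListScan(stream, query, matchThreshold = query.length) {
  let comparisons = 0;
  for (let start = 0; start + query.length <= stream.length; start++) {
    let matches = 0;
    for (let i = 0; i < query.length; i++) {
      comparisons++; // work proxy: one token comparison
      if (stream[start + i] === query[i]) matches++;
    }
    if (matches >= matchThreshold) return { location: start, comparisons };
  }
  return { location: null, comparisons };
}

const stream = [7, 2, 9, 4, 4, 1, 8, 3];
console.log(naiveListScan(stream, [9, 4, 4])); // { location: 2, comparisons: 9 }
// One replaced token is enough to make the exact-match baseline fail,
// and the failed scan touches every window in the stream.
console.log(naiveListScan(stream, [9, 0, 4])); // { location: null, comparisons: 18 }
```

The second call shows both weaknesses of this baseline at once: a single noisy token causes a miss, and the miss costs a full linear pass.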
“Proxy” metrics are rough lower-bound estimates, not exact measurements. They are useful for comparing scaling trends (linear scans vs indexed search) without claiming precise runtime or memory.
| Metric | Value | Target | Status |
|---|---|---|---|
| Consensus Acc | 0.992 | > 0.90 | Pass |
| Single Column Acc | 0.769 | — | Baseline |
| Best Column Acc | 0.780 | — | Baseline |
| Naive List Acc | 0.183 | — | Baseline |
| Consensus Gain (vs Single) | 0.224 | > 0.05 | Pass |
| Consensus Gain (vs Naive) | 0.809 | > 0.05 | Pass |
| List Comparisons / Query | 1,188 | Linear scan | Expected |
| VSA Storage (lower bound) | 103,040 bytes | Proxy | Proxy |
| List Storage (lower bound) | 1,600 bytes | Proxy | Proxy |
Interpretation: consensus is much more robust to noise. Here it reaches 0.992 accuracy, while a single column is ~0.769 and the naive list baseline is ~0.183 under the same noisy queries. The large positive “Consensus Gain” values show that voting is doing real work, not just averaging identical answers.
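As a rough sketch of how the accuracy and gain figures relate, the snippet below scores predicted locations against ground truth and subtracts a baseline accuracy from the consensus accuracy. The `accuracy` helper and the toy data are hypothetical; only the definition "gain = consensus accuracy minus baseline accuracy" is implied by the table.

```javascript
// Fraction of queries whose predicted location equals the true location.
function accuracy(predictions, truths) {
  if (truths.length === 0) return 0;
  let correct = 0;
  for (let i = 0; i < truths.length; i++) {
    if (predictions[i] === truths[i]) correct++;
  }
  return correct / truths.length;
}

// Toy data (not from the experiment): four queries with known locations.
const truths = [10, 20, 30, 40];
const consensusPred = [10, 20, 30, 40]; // all four correct
const singlePred = [10, 20, 99, 40]; // column 0 misses one query

const gain = accuracy(consensusPred, truths) - accuracy(singlePred, truths);
console.log(gain); // 0.25
```

Applied to the rounded figures in the table, 0.992 − 0.769 gives about 0.223, which agrees with the reported 0.224 to within rounding of the inputs; 0.992 − 0.183 gives 0.809.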
Accuracy vs. noise chart: read left to right. As noise goes up, the red line (1 column) drops, while the blue/green lines (5–9 columns) stay near the top.
Gain chart: noise is fixed at the hardest level (0.35) to show how much accuracy is recovered relative to a single column. Most of the gain appears by 5 columns.
We use a work proxy rather than a raw runtime claim. For the naive list, the work is “token comparisons per query.” For VSABrains, the proxy is “candidate locations scored per query.” Lower is better for both.
| Work Proxy (noise=0.25, cols=5) | Naive List | VSABrains | Interpretation |
|---|---|---|---|
| Work per query | 1,188 comparisons | 5.67 locations scored | VSABrains does far less work |
| Baseline ÷ VSA work | ~209× | | Large win under noisy queries |
| Index hits per token | — | ~1.14 candidates/token/column | Index stays tight |
The naive list must compare against almost the whole stream, while VSABrains narrows the search to a tiny candidate set via the index.
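The ratio row is simply the baseline work divided by the VSABrains work. A minimal sketch, using the rounded figures from the table (the function name `workRatio` is made up for this example):

```javascript
// Ratio of naive-list work (token comparisons per query) to
// VSABrains work (candidate locations scored per query).
function workRatio(baselineComparisons, vsaLocationsScored) {
  if (!(vsaLocationsScored > 0)) {
    throw new RangeError("vsaLocationsScored must be positive");
  }
  return baselineComparisons / vsaLocationsScored;
}

// Rounded values reported for noise=0.25, cols=5.
const ratio = workRatio(1188, 5.67);
console.log(`${ratio.toFixed(1)}x`); // 209.5x
```

The result lands at roughly 209.5, in line with the ~209× in the table given that 5.67 is itself a rounded average. The two proxies count different unit operations (token comparisons versus locations scored), so the ratio is best read as a scaling indicator rather than a runtime speedup.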
Even at high noise, the naive scan still does roughly 106–862× more proxy work than the indexed VSABrains approach. More columns add some work (the ratio falls from ~860× at 1 column to ~106–125× at 9 columns), but the gap remains very large.
| Baseline ÷ VSA Work Ratio | Cols=1 | Cols=5 | Cols=9 |
|---|---|---|---|
| Noise 0.15 | ~856× | ~191× | ~106× |
| Noise 0.25 | ~854× | ~209× | ~115× |
| Noise 0.35 | ~862× | ~226× | ~125× |
Last run: 2026-01-27T12:59:38Z. Noise rate 0.25, window size 6, columns 5, baselineMatchThreshold=6 (exact match).
Sweep run: 2026-01-27T12:59:46Z. See eval/exp4-consensus/sweep.mjs for configuration.
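Storage proxy caveat: the VSA storage proxy can be larger than the naive list's in small configs. In the run above it is 103,040 bytes against 1,600 bytes, about 64× larger. The main advantage shown here is robustness plus much lower query work under noise, not a smaller footprint.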
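As noise increases, 1 column degrades (0.939 to 0.852) while 5–9 columns remain strong (0.976 or higher). The gain columns show how much accuracy is recovered by adding columns. They appear to follow the “Consensus Gain (vs Single)” definition, that is, consensus accuracy minus the accuracy of column 0 within the same run: the noise 0.25, cols 5 entry matches the +0.224 in the summary table. They are therefore not the difference from the Cols=1 Acc column.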
| Noise | Cols=1 Acc | Cols=5 Acc | Cols=5 Gain | Cols=9 Acc | Cols=9 Gain |
|---|---|---|---|---|---|
| 0.15 | 0.939 | 0.997 | +0.146 | 1.000 | +0.151 |
| 0.25 | 0.901 | 0.992 | +0.224 | 1.000 | +0.276 |
| 0.35 | 0.852 | 0.976 | +0.335 | 1.000 | +0.312 |