Evaluation Status
Current validation state against DS003 criteria.
Summary
Full Success
Last full run: 2026-01-26T20:45:03Z. DS003 success criteria are met under the current evaluation configuration.
Recent updates: Exp4 consensus, work proxies, and compression sweep were refreshed on 2026-01-27.
Evaluation configuration notes: Exp2 time localization is measured via replay-based window verification against fast maps (not through a separate learned time model). Exp3 uses a deterministic extractor stub (no external LLM), so extractor stability and latency under real LLMs are not yet validated.
Exp6 uses a simulated semantic encoder to stress the frame/query pipeline for literature and dialogue.
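
To make concrete what a "deterministic extractor stub" implies, here is a minimal sketch of a rule-based stand-in that returns the same output for the same input. It is an illustration only, not the Exp3 implementation: the `Fact` type, the regex pattern, and the `extract_stub` and `fingerprint` names are all hypothetical.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Hypothetical extraction record (not taken from the Exp3 code)."""
    subject: str
    relation: str
    obj: str


# Assumed toy grammar: "<word> <is|has|uses> <object>."
_PATTERN = re.compile(r"(\w+) (is|has|uses) ([^.]+)\.")


def extract_stub(text: str) -> list[Fact]:
    """Rule-based stand-in for an LLM extractor: no sampling, no network."""
    return [Fact(s, r, o.strip()) for s, r, o in _PATTERN.findall(text)]


def fingerprint(facts: list[Fact]) -> str:
    """Stable hash of an extraction result, useful for comparing runs."""
    payload = "\n".join(f"{f.subject}|{f.relation}|{f.obj}" for f in facts)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    sample = "Alice uses Python. Bob is tall."
    first = extract_stub(sample)
    second = extract_stub(sample)
    # Identical fingerprints across runs are what a stub guarantees;
    # a live LLM extractor would need this checked empirically.
    print(fingerprint(first) == fingerprint(second))
```

Because a stub like this cannot vary between runs, a passing Exp3 result says nothing about how stable or fast a real LLM extractor would be, which is the gap noted above.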
Experiments
Click an experiment to load its dedicated page with context, interpretation, and visuals.
Known Gaps
| Area | Gap | Impact |
|---|---|---|
| Exp3 Extraction | Extractor is a deterministic stub, not a live LLM. | LLM variance, latency, and error rates are not validated. |
| Exp2 Time Localization | Measured via replay verification, not a learned or independent time model. | Success reflects deterministic verification, not predictive time inference. |
| Compression Sweep | Compression sweep now covers grid size and heavy-hitters k, but only on Exp4 proxies. | Compression effects on Exp1/Exp2 and real memory usage remain unvalidated. |
| Exp6 Encoder | Semantic encoder is simulated, not a production-grade model. | Frame quality reflects the generator, not a live LLM pipeline. |
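
As a rough illustration of a sweep over grid size and heavy-hitters k, the sketch below enumerates every (grid size, k) pair and reports how much of the total count the top-k cells retain. Everything here is an assumption made for illustration: the 2D unit-square bucketing, the `retained` metric, and the function names are hypothetical and are not the Exp4 proxies.

```python
from collections import Counter
from itertools import product


def bucketize(points, grid_size):
    """Count points per cell on a grid_size x grid_size grid over [0, 1)^2."""
    return Counter((int(x * grid_size), int(y * grid_size)) for x, y in points)


def retained_fraction(counts, k):
    """Fraction of the total count kept by the k most frequent cells."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    kept = sum(c for _, c in counts.most_common(k))
    return kept / total


def sweep(points, grid_sizes, ks):
    """Evaluate every (grid_size, k) combination on the same points."""
    rows = []
    for grid_size, k in product(grid_sizes, ks):
        counts = bucketize(points, grid_size)
        rows.append({
            "grid_size": grid_size,
            "k": k,
            "retained": retained_fraction(counts, k),
        })
    return rows


if __name__ == "__main__":
    # Toy points, chosen only to exercise the code.
    toy = [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9), (0.6, 0.1)]
    for row in sweep(toy, grid_sizes=[1, 2], ks=[1, 3]):
        print(row)
```

Closing the Compression Sweep gap would mean running a sweep like this against Exp1/Exp2 workloads and measuring real memory usage rather than a proxy metric.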