Story Quality Metrics
SCRIPTA uses a comprehensive set of metrics to evaluate story quality. These metrics detect structural issues, measure coherence, and flag generated stories that are random or incoherent.
Why Metrics Matter
AI-generated stories can suffer from various problems:
- Random character assignment - Characters appear once and vanish
- Location hopping - Every scene in a different place with no logic
- Orphan references - Actions mention entities that don't exist
- Incomplete scenes - Missing characters, locations, or actions
- Unused relationships - Defined connections never manifest
- Empty dialogues - Placeholder conversations without purpose
SCRIPTA's metrics system detects these issues automatically, giving you immediate feedback on story quality.
Click "Evaluate" in the Metrics panel after generating a story to see a detailed quality analysis. The browser console shows additional debugging information.
Narrative Quality Score (NQS)
The Narrative Quality Score is the master metric that combines all quality dimensions into a single 0-100% score.
NQS Components
| Component | Weight | What It Measures |
|---|---|---|
| Completeness | 12% | Are required elements present? (characters, locations, themes, arc) |
| Coherence Score | 12% | Entity usage and structural coherence |
| Character Drift | 8% | Character trait consistency (lower is better) |
| Originality | 8% | Variety of narrative blocks and actions |
| Emotional Arc | 10% | Arc coverage and mood usage |
| Parse Success | 8% | Valid CNL syntax in output |
| Constraints | 8% | Constraint satisfaction accuracy |
| Explainability | 8% | Documentation level (arc, themes, rules, wisdom) |
| Coherence Analysis | 26% | Character continuity, location logic, action coherence, scene completeness |

NQS score ranges:
- < 40%: Severe issues, likely random/incoherent generation
- 40-60%: Multiple problems, needs significant improvement
- 60-75%: Acceptable, some issues to address
- 75-85%: Good quality, minor refinements needed
- > 85%: Excellent, well-structured story
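The component weights can be read as a plain weighted sum. The following is a minimal sketch, not the code in demo/app/metrics.mjs: the names `NQS_WEIGHTS`, `computeNqs`, and `interpretNqs` are hypothetical, and it assumes every component has already been normalized to 0-1 with higher meaning better (so Character Drift would need to be inverted first; this page does not say how that is done).

```javascript
// Weights from the NQS Components table (they sum to 1.0).
const NQS_WEIGHTS = {
  completeness: 0.12,
  coherenceScore: 0.12,
  characterDrift: 0.08, // assumed to be pre-inverted so that higher is better
  originality: 0.08,
  emotionalArc: 0.10,
  parseSuccess: 0.08,
  constraints: 0.08,
  explainability: 0.08,
  coherenceAnalysis: 0.26,
};

// Hypothetical helper: weighted sum of normalized (0-1) component scores.
function computeNqs(components) {
  let total = 0;
  for (const [name, weight] of Object.entries(NQS_WEIGHTS)) {
    const value = components[name] ?? 0; // missing components count as 0
    total += weight * Math.min(1, Math.max(0, value));
  }
  return total; // 0-1; multiply by 100 for a percentage
}

// Maps a 0-1 score to the score ranges listed above.
// Assumption: a value exactly on a shared boundary (e.g. 60%) goes to the higher band.
function interpretNqs(score) {
  const pct = score * 100;
  if (pct < 40) return "severe issues";
  if (pct < 60) return "multiple problems";
  if (pct < 75) return "acceptable";
  if (pct <= 85) return "good";
  return "excellent";
}
```

Because Coherence Analysis carries 26% of the weight, a story that scores 0 there cannot exceed 74% overall, whatever its other components score.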
Summary Metrics
Completeness
Measures whether all required story elements are present:
- At least 2 characters (20%)
- At least 2 locations (10%)
- At least 3 scenes (25%)
- At least 1 theme (10%)
- Emotional arc with 3+ beats (10%)
- At least 2 dialogues (15%)
- At least 2 relationships (10%)
Threshold: >= 80%
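As a rough illustration of the checklist above, the sketch below awards each weight only when its minimum is met. The all-or-nothing credit, the `completeness` function name, and the `arcBeats` key are assumptions made for this example; the actual implementation may award partial credit.

```javascript
// Each check: [count key, minimum required, weight]. Weights sum to 1.0.
const COMPLETENESS_CHECKS = [
  ["characters", 2, 0.20],
  ["locations", 2, 0.10],
  ["scenes", 3, 0.25],
  ["themes", 1, 0.10],
  ["arcBeats", 3, 0.10], // hypothetical key for emotional arc beats
  ["dialogues", 2, 0.15],
  ["relationships", 2, 0.10],
];

// Hypothetical helper: all-or-nothing credit per check (an assumption).
function completeness(counts) {
  return COMPLETENESS_CHECKS.reduce(
    (sum, [key, min, weight]) => sum + ((counts[key] ?? 0) >= min ? weight : 0),
    0
  );
}
```

Under these assumptions, a story with enough of everything except arc beats would score 0.90 and still pass the 80% threshold.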
Coherence Score (CS)
Measures structural coherence through entity usage and relationships:
- Entity usage ratio (40%) - Are defined characters/locations actually used?
- Relationship coverage (30%) - Are relationships defined between characters?
- Block usage (30%) - Are narrative blocks used in scenes?
Threshold: >= 75%
Emotional Arc Profile (EAP)
Measures how well the emotional journey is defined:
- Arc beat coverage (50%) - How many arc beats are assigned?
- Mood variety (25%) - Are different moods used?
- Mood usage (25%) - Are moods applied to scenes?
Threshold: >= 70%
Detailed Analysis Metrics
Character Attribute Drift (CAD)
Measures trait consistency; lower is better. The drift value is assigned from the number of traits defined per character:
- 3+ traits per character: 0.05 (excellent)
- 2 traits per character: 0.10 (good)
- 1 trait per character: 0.15 (minimal)
- 0 traits: 0.25 (poor)
Threshold: <= 0.15
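The trait-count table above amounts to a small lookup. In this sketch the function names are hypothetical, and averaging the per-character values into one story-level number is an assumption, since this page does not say how they are combined.

```javascript
// Hypothetical helper: maps a character's trait count to the drift value listed above.
function driftForTraitCount(traitCount) {
  if (traitCount >= 3) return 0.05;
  if (traitCount === 2) return 0.10;
  if (traitCount === 1) return 0.15;
  return 0.25;
}

// Assumption: story-level drift is the mean over all characters.
// Assumption: a story with no characters gets the worst value (0.25).
function characterDrift(characters) {
  if (characters.length === 0) return 0.25;
  const total = characters.reduce(
    (sum, c) => sum + driftForTraitCount((c.traits ?? []).length),
    0
  );
  return total / characters.length;
}
```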
Compliance Adherence Rate (CAR)
Percentage of references that point to existing entities. Orphan references (pointing to non-existent entities) lower this score.
Threshold: >= 95%
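A minimal sketch of the ratio described above, assuming references and entities are both identified by plain string ids. The `adherenceRate` name and the choice to return 1 when there are no references are assumptions for illustration.

```javascript
// Hypothetical helper: share of references whose target entity exists.
function adherenceRate(references, knownIds) {
  if (references.length === 0) return 1; // assumption: nothing to violate
  const known = new Set(knownIds);
  const valid = references.filter((ref) => known.has(ref)).length;
  return valid / references.length;
}

// One orphan reference ("ghost") out of four lowers the rate to 0.75.
console.log(adherenceRate(["anna", "ghost", "castle", "anna"], ["anna", "castle"]));
```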
Originality Index (OI)
Measures variety in the narrative:
- Block type variety (40%) - Different narrative blocks used
- Action variety (30%) - Different action types used
- Theme variety (30%) - Multiple themes defined
Threshold: >= 50%
CNL Parse Success Rate (CPSR)
Percentage of CNL lines that follow valid syntax. Tests whether the specification can be parsed correctly.
Threshold: >= 90%
Constraint Satisfaction Accuracy (CSA)
Ratio of valid references to total references. Similar to CAR but focused on constraint validation.
Threshold: >= 95%
Retrieval Quality (RQ)
Measures naming quality - entities should have meaningful names (3+ characters, not "undefined").
Threshold: >= 80%
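The naming rule can be sketched as a simple predicate. The function names are hypothetical; trimming whitespace and comparing "undefined" case-insensitively are assumptions that go slightly beyond what this page states.

```javascript
// Hypothetical helper: a name is meaningful if it has 3+ characters
// and is not the literal string "undefined".
function hasMeaningfulName(name) {
  if (typeof name !== "string") return false;
  const trimmed = name.trim(); // assumption: surrounding whitespace is ignored
  return trimmed.length >= 3 && trimmed.toLowerCase() !== "undefined";
}

// Share of entities with a meaningful name.
function retrievalQuality(entities) {
  if (entities.length === 0) return 0; // assumption: no entities means no credit
  const good = entities.filter((e) => hasMeaningfulName(e.name)).length;
  return good / entities.length;
}
```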
Explainability Score
Measures documentation level (0-5 rating):
- Arc defined: +0.75
- Themes defined: +0.75
- Relationships defined: +0.75
- World rules defined: +0.75
- Emotional arc (3+ beats): +0.75
- Wisdom entries: +0.75
- Patterns defined: +0.50
Threshold: >= 3.5/5
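The point values above add up to exactly 5.0. The sketch below shows one way to accumulate them; the `story` object shape (`arc`, `themes`, `relationships`, `worldRules`, `arcBeats`, `wisdom`, `patterns`) is hypothetical and chosen only for this example.

```javascript
// Hypothetical helper: adds the listed points for each documented element.
function explainabilityScore(story) {
  let score = 0;
  if (story.arc) score += 0.75;
  if ((story.themes ?? []).length > 0) score += 0.75;
  if ((story.relationships ?? []).length > 0) score += 0.75;
  if ((story.worldRules ?? []).length > 0) score += 0.75;
  if ((story.arcBeats ?? []).length >= 3) score += 0.75; // emotional arc needs 3+ beats
  if ((story.wisdom ?? []).length > 0) score += 0.75;
  if ((story.patterns ?? []).length > 0) score += 0.50;
  return score; // 0-5
}
```

With this scoring, a story missing only world rules and patterns reaches 3.75 and still clears the 3.5 threshold.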
Coherence Analysis
These six metrics specifically detect random, inconsistent, or incoherent generation. They are the primary defense against "garbage" output.
If coherence metrics average below 40%, the story is likely random noise rather than meaningful narrative. Check the console for detailed diagnostics.
Character Continuity
Question: Do characters appear consistently across scenes?
- Characters in multiple scenes (50%)
- Hero/protagonist presence throughout (50%)
Threshold: >= 60%
Low score indicates: Characters randomly assigned per scene, no continuity
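One plausible reading of the two halves above is sketched here. It assumes scenes carry a `characters` array of ids and that the protagonist is passed in as `heroId`; both the data shape and the exact ratios are assumptions, not the documented implementation.

```javascript
// Hypothetical helper: 50% share of recurring characters, 50% hero presence.
function characterContinuity(scenes, heroId) {
  if (scenes.length === 0) return 0;
  const appearances = new Map(); // character id -> number of scenes
  for (const scene of scenes) {
    for (const id of new Set(scene.characters ?? [])) {
      appearances.set(id, (appearances.get(id) ?? 0) + 1);
    }
  }
  const counts = [...appearances.values()];
  const recurring = counts.filter((n) => n >= 2).length;
  const recurringRatio = counts.length > 0 ? recurring / counts.length : 0;
  const heroRatio = (appearances.get(heroId) ?? 0) / scenes.length;
  return 0.5 * recurringRatio + 0.5 * heroRatio;
}
```

A story where every character appears in exactly one scene scores at most 0.5 × (1 / scene count) under this reading, which falls well below the 60% threshold for any multi-scene story.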
Location Logic
Question: Are locations reused logically or is it random hopping?
- Locations appearing 2+ times (50%)
- Average location usage vs expected (50%)
Threshold: >= 50%
Low score indicates: Every scene in a different location, no spatial coherence
Action Coherence
Question: Do actions reference known entities?
- Action subject is a known character
- Action target is a known character, location, or object
Threshold: >= 80%
Low score indicates: Actions reference entities that don't exist in libraries
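The two checks above can be sketched as set lookups against the entity libraries. The `library` shape and the `subject`/`target` field names are hypothetical, and treating a missing target as acceptable is an assumption.

```javascript
// Hypothetical helper: share of actions whose subject and target are known.
function actionCoherence(actions, library) {
  if (actions.length === 0) return 1; // assumption: nothing to violate
  const characters = new Set(library.characters);
  const targets = new Set([
    ...library.characters,
    ...library.locations,
    ...library.objects,
  ]);
  const coherent = actions.filter(
    (a) =>
      characters.has(a.subject) &&
      (a.target == null || targets.has(a.target)) // assumption: target is optional
  ).length;
  return coherent / actions.length;
}
```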
Scene Completeness
Question: Does each scene have minimum required elements?
A complete scene needs:
- At least one character
- At least one location
- At least one action or dialogue
Threshold: >= 70%
Low score indicates: Scenes are fragments, missing essential elements
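The three requirements above translate directly into a per-scene predicate. The scene fields used here (`characters`, `location`, `actions`, `dialogues`) are hypothetical names for this sketch.

```javascript
// Hypothetical helper: a scene is complete when it has who, where, and what.
function isCompleteScene(scene) {
  const hasCharacter = (scene.characters ?? []).length > 0;
  const hasLocation = Boolean(scene.location);
  const hasEvent =
    (scene.actions ?? []).length > 0 || (scene.dialogues ?? []).length > 0;
  return hasCharacter && hasLocation && hasEvent;
}

// Share of scenes that are complete.
function sceneCompleteness(scenes) {
  if (scenes.length === 0) return 0; // assumption: no scenes means no credit
  return scenes.filter(isCompleteScene).length / scenes.length;
}
```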
Relationship Usage
Question: Are defined relationships actually used in the story?
- For each relationship (A -> B), do A and B appear in the same scene?
- Unused relationships are just decoration
Threshold: >= 50%
Low score indicates: Relationships defined but never manifested in narrative
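The co-occurrence check above can be sketched as follows, assuming each relationship is a `{ from, to }` pair of character ids and each scene lists its `characters`. These field names, and returning 0 when no relationships are defined, are assumptions for illustration.

```javascript
// Hypothetical helper: share of relationships whose two characters
// share at least one scene.
function relationshipUsage(relationships, scenes) {
  if (relationships.length === 0) return 0; // assumption: nothing was manifested
  const used = relationships.filter(({ from, to }) =>
    scenes.some((scene) => {
      const cast = scene.characters ?? [];
      return cast.includes(from) && cast.includes(to);
    })
  ).length;
  return used / relationships.length;
}
```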
Dialogue Quality
Question: Are dialogues well-structured and purposeful?
Each dialogue scores 0-1 based on:
- Has purpose defined (+0.25)
- Has 2+ participants (+0.25)
- Has at least 1 exchange (+0.25)
- Exchanges have content (intent or sketch) (+0.25)
Threshold: >= 60%
Low score indicates: Dialogues are empty placeholders without structure
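The four quarter-point checks above can be sketched like this. The dialogue fields (`purpose`, `participants`, `exchanges`, `intent`, `sketch`) are hypothetical, and requiring every exchange to have content (rather than at least one) is an assumption.

```javascript
// Hypothetical helper: scores one dialogue from 0 to 1 in 0.25 steps.
function scoreDialogue(dialogue) {
  const exchanges = dialogue.exchanges ?? [];
  let score = 0;
  if (dialogue.purpose) score += 0.25;
  if ((dialogue.participants ?? []).length >= 2) score += 0.25;
  if (exchanges.length >= 1) score += 0.25;
  // Assumption: all exchanges must carry an intent or a sketch.
  if (exchanges.length >= 1 && exchanges.every((e) => e.intent || e.sketch)) {
    score += 0.25;
  }
  return score;
}

// Mean score over all dialogues.
function dialogueQuality(dialogues) {
  if (dialogues.length === 0) return 0; // assumption: no dialogues means no credit
  const total = dialogues.reduce((sum, d) => sum + scoreDialogue(d), 0);
  return total / dialogues.length;
}
```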
Diagnostic Interpretation
Overall Coherence Assessment
| Average Score | Interpretation | Action |
|---|---|---|
| < 30% | Garbage - Random/incoherent | Regenerate with better specification |
| 30-50% | Poor - Major structural issues | Review entities and structure |
| 50-70% | Fair - Some coherence | Refine weak areas |
| 70-85% | Good - Mostly coherent | Minor adjustments |
| > 85% | Excellent - Well-structured | Ready for expansion |
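
To apply the table above programmatically, the six coherence values can be averaged and mapped to a band. This is a sketch under assumptions: the `assessCoherence` name is hypothetical, the average is an unweighted mean, and values are accepted as numbers or numeric strings in the 0-1 range.

```javascript
// Hypothetical helper: averages the coherence metrics and maps the
// result to the bands in the table above.
function assessCoherence(metrics) {
  const values = Object.values(metrics).map(Number);
  if (values.length === 0) return { average: 0, label: "Garbage" };
  const average = values.reduce((a, b) => a + b, 0) / values.length;
  let label;
  if (average < 0.30) label = "Garbage";
  else if (average < 0.50) label = "Poor";
  else if (average < 0.70) label = "Fair";
  else if (average <= 0.85) label = "Good";
  else label = "Excellent";
  return { average, label };
}

const report = assessCoherence({
  charContinuity: "0.75",
  locLogic: "0.60",
  actionCoherence: "0.92",
  sceneCompleteness: "0.88",
  relUsage: "0.67",
  dialogueQuality: "0.70",
});
console.log(report.label, report.average.toFixed(2)); // Good 0.75
```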
Specific Issue Detection
| Low Metric | Likely Problem | Solution |
|---|---|---|
| Character Continuity < 40% | Characters randomly assigned | Ensure hero appears in most scenes |
| Location Logic < 30% | Random scene hopping | Reuse key locations (home, workplace) |
| Action Coherence < 60% | Orphan entity references | Add missing characters/objects to libraries |
| Scene Completeness < 50% | Incomplete scene fragments | Ensure each scene has who/where/what |
| Relationship Usage < 30% | Relationships never used | Place related characters in same scenes |
| Dialogue Quality < 40% | Empty placeholder dialogues | Add purpose and exchanges to dialogues |
Console Debugging
Open browser developer tools (F12) to see detailed metric calculations:
[Evaluate] Counts: {
  characters: 4,
  locations: 3,
  scenes: 8,
  chapters: 3,
  actions: 12,
  dialogues: 5,
  themes: 2,
  relationships: 6,
  worldRules: 3,
  wisdom: 2,
  patterns: 1
}
[Evaluate] Coherence: {
  charContinuity: "0.75",
  locLogic: "0.60",
  actionCoherence: "0.92",
  sceneCompleteness: "0.88",
  relUsage: "0.67",
  dialogueQuality: "0.70"
}
Use this data to identify exactly which aspects need improvement.
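If you copy the Coherence object out of the console, a few lines of JavaScript can list the metrics that fall below their documented thresholds. The helper below is a hypothetical sketch, not part of SCRIPTA; the key names follow the console output above and the thresholds are the ones given in the Coherence Analysis section.

```javascript
// Thresholds from the Coherence Analysis section, keyed by console field name.
const COHERENCE_THRESHOLDS = {
  charContinuity: 0.60,
  locLogic: 0.50,
  actionCoherence: 0.80,
  sceneCompleteness: 0.70,
  relUsage: 0.50,
  dialogueQuality: 0.60,
};

// Hypothetical helper: returns the keys whose value is below threshold.
// Missing or non-numeric values parse to NaN and are reported as failing.
function failingMetrics(coherence) {
  return Object.entries(COHERENCE_THRESHOLDS)
    .filter(([key, min]) => !(Number.parseFloat(coherence[key]) >= min))
    .map(([key]) => key);
}

const failing = failingMetrics({
  charContinuity: "0.75",
  locLogic: "0.40",
  actionCoherence: "0.92",
  sceneCompleteness: "0.88",
  relUsage: "0.67",
  dialogueQuality: "0.70",
});
console.log(failing); // [ 'locLogic' ]
```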
Related Documentation
Technical Specifications
- DS12 - Metrics Interpreter Overview
- DS13 - Coherence Score (CS) Definition
- DS14 - Character Attribute Drift (CAD)
- DS15 - Compliance Adherence Rate (CAR)
- DS16 - Originality Index (OI)
- DS17 - Emotional Arc Profile (EAP)
- DS18 - Retrieval Quality (RQ)
- DS20 - CNL Parse Success Rate (CPSR)
- DS21 - Constraint Satisfaction Accuracy (CSA)
- DS22 - Explainability Score
- DS23 - Narrative Quality Score (NQS)
- DS25 - Coherence Analysis Metrics (NEW)
Implementation
See demo/app/metrics.mjs for the complete implementation of all metrics calculations.