Story Quality Metrics

SCRIPTA evaluates story quality with a set of automated metrics. They detect structural issues, measure coherence, and flag generated stories that are random or incoherent.

Why Metrics Matter

AI-generated stories can suffer from various problems:

  • Random character assignment - Characters appear once and vanish
  • Location hopping - Every scene in a different place with no logic
  • Orphan references - Actions mention entities that don't exist
  • Incomplete scenes - Missing characters, locations, or actions
  • Unused relationships - Defined connections never manifest
  • Empty dialogues - Placeholder conversations without purpose

SCRIPTA's metrics system detects these issues automatically, giving you immediate feedback on story quality.

The Evaluate Button

After generating a story, click "Evaluate" in the Metrics panel to see a detailed quality analysis. The browser console shows additional debugging information.

Narrative Quality Score (NQS)

The Narrative Quality Score is the master metric that combines all quality dimensions into a single 0-100% score.

NQS Components

Component            Weight   What It Measures
Completeness         12%      Are required elements present? (characters, locations, themes, arc)
Coherence Score      12%      Entity usage and structural coherence
Character Drift      8%       Character trait consistency (lower is better)
Originality          8%       Variety of narrative blocks and actions
Emotional Arc        10%      Arc coverage and mood usage
Parse Success        8%       Valid CNL syntax in output
Constraints          8%       Constraint satisfaction accuracy
Explainability       8%       Documentation level (arc, themes, rules, wisdom)
Coherence Analysis   26%      Character continuity, location logic, action coherence, scene completeness

Interpreting NQS

  • < 40%: Severe issues, likely random/incoherent generation
  • 40-60%: Multiple problems, needs significant improvement
  • 60-75%: Acceptable, some issues to address
  • 75-85%: Good quality, minor refinements needed
  • > 85%: Excellent, well-structured story
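
The weighting and bands above can be sketched as a weighted sum. This is a minimal illustration, not the code in demo/app/metrics.mjs. The key names and the functions narrativeQualityScore and interpretNqs are hypothetical. It also assumes every component arrives already normalized to 0-1 with higher meaning better (so Character Drift would have to be inverted beforehand), and that band boundaries fall as coded below.

```javascript
// Weights copied from the NQS Components table (they sum to 1.0).
// Key names are hypothetical, chosen for this sketch.
const NQS_WEIGHTS = {
  completeness: 0.12,
  coherenceScore: 0.12,
  characterDrift: 0.08,
  originality: 0.08,
  emotionalArc: 0.10,
  parseSuccess: 0.08,
  constraints: 0.08,
  explainability: 0.08,
  coherenceAnalysis: 0.26,
};

// Assumption: each component is already normalized to 0..1, higher is better.
// Missing components count as 0; out-of-range values are clamped.
function narrativeQualityScore(components) {
  let total = 0;
  for (const [key, weight] of Object.entries(NQS_WEIGHTS)) {
    const value = Math.min(1, Math.max(0, components[key] ?? 0));
    total += weight * value;
  }
  return total; // 0..1
}

// Maps a 0..1 score onto the bands listed under "Interpreting NQS".
// Assumption: a score exactly on a boundary falls into the higher band,
// except 85%, which is still treated as "Good quality".
function interpretNqs(score) {
  const pct = score * 100;
  if (pct < 40) return "Severe issues";
  if (pct < 60) return "Multiple problems";
  if (pct < 75) return "Acceptable";
  if (pct <= 85) return "Good quality";
  return "Excellent";
}
```

Because Coherence Analysis alone carries 26% of the weight, a story that scores 0 there cannot exceed 74% under this sketch, whatever the other components report.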

Summary Metrics

Completeness

Measures whether all required story elements are present:

  • At least 2 characters (20%)
  • At least 2 locations (10%)
  • At least 3 scenes (25%)
  • At least 1 theme (10%)
  • Emotional arc with 3+ beats (10%)
  • At least 2 dialogues (15%)
  • At least 2 relationships (10%)

Threshold: >= 80%
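
As a rough sketch, the checklist above can be expressed as a table of minimum counts and weights. The field names on counts and the function completeness are hypothetical, and the all-or-nothing scoring per check is an assumption; the real implementation may award partial credit.

```javascript
// Each check pairs a minimum count with its weight from the list above.
// Field names are hypothetical, chosen for this sketch.
const COMPLETENESS_CHECKS = [
  { key: "characters", min: 2, weight: 0.20 },
  { key: "locations", min: 2, weight: 0.10 },
  { key: "scenes", min: 3, weight: 0.25 },
  { key: "themes", min: 1, weight: 0.10 },
  { key: "arcBeats", min: 3, weight: 0.10 },
  { key: "dialogues", min: 2, weight: 0.15 },
  { key: "relationships", min: 2, weight: 0.10 },
];

// Assumption: each check is all-or-nothing (no partial credit).
// Returns a value in 0..1; compare against 0.80 for the threshold.
function completeness(counts) {
  return COMPLETENESS_CHECKS.reduce(
    (sum, { key, min, weight }) => sum + ((counts[key] ?? 0) >= min ? weight : 0),
    0
  );
}
```

Under these assumptions a story with only one dialogue loses the full 15% for that check, and one further missing element of any weight drops it below the 80% threshold.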

Coherence Score (CS)

Measures structural coherence through entity usage and relationships:

  • Entity usage ratio (40%) - Are defined characters/locations actually used?
  • Relationship coverage (30%) - Are relationships defined between characters?
  • Block usage (30%) - Are narrative blocks used in scenes?

Threshold: >= 75%

Emotional Arc Profile (EAP)

Measures how well the emotional journey is defined:

  • Arc beat coverage (50%) - How many arc beats are assigned?
  • Mood variety (25%) - Are different moods used?
  • Mood usage (25%) - Are moods applied to scenes?

Threshold: >= 70%

Detailed Analysis Metrics

Character Attribute Drift (CAD)

Measures trait consistency. Lower is better. The score is assigned from the number of traits defined per character:

  • 3+ traits per character: 0.05 (excellent)
  • 2 traits per character: 0.10 (good)
  • 1 trait per character: 0.15 (minimal)
  • 0 traits: 0.25 (poor)

Threshold: <= 0.15
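
The lookup above can be sketched as follows. The source does not say how per-character values are combined into one score, so averaging across characters is an assumption, as are the traits field, the function names, and the choice to return 0.25 when no characters exist.

```javascript
// Drift value for a single character, taken from the list above.
function driftForTraitCount(traitCount) {
  if (traitCount >= 3) return 0.05; // excellent
  if (traitCount === 2) return 0.10; // good
  if (traitCount === 1) return 0.15; // minimal
  return 0.25; // poor
}

// Assumption: the story-level score is the mean of per-character values.
// `characters` is a hypothetical array of { name, traits: string[] }.
function characterAttributeDrift(characters) {
  if (characters.length === 0) return 0.25;
  const total = characters.reduce(
    (sum, c) => sum + driftForTraitCount((c.traits ?? []).length),
    0
  );
  return total / characters.length;
}
```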

Compliance Adherence Rate (CAR)

Percentage of references that point to existing entities. Orphan references (pointing to non-existent entities) lower this score.

Threshold: >= 95%

Originality Index (OI)

Measures variety in the narrative:

  • Block type variety (40%) - Different narrative blocks used
  • Action variety (30%) - Different action types used
  • Theme variety (30%) - Multiple themes defined

Threshold: >= 50%

CNL Parse Success Rate (CPSR)

Percentage of CNL lines that follow valid syntax. Tests whether the specification can be parsed correctly.

Threshold: >= 90%

Constraint Satisfaction Accuracy (CSA)

Ratio of valid references to total references. Similar to CAR but focused on constraint validation.

Threshold: >= 95%

Retrieval Quality (RQ)

Measures naming quality - entities should have meaningful names (3+ characters, not "undefined").

Threshold: >= 80%
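
A minimal sketch of the naming check, assuming the score is the share of entity names that pass it. The function names are hypothetical, and trimming whitespace, ignoring case when matching "undefined", and returning 0 for an empty list are assumptions.

```javascript
// A name counts as meaningful if it has 3+ characters and is not "undefined".
function isMeaningfulName(name) {
  if (typeof name !== "string") return false;
  const trimmed = name.trim();
  return trimmed.length >= 3 && trimmed.toLowerCase() !== "undefined";
}

// Assumption: RQ is the fraction of names that pass the check (0..1).
function retrievalQuality(names) {
  if (names.length === 0) return 0;
  return names.filter(isMeaningfulName).length / names.length;
}
```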

Explainability Score

Measures documentation level (0-5 rating):

  • Arc defined: +0.75
  • Themes defined: +0.75
  • Relationships defined: +0.75
  • World rules defined: +0.75
  • Emotional arc (3+ beats): +0.75
  • Wisdom entries: +0.75
  • Patterns defined: +0.50

Threshold: >= 3.5/5
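
The additive rating above can be sketched like this. The shape of the story object (arc, themes, relationships, worldRules, emotionalArc, wisdom, patterns) and the function name are hypothetical; only the point values come from the list.

```javascript
// Adds up the points from the list above; the maximum is 6 * 0.75 + 0.50 = 5.
// `story` is a hypothetical object; list fields are arrays, `arc` is any truthy value.
function explainabilityScore(story) {
  const has = (list) => Array.isArray(list) && list.length > 0;
  let score = 0;
  if (story.arc) score += 0.75;
  if (has(story.themes)) score += 0.75;
  if (has(story.relationships)) score += 0.75;
  if (has(story.worldRules)) score += 0.75;
  if ((story.emotionalArc ?? []).length >= 3) score += 0.75;
  if (has(story.wisdom)) score += 0.75;
  if (has(story.patterns)) score += 0.50;
  return score; // 0..5
}
```

With these point values, a story can miss any single item and still meet the 3.5 threshold, but missing two of the 0.75 items gives at most 3.5 and leaves no further slack.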

Coherence Analysis

These six metrics specifically detect random, inconsistent, or incoherent generation. They are the primary defense against "garbage" output.

Detecting Bad Generation

If coherence metrics average below 40%, the story is likely random noise rather than meaningful narrative. Check the console for detailed diagnostics.

Character Continuity

Question: Do characters appear consistently across scenes?

  • Characters in multiple scenes (50%)
  • Hero/protagonist presence throughout (50%)

Threshold: >= 60%

Low score indicates: Characters randomly assigned per scene, no continuity
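
One way to read the two halves of this metric is sketched below. The source does not define either ratio precisely, so treating "characters in multiple scenes" as the share of characters seen in 2+ scenes and "hero presence" as the share of scenes containing the hero is an assumption. The scene shape and function name are hypothetical.

```javascript
// `characterNames`: names of all defined characters.
// `scenes`: hypothetical array of { characters: string[] }.
// `heroName`: name of the protagonist.
function characterContinuity(characterNames, scenes, heroName) {
  if (characterNames.length === 0 || scenes.length === 0) return 0;

  // Count in how many scenes each defined character appears.
  const appearances = new Map(characterNames.map((name) => [name, 0]));
  for (const scene of scenes) {
    for (const name of new Set(scene.characters ?? [])) {
      if (appearances.has(name)) {
        appearances.set(name, appearances.get(name) + 1);
      }
    }
  }

  // Half of the score: share of characters that appear in 2+ scenes.
  const recurring = [...appearances.values()].filter((n) => n >= 2).length;
  const recurringRatio = recurring / characterNames.length;

  // Other half: share of scenes in which the hero is present.
  const heroRatio = (appearances.get(heroName) ?? 0) / scenes.length;

  return 0.5 * recurringRatio + 0.5 * heroRatio;
}
```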

Location Logic

Question: Are locations reused logically or is it random hopping?

  • Locations appearing 2+ times (50%)
  • Average location usage vs expected (50%)

Threshold: >= 50%

Low score indicates: Every scene in a different location, no spatial coherence

Action Coherence

Question: Do actions reference known entities?

  • Action subject is a known character
  • Action target is a known character, location, or object

Threshold: >= 80%

Low score indicates: Actions reference entities that don't exist in libraries
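
The two checks above can be sketched as a filter over actions. The action shape ({ subject, target }), the library shape, and the function name are hypothetical. Treating an action without a target as valid, and returning 0 when there are no actions, are assumptions.

```javascript
// `actions`: hypothetical array of { subject: string, target?: string }.
// `library`: hypothetical { characters: string[], locations: string[], objects: string[] }.
function actionCoherence(actions, library) {
  if (actions.length === 0) return 0;

  const characters = new Set(library.characters);
  const validTargets = new Set([
    ...library.characters,
    ...library.locations,
    ...library.objects,
  ]);

  // An action is coherent when its subject is a known character and its
  // target, if present, is a known character, location, or object.
  const coherent = actions.filter(
    (a) => characters.has(a.subject) && (a.target == null || validTargets.has(a.target))
  );
  return coherent.length / actions.length;
}
```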

Scene Completeness

Question: Does each scene have minimum required elements?

A complete scene needs:

  • At least one character
  • At least one location
  • At least one action or dialogue

Threshold: >= 70%

Low score indicates: Scenes are fragments, missing essential elements
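
A minimal sketch of the who/where/what check, assuming the score is the share of complete scenes. The scene fields (characters, location, actions, dialogues) and the function names are hypothetical.

```javascript
// A scene is complete when it has a character, a location,
// and at least one action or dialogue.
function isSceneComplete(scene) {
  const hasCharacter = (scene.characters ?? []).length > 0;
  const hasLocation = Boolean(scene.location);
  const hasContent =
    (scene.actions ?? []).length > 0 || (scene.dialogues ?? []).length > 0;
  return hasCharacter && hasLocation && hasContent;
}

// Assumption: the metric is the fraction of scenes that pass the check.
function sceneCompleteness(scenes) {
  if (scenes.length === 0) return 0;
  return scenes.filter(isSceneComplete).length / scenes.length;
}
```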

Relationship Usage

Question: Are defined relationships actually used in the story?

  • For each relationship (A -> B), do A and B appear in the same scene?
  • Unused relationships are just decoration

Threshold: >= 50%

Low score indicates: Relationships defined but never manifested in narrative
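
The co-occurrence test above can be sketched as follows. The relationship shape ({ from, to }), the scene shape, and the function name are hypothetical, and returning 0 when no relationships are defined is an assumption.

```javascript
// `relationships`: hypothetical array of { from: string, to: string }.
// `scenes`: hypothetical array of { characters: string[] }.
function relationshipUsage(relationships, scenes) {
  if (relationships.length === 0) return 0;

  // Pre-compute the cast of each scene for fast membership tests.
  const casts = scenes.map((scene) => new Set(scene.characters ?? []));

  // A relationship is "used" when both ends share at least one scene.
  const used = relationships.filter(({ from, to }) =>
    casts.some((cast) => cast.has(from) && cast.has(to))
  );
  return used.length / relationships.length;
}
```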

Dialogue Quality

Question: Are dialogues well-structured and purposeful?

Each dialogue scores 0-1 based on:

  • Has purpose defined (+0.25)
  • Has 2+ participants (+0.25)
  • Has at least 1 exchange (+0.25)
  • Exchanges have content (intent or sketch) (+0.25)

Threshold: >= 60%

Low score indicates: Dialogues are empty placeholders without structure
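
The per-dialogue rubric above can be sketched as four quarter-point checks, averaged over all dialogues. The dialogue fields (purpose, participants, exchanges with intent or sketch) follow the wording of the list, but their exact names are hypothetical. Counting the content check as passed when at least one exchange has an intent or sketch is an assumption.

```javascript
// Scores a single dialogue from 0 to 1 in steps of 0.25.
function scoreDialogue(dialogue) {
  const exchanges = dialogue.exchanges ?? [];
  let score = 0;
  if (dialogue.purpose) score += 0.25;
  if ((dialogue.participants ?? []).length >= 2) score += 0.25;
  if (exchanges.length >= 1) score += 0.25;
  // Assumption: one exchange with an intent or sketch is enough.
  if (exchanges.some((e) => e.intent || e.sketch)) score += 0.25;
  return score;
}

// Assumption: the metric is the mean score over all dialogues.
function dialogueQuality(dialogues) {
  if (dialogues.length === 0) return 0;
  const total = dialogues.reduce((sum, d) => sum + scoreDialogue(d), 0);
  return total / dialogues.length;
}
```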

Diagnostic Interpretation

Overall Coherence Assessment

Average Score   Interpretation                    Action
< 30%           Garbage - Random/incoherent       Regenerate with better specification
30-50%          Poor - Major structural issues    Review entities and structure
50-70%          Fair - Some coherence             Refine weak areas
70-85%          Good - Mostly coherent            Minor adjustments
> 85%           Excellent - Well-structured       Ready for expansion

Specific Issue Detection

Low Metric                    Likely Problem                 Solution
Character Continuity < 40%    Characters randomly assigned   Ensure hero appears in most scenes
Location Logic < 30%          Random scene hopping           Reuse key locations (home, workplace)
Action Coherence < 60%        Orphan entity references       Add missing characters/objects to libraries
Scene Completeness < 50%      Incomplete scene fragments     Ensure each scene has who/where/what
Relationship Usage < 30%      Relationships never used       Place related characters in same scenes
Dialogue Quality < 40%        Empty placeholder dialogues    Add purpose and exchanges to dialogues
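
The table above can be turned into a small lookup that lists the suggested fixes for a set of coherence scores. This is a sketch only. The keys reuse the names printed in the [Evaluate] Coherence console output, while ISSUE_RULES and suggestFixes are hypothetical names. Scores are assumed to be 0-1 values that may arrive as strings.

```javascript
// Cut-offs and suggestions copied from the Specific Issue Detection table.
const ISSUE_RULES = [
  { key: "charContinuity", below: 0.40, fix: "Ensure hero appears in most scenes" },
  { key: "locLogic", below: 0.30, fix: "Reuse key locations (home, workplace)" },
  { key: "actionCoherence", below: 0.60, fix: "Add missing characters/objects to libraries" },
  { key: "sceneCompleteness", below: 0.50, fix: "Ensure each scene has who/where/what" },
  { key: "relUsage", below: 0.30, fix: "Place related characters in same scenes" },
  { key: "dialogueQuality", below: 0.40, fix: "Add purpose and exchanges to dialogues" },
];

// Returns one "metric: suggestion" string per metric under its cut-off.
// Number() accepts both numeric values and strings such as "0.75";
// a missing metric becomes NaN and is never flagged.
function suggestFixes(coherence) {
  return ISSUE_RULES
    .filter(({ key, below }) => Number(coherence[key]) < below)
    .map(({ key, fix }) => `${key}: ${fix}`);
}
```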

Console Debugging

Open the browser developer tools (F12) to see detailed metric calculations:

Console Output Example
[Evaluate] Counts: {
  characters: 4,
  locations: 3,
  scenes: 8,
  chapters: 3,
  actions: 12,
  dialogues: 5,
  themes: 2,
  relationships: 6,
  worldRules: 3,
  wisdom: 2,
  patterns: 1
}

[Evaluate] Coherence: {
  charContinuity: "0.75",
  locLogic: "0.60",
  actionCoherence: "0.92",
  sceneCompleteness: "0.88",
  relUsage: "0.67",
  dialogueQuality: "0.70"
}

Use this data to identify exactly which aspects need improvement.
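
As a worked example, the Coherence object from the console output can be averaged and mapped onto the Overall Coherence Assessment bands. The functions averageCoherence and assessCoherence are hypothetical, and treating the average as a plain mean of the six values is an assumption about how the assessment is computed.

```javascript
// Values copied from the console output example; note they are strings.
const sampleCoherence = {
  charContinuity: "0.75",
  locLogic: "0.60",
  actionCoherence: "0.92",
  sceneCompleteness: "0.88",
  relUsage: "0.67",
  dialogueQuality: "0.70",
};

// Assumption: the overall figure is the unweighted mean of all metrics.
function averageCoherence(coherence) {
  const values = Object.values(coherence).map(Number);
  if (values.length === 0) return 0;
  return values.reduce((a, b) => a + b, 0) / values.length;
}

// Maps a 0..1 average onto the Overall Coherence Assessment table.
function assessCoherence(average) {
  const pct = average * 100;
  if (pct < 30) return "Garbage";
  if (pct < 50) return "Poor";
  if (pct < 70) return "Fair";
  if (pct <= 85) return "Good";
  return "Excellent";
}

const average = averageCoherence(sampleCoherence);
console.log(`${(average * 100).toFixed(1)}% -> ${assessCoherence(average)}`);
```

For the sample values the six scores sum to 4.52, giving a mean of about 75.3%, which falls in the "Good" band.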

Related Documentation

Technical Specifications

  • DS12 - Metrics Interpreter Overview
  • DS13 - Coherence Score (CS) Definition
  • DS14 - Character Attribute Drift (CAD)
  • DS15 - Compliance Adherence Rate (CAR)
  • DS16 - Originality Index (OI)
  • DS17 - Emotional Arc Profile (EAP)
  • DS18 - Retrieval Quality (RQ)
  • DS20 - CNL Parse Success Rate (CPSR)
  • DS21 - Constraint Satisfaction Accuracy (CSA)
  • DS22 - Explainability Score
  • DS23 - Narrative Quality Score (NQS)
  • DS25 - Coherence Analysis Metrics (NEW)

Implementation

See demo/app/metrics.mjs for the complete implementation of all metrics calculations.