The Unidirectional Pipeline
From UTF-8 strings to executable bytecode without backtracking.
The Linear Flow
The CNL architecture is built on a strictly linear, one-way data pipeline. This is a deliberate design choice to ensure predictability and performance. There are no feedback loops: runtime state never influences the parser, so parsing remains context-free.
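As a rough sketch of that flow (the stage names below, such as tokenize and compile_ast, are placeholders rather than the actual CNL API), the whole pipeline reduces to a straight composition of stage functions, with no stage reaching back into an earlier one:

```python
from typing import List

# Hypothetical stage stubs: each stage consumes only the previous stage's
# output, which is the whole point of the unidirectional pipeline.
def tokenize(source: str) -> List[str]:
    return source.split()

def parse(tokens: List[str]) -> dict:
    return {"kind": "Statement", "children": tokens}

def validate_and_normalize(ast: dict) -> dict:
    return ast  # semantic checks and desugaring would happen here

def compile_ast(ast: dict) -> dict:
    return {"kb_updates": [], "plans": [ast]}

def process(source: str) -> dict:
    # Strictly one direction: no stage reaches back into an earlier one.
    return compile_ast(validate_and_normalize(parse(tokenize(source))))

print(process("the robot moves the box"))
```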
Stage 1: Lexical Analysis (Tokenizer)
The input is a raw UTF-8 string. The Lexer breaks it into a stream of tokens (IDENT, NUMBER, STRING, KEYWORD).
Key Mechanism: Longest Match. Ambiguity often arises when keywords overlap (e.g., "in" vs "in order to"). The lexer always greedily consumes the longest possible keyword. This resolves ambiguity locally without needing parser lookahead.
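A minimal illustration of longest-match keyword scanning, assuming a hypothetical keyword list (the real CNL keyword set may differ): sorting the alternatives longest-first before building the pattern is one simple way to make the longer keyword win.

```python
import re

# Hypothetical keyword list; the real CNL keyword set may differ.
KEYWORDS = ["in", "in order to", "is", "is greater than"]

# Sort alternatives longest-first so the regex alternation prefers the
# longest keyword that matches at the current position.
KEYWORD_RE = re.compile(
    "|".join(re.escape(k) for k in sorted(KEYWORDS, key=len, reverse=True))
)

def next_keyword(text: str, pos: int):
    """Return the longest keyword starting at pos, or None."""
    match = KEYWORD_RE.match(text, pos)
    return match.group(0) if match else None

print(next_keyword("in order to open the door", 0))  # "in order to", not "in"
```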
Stage 2: Syntactic Analysis (Parser)
The Parser consumes tokens and produces an Abstract Syntax Tree (AST). It follows the EBNF grammar strictly.
Key Mechanism: Fail Fast. If the input deviates from the grammar (e.g., a missing determiner in a noun phrase), the parser halts immediately with a specific error. It does not attempt error recovery or fuzzy matching. This ensures that only valid, unambiguous structures proceed down the pipeline.
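A sketch of fail-fast parsing for a hypothetical noun-phrase rule (determiner followed by noun); the node and error types here are illustrative, not the actual parser's:

```python
from dataclasses import dataclass
from typing import List, Tuple

class ParseError(Exception):
    """Raised immediately; the parser never attempts recovery."""

# Hypothetical grammar fragment: noun_phrase = DETERMINER NOUN
DETERMINERS = {"a", "an", "the"}

@dataclass
class NounPhrase:
    determiner: str
    noun: str

def parse_noun_phrase(tokens: List[str], pos: int) -> Tuple[NounPhrase, int]:
    # Fail fast: a missing determiner halts parsing with a specific error.
    if pos >= len(tokens) or tokens[pos] not in DETERMINERS:
        found = tokens[pos] if pos < len(tokens) else "end of input"
        raise ParseError(f"expected a determiner at token {pos}, found {found!r}")
    if pos + 1 >= len(tokens):
        raise ParseError(f"expected a noun after {tokens[pos]!r}")
    return NounPhrase(tokens[pos], tokens[pos + 1]), pos + 2

print(parse_noun_phrase(["the", "door"], 0))   # (NounPhrase('the', 'door'), 2)
# parse_noun_phrase(["door"], 0) raises: expected a determiner at token 0 ...
```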
Stage 3: Validation & Normalization
The Validator walks the AST to check semantic rules that the grammar cannot capture (e.g., "The variable X must be defined before use").
Simultaneously, the AST is Normalized. Syntactic sugar is desugared. For instance, "A > B" might be converted into the canonical "A is greater than B". This simplifies the compiler's job by reducing the number of node types it needs to handle.
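For example, a small desugaring step might map surface operators onto their canonical long forms (the node shape and operator table here are assumptions for illustration, not the actual CNL definitions):

```python
from dataclasses import dataclass

# Hypothetical canonical node for comparisons; the real AST shape may differ.
@dataclass
class Comparison:
    op: str      # canonical operator phrase, e.g. "is greater than"
    left: str
    right: str

# Desugaring table: surface operators collapse onto canonical long forms,
# so later stages only ever see one node shape per concept.
SUGAR = {">": "is greater than", "<": "is less than", "=": "is equal to"}

def normalize_comparison(op: str, left: str, right: str) -> Comparison:
    return Comparison(SUGAR.get(op, op), left, right)

print(normalize_comparison(">", "A", "B"))
# Comparison(op='is greater than', left='A', right='B')
```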
Stage 4: Compilation
The Compiler takes the validated AST and the current Dictionary (symbol table) and emits two things (see the sketch after this list):
- KB Updates: Direct instructions to insert facts (ground assertions) into the Knowledge Base.
- Plan IR: Executable plan objects (RulePlans, ActionPlans) for dynamic behavior.
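A rough sketch of that split output, assuming placeholder shapes for facts and plans (the actual KB-update and Plan IR formats are not specified in this section):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Placeholder output shapes; the real KB-update and Plan IR formats may differ.
@dataclass
class Fact:
    """A ground assertion to be inserted into the Knowledge Base."""
    predicate: str
    args: Tuple[str, ...]

@dataclass
class RulePlan:
    """An executable plan object for dynamic behaviour."""
    name: str
    condition: str
    action: str

@dataclass
class CompileOutput:
    kb_updates: List[Fact] = field(default_factory=list)
    plans: List[RulePlan] = field(default_factory=list)

def compile_statement(ast: dict, dictionary: dict) -> CompileOutput:
    out = CompileOutput()
    if ast["kind"] == "fact":
        # Static statements become direct KB inserts.
        out.kb_updates.append(Fact(ast["predicate"], tuple(ast["args"])))
    elif ast["kind"] == "rule":
        # Dynamic statements become plan IR, names resolved via the Dictionary.
        out.plans.append(RulePlan(ast["name"], ast["condition"], ast["action"]))
    return out

print(compile_statement({"kind": "fact", "predicate": "is_open",
                         "args": ["door"]}, dictionary={}))
```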