Concepts

Identity, Interning, and Conceptual IDs

From human-readable strings to high-performance dense integers: the lifecycle of an identifier in CNL.

The ConceptualID

In a distributed or long-lived system, identity is hard. Strings are fragile (typos, renaming), and simple auto-incrementing integers are not stable across restarts. CNL-PL introduces the ConceptualID: a globally stable, 64-bit hash-based identifier for every concept in the system.

Every concept, whether a specific user, a Predicate definition, or an Attribute type, receives a ConceptualID derived from its canonical definition. This makes the system "stateless" in a sense: recompiling the same source files always produces the same logical relationships, even if the execution order changes. The SymbolTable maintains the bidirectional mapping, so a number can always be turned back into the string "user" for debugging or explanation.
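As a rough illustration, the TypeScript sketch below derives an ID from a canonical string and keeps a two-way table. The hash (two 32-bit FNV-1a style passes joined into 16 hex digits), the `kind:name` canonical form, and the names `conceptualId`, `register`, and `nameOf` are assumptions made for this example, not CNL-PL's actual API.

```typescript
// Hypothetical representation: 64 bits rendered as 16 hex digits, because a
// JavaScript number cannot hold a 64-bit integer exactly.
type ConceptualID = string;

// One 32-bit FNV-1a style pass over the UTF-16 code units of `text`.
function hash32(text: string, seed: number): number {
  let h = seed;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

function hex8(n: number): string {
  return ("00000000" + n.toString(16)).slice(-8);
}

// Same canonical string in, same ID out, regardless of when it is called.
function conceptualId(canonical: string): ConceptualID {
  return hex8(hash32(canonical, 0x811c9dc5)) + hex8(hash32(canonical, 0x9e3779b9));
}

// Assumed shape of a symbol table: ID -> name and name -> ID.
class SymbolTable {
  private readonly names = new Map<ConceptualID, string>();
  private readonly ids = new Map<string, ConceptualID>();

  register(kind: string, name: string): ConceptualID {
    const canonical = kind + ":" + name;
    const known = this.ids.get(canonical);
    if (known !== undefined) return known;
    const id = conceptualId(canonical);
    const clash = this.names.get(id);
    if (clash !== undefined) {
      throw new Error(`hash collision between ${clash} and ${canonical}`);
    }
    this.ids.set(canonical, id);
    this.names.set(id, canonical);
    return id;
  }

  nameOf(id: ConceptualID): string | undefined {
    return this.names.get(id);
  }
}
```

Because the ID depends only on the canonical string, two tables that register concepts in a different order still agree on every ID, and `nameOf` recovers the readable name for explanations.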

Dense IDs for Performance

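While ConceptualIDs are great for stability, they are a poor fit for array indexing. A 64-bit hash is sparse, and its value far exceeds both JavaScript's maximum array length (2^32 - 1) and the largest integer a JavaScript number can represent exactly (2^53 - 1), so it cannot serve as an index into a JavaScript array or a Bitset. This is where Interning comes in.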

When a session starts, the compiler maps every active ConceptualID to a temporary, session-scoped "Dense ID". These are:

EntityID: A dense integer (0, 1, 2...) for every instance.
PredID: An ID for every binary relation type.
AttrID: An ID for every attribute.
UnaryPredID: An ID for sets/types.

These dense IDs are what the KB uses internally. They allow the data structures to be compact arrays (`bits[EntityID]`). The distinction between the stable ConceptualID and the runtime Dense ID is fundamental to the architecture's performance profile.
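The interning step can be sketched as follows. This is a minimal illustration, assuming ConceptualIDs are carried as strings and that each ID family (EntityID, PredID, AttrID, UnaryPredID) gets its own interner. The class names `DenseInterner` and `Bitset` are hypothetical.

```typescript
type ConceptualID = string; // assumed representation for this sketch

// Hands out 0, 1, 2, ... in first-seen order for one ID family.
class DenseInterner {
  private readonly toDense = new Map<ConceptualID, number>();
  private readonly toConceptual: ConceptualID[] = [];

  intern(id: ConceptualID): number {
    let dense = this.toDense.get(id);
    if (dense === undefined) {
      dense = this.toConceptual.length;
      this.toDense.set(id, dense);
      this.toConceptual.push(id);
    }
    return dense;
  }

  // Maps a session-scoped dense ID back to the stable ConceptualID.
  resolve(dense: number): ConceptualID | undefined {
    return this.toConceptual[dense];
  }

  get size(): number {
    return this.toConceptual.length;
  }
}

// Fixed-size bitset indexed directly by a dense ID.
class Bitset {
  private readonly words: Uint32Array;

  constructor(size: number) {
    this.words = new Uint32Array(Math.ceil(size / 32));
  }

  add(i: number): void {
    this.words[i >>> 5] |= 1 << (i & 31);
  }

  has(i: number): boolean {
    return (this.words[i >>> 5] & (1 << (i & 31))) !== 0;
  }
}
```

With this split, a unary predicate such as "is admin" can be stored as one `Bitset` sized to the number of interned entities, and membership checks become a shift and a mask instead of a hash lookup. The dense numbers are only meaningful within the session that produced them.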

FactID, RuleID, and Provenance

Beyond simple entities, the system needs to track *truth*. A FactID is a unique identifier for a specific assertion (e.g., "Fact #42: John is an admin"). It is typically a hash of the triplet (SubjectID, PredID, ObjectID).
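One way to derive such an identifier is sketched below, assuming the three components are 32-bit dense IDs and that a 32-bit FNV-1a hash over their bytes is wide enough for illustration. The actual hash function and FactID width are not specified here, and `factId` and `FactStore` are hypothetical names.

```typescript
type FactID = number;

interface Fact {
  subject: number; // dense ID of the subject
  pred: number;    // dense ID of the predicate
  object: number;  // dense ID of the object
}

// Feeds the four bytes of a 32-bit value into an FNV-1a hash state.
function mix(h: number, value: number): number {
  for (let shift = 0; shift < 32; shift += 8) {
    h ^= (value >>> shift) & 0xff;
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Order matters: (subject, pred, object) is hashed in that sequence.
function factId(f: Fact): FactID {
  let h = 0x811c9dc5;
  h = mix(h, f.subject);
  h = mix(h, f.pred);
  h = mix(h, f.object);
  return h;
}

// Stores each assertion once, keyed by its FactID.
class FactStore {
  private readonly facts = new Map<FactID, Fact>();

  add(fact: Fact): FactID {
    const id = factId(fact);
    const existing = this.facts.get(id);
    if (existing === undefined) {
      this.facts.set(id, fact);
    } else if (
      existing.subject !== fact.subject ||
      existing.pred !== fact.pred ||
      existing.object !== fact.object
    ) {
      // A real system would need a collision strategy; this sketch just fails.
      throw new Error(`FactID collision on ${id}`);
    }
    return id;
  }

  get(id: FactID): Fact | undefined {
    return this.facts.get(id);
  }

  get size(): number {
    return this.facts.size;
  }
}
```

Asserting the same triple twice yields the same FactID and a single stored entry, which is what lets later explanations refer to a fact by ID alone.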

Similarly, a RuleID identifies a specific logic rule. These IDs are essential for Provenance. When the system explains "Why is John active?", it returns a DAG (Directed Acyclic Graph) of IDs: "John is active (Fact A) because Rule X applied to Fact B and Fact C." Without these unique IDs for facts and rules, the system could calculate the answer but could never explain it.
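The explanation above can be modelled with a small derivation log. The sketch below is an assumption about how such a log might look, not the engine's real data structure: it uses short string IDs for readability, records at most one derivation per fact, and treats any fact without a recorded derivation as directly asserted. `ProvenanceLog`, `record`, and `explain` are hypothetical names.

```typescript
type FactID = string;
type RuleID = string;

interface Derivation {
  rule: RuleID;       // the rule that fired
  premises: FactID[]; // the facts it consumed
}

class ProvenanceLog {
  private readonly derivations = new Map<FactID, Derivation>();

  record(conclusion: FactID, rule: RuleID, premises: FactID[]): void {
    this.derivations.set(conclusion, { rule, premises });
  }

  // Walks the graph depth-first and returns one indented line per node.
  explain(fact: FactID): string[] {
    const lines: string[] = [];
    const seen = new Set<FactID>();
    const visit = (id: FactID, depth: number): void => {
      const indent = "  ".repeat(depth);
      if (seen.has(id)) {
        // Shared premises are expanded only once.
        lines.push(`${indent}${id}: (explained above)`);
        return;
      }
      seen.add(id);
      const d = this.derivations.get(id);
      if (d === undefined) {
        lines.push(`${indent}${id}: asserted`);
        return;
      }
      lines.push(`${indent}${id}: derived by ${d.rule} from ${d.premises.join(", ")}`);
      for (const premise of d.premises) visit(premise, depth + 1);
    };
    visit(fact, 0);
    return lines;
  }
}

const log = new ProvenanceLog();
log.record("A", "X", ["B", "C"]);
console.log(log.explain("A").join("\n"));
// A: derived by X from B, C
//   B: asserted
//   C: asserted
```

The answer itself needs none of this bookkeeping. The log exists only so that each conclusion can point back at the rule and facts that produced it.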

Finally, the PropID (Proposition ID) is used in the Proof engine to reify complex boolean formulas (like "A OR (B AND NOT C)") into a single addressable ID that can be manipulated by SAT solvers or verification algorithms.
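Reification of this kind is often done by interning formula nodes so that structurally identical subformulas share one ID. Whether the Proof engine works exactly this way is an assumption. The sketch below, with the hypothetical names `PropTable` and `PropNode`, only shows the general idea on the example formula.

```typescript
type PropID = number;

type PropNode =
  | { op: "atom"; name: string }
  | { op: "not"; arg: PropID }
  | { op: "and" | "or"; left: PropID; right: PropID };

class PropTable {
  private readonly nodes: PropNode[] = [];
  private readonly index = new Map<string, PropID>();

  // Returns the existing ID for a structurally identical node, or a new one.
  private intern(key: string, node: PropNode): PropID {
    let id = this.index.get(key);
    if (id === undefined) {
      id = this.nodes.length;
      this.nodes.push(node);
      this.index.set(key, id);
    }
    return id;
  }

  atom(name: string): PropID {
    return this.intern(`atom:${name}`, { op: "atom", name });
  }
  not(arg: PropID): PropID {
    return this.intern(`not:${arg}`, { op: "not", arg });
  }
  and(left: PropID, right: PropID): PropID {
    return this.intern(`and:${left}:${right}`, { op: "and", left, right });
  }
  or(left: PropID, right: PropID): PropID {
    return this.intern(`or:${left}:${right}`, { op: "or", left, right });
  }

  // Evaluates a formula under a truth assignment; missing atoms count as false.
  evaluate(id: PropID, env: Record<string, boolean>): boolean {
    const node = this.nodes[id];
    if (node === undefined) throw new Error(`unknown PropID ${id}`);
    switch (node.op) {
      case "atom":
        return env[node.name] === true;
      case "not":
        return !this.evaluate(node.arg, env);
      case "and":
        return this.evaluate(node.left, env) && this.evaluate(node.right, env);
      case "or":
        return this.evaluate(node.left, env) || this.evaluate(node.right, env);
    }
  }
}

// "A OR (B AND NOT C)" becomes a single PropID.
const props = new PropTable();
const formula = props.or(
  props.atom("A"),
  props.and(props.atom("B"), props.not(props.atom("C")))
);
```

Once the formula is a single integer, other components can store it, compare it for equality, or pass it to a solver without carrying the expression tree around.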