Induction is the process of learning general rules from specific examples. Given positive and negative examples, an induction engine can discover rules that classify them correctly, allowing the knowledge base to grow from data.
Inductive Logic Programming (ILP):
Given background knowledge B, positive examples E⁺, and negative examples E⁻, find a hypothesis H such that B together with H entails every example in E⁺ and no example in E⁻.
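A minimal sketch of this acceptance test, assuming a hypothetical `covers` oracle that decides whether B together with H entails a given example (the oracle is not part of any API shown here):

```typescript
type Example = { operator: string; args: string[] };
type Rule = unknown; // opaque hypothesis representation

// Strict ILP acceptance: H must cover every positive and no negative.
function acceptsStrict(
  covers: (h: Rule, e: Example) => boolean, // hypothetical entailment oracle
  h: Rule,
  positives: Example[],
  negatives: Example[]
): boolean {
  return positives.every(e => covers(h, e)) && !negatives.some(e => covers(h, e));
}
```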
Comparison of Inference Modes:
| Mode | Given | Infer | Example |
|---|---|---|---|
| Deduction | Rule + Case | Result | All birds fly + Tweety is a bird → Tweety flies |
| Abduction | Rule + Result | Case | All birds fly + X flies → X might be a bird |
| Induction | Cases + Results | Rule | Robin flies, Sparrow flies, Eagle flies → Birds fly |
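The three modes are three directions through the same implication. A toy sketch, with all names illustrative:

```typescript
// Deduction: rule + case → result. "All birds fly" applied to "Tweety is a bird".
const birds = new Set(['Tweety', 'Robin', 'Sparrow', 'Eagle']);
const flies = (x: string) => birds.has(x);
console.log(flies('Tweety')); // true → Tweety flies

// Abduction: rule + result → plausible case. "X flies" suggests, but does not
// prove, that X is a bird.
const explain = (x: string) => `${x} might be a bird`;
console.log(explain('X'));

// Induction: cases + results → rule. Every observed bird flies, so hypothesize
// that all birds fly.
const observed = [
  { name: 'Robin', flies: true },
  { name: 'Sparrow', flies: true },
  { name: 'Eagle', flies: true }
];
console.log(observed.every(o => o.flies) ? 'hypothesis: Birds fly' : 'no rule');
```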
Coverage and Accuracy:
| Metric | Formula | Meaning |
|---|---|---|
| Coverage | TP / (TP + FN) | What fraction of positives are covered? |
| Precision | TP / (TP + FP) | What fraction of predictions are correct? |
| Information Gain | H(D) - H(D\|A) | How much does condition A reduce uncertainty? |
| MDL Score | L(H) + L(D\|H) | Hypothesis length + exception cost (prefer simpler) |
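These metrics can be computed directly from raw counts; the entropy helper below matches the information-gain row. A sketch with illustrative names:

```typescript
interface Counts { tp: number; fp: number; fn: number; tn: number; }

const coverage = (c: Counts) => c.tp / (c.tp + c.fn);   // fraction of positives covered
const precision = (c: Counts) => c.tp / (c.tp + c.fp);  // fraction of predictions correct

// Binary entropy H in bits over a positive/negative split.
function entropy(pos: number, neg: number): number {
  const total = pos + neg;
  if (total === 0) return 0;
  const h = (p: number) => (p === 0 ? 0 : -p * Math.log2(p));
  return h(pos / total) + h(neg / total);
}

// Information gain H(D) - H(D|A) of splitting the data D on condition A.
function infoGain(
  d: { pos: number; neg: number },
  dTrue: { pos: number; neg: number },  // examples where A holds
  dFalse: { pos: number; neg: number }  // examples where A does not hold
): number {
  const n = d.pos + d.neg;
  const w = (s: { pos: number; neg: number }) => (s.pos + s.neg) / n;
  return entropy(d.pos, d.neg)
    - (w(dTrue) * entropy(dTrue.pos, dTrue.neg) + w(dFalse) * entropy(dFalse.pos, dFalse.neg));
}
```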
Real-World Challenge: Data often contains noise (mislabeled examples) and exceptions (genuine outliers). Strict ILP fails on noisy data.
// Instead of: H must cover ALL positives and NO negatives
// Allow: H may miss some positives and cover some negatives
Criterion: maximize(Coverage(H) - α × FalsePositives(H) - β × Complexity(H))
// Example: "Birds fly" covers 95% of examples
// Better than: "Birds except penguins, ostriches, emus, kiwis... fly"
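The criterion above can be written as a scoring function. In this sketch the weights α and β and the statistics carried per hypothesis are illustrative choices, not engine defaults:

```typescript
interface HypothesisStats {
  coveredPositives: number; // TP
  falsePositives: number;   // FP
  complexity: number;       // e.g. number of conditions in the rule body
}

// Soft acceptance: score hypotheses instead of demanding perfection.
function softScore(
  h: HypothesisStats,
  totalPositives: number,
  alpha = 1.0, // penalty per false positive
  beta = 0.1   // penalty per unit of complexity
): number {
  const cov = h.coveredPositives / totalPositives;
  return cov - alpha * h.falsePositives - beta * h.complexity;
}

// "Birds fly" (complexity 1, 95% coverage, a few false positives) can outscore
// an exception-laden rule with perfect precision but far higher complexity.
```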
| Strategy | Description | When to Use |
|---|---|---|
| Hard constraints | No false positives allowed | Clean data, critical domains |
| Soft constraints | Allow some error for simpler rules | Noisy data, pattern discovery |
| Exception lists | Learn rule + explicit exceptions | Few outliers, interpretability |
| Confidence scores | Attach probability to rules | Uncertain domains |
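These strategies map naturally onto the `induce()` options shown at the end of this section; the specific combinations below are assumed presets, not documented ones:

```typescript
// Hypothetical presets built from the options documented below.
const hardConstraints = { noiseThreshold: 0, allowExceptions: false };    // clean data, critical domains
const softConstraints = { noiseThreshold: 0.05, allowExceptions: false }; // noisy data, pattern discovery
const exceptionLists  = { noiseThreshold: 0, allowExceptions: true };     // few outliers, interpretability
// Confidence scores are recorded as rule provenance (see @r_learned_001 below).
```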
Background Knowledge:
parent Alice Bob
parent Alice Carol
parent Bob Dave
parent Carol Eve
male Bob
male Dave
female Alice
female Carol
female Eve
Positive Examples:
grandparent Alice Dave ✓
grandparent Alice Eve ✓
Negative Examples:
grandparent Bob Dave ✗
grandparent Alice Bob ✗
Induced Rule:
@r_induced Implies (And (parent ?x ?y) (parent ?y ?z)) (grandparent ?x ?z)
// "X is grandparent of Z if X is parent of Y and Y is parent of Z"
// Coverage: 100%, Precision: 100%
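As a sanity check, the grandparent task above can be fed through the engine API documented at the end of this section; expecting the `@r_induced` rule as output assumes the engine searches two-step chains over the background facts:

```typescript
import { InductionEngine } from 'agisystem2/reasoning';

const engine = new InductionEngine(session);
const rules = await engine.induce({
  positives: [
    { operator: 'grandparent', args: ['Alice', 'Dave'] },
    { operator: 'grandparent', args: ['Alice', 'Eve'] }
  ],
  negatives: [
    { operator: 'grandparent', args: ['Bob', 'Dave'] },
    { operator: 'grandparent', args: ['Alice', 'Bob'] }
  ],
  background: session.kb // the parent/male/female facts above
});
// Expected: a rule chaining parent twice, which covers both positives
// and neither negative.
```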
// Induced rules are marked with provenance
@r_learned_001 Implies (isA ?x Bird) (canFly ?x)
; induced_from: examples/birds.sys2
; confidence: 0.95
; coverage: 47/50
; exceptions: [Penguin, Ostrich, Emu]
MDL Principle: The best hypothesis minimizes the total description length L(H) + L(D|H): the length of the hypothesis plus the cost of encoding the data's exceptions given the hypothesis.
H₁: "Birds fly"
L(H₁) = 2 tokens
L(D|H₁) = 3 exceptions = 3 tokens
MDL = 5
H₂: "Birds except penguins, ostriches, emus fly"
L(H₂) = 5 tokens
L(D|H₂) = 0 exceptions = 0 tokens
MDL = 5
// Equal MDL - but H₁ is preferred for generalization
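A small sketch of the computation above; token counts stand in for real description lengths, and the generality tie-break is an assumption about engine behavior:

```typescript
interface Candidate { name: string; ruleTokens: number; exceptionTokens: number; }

// MDL = L(H) + L(D|H): rule length plus the cost of listing exceptions.
const mdl = (c: Candidate) => c.ruleTokens + c.exceptionTokens;

const h1: Candidate = { name: 'Birds fly', ruleTokens: 2, exceptionTokens: 3 };
const h2: Candidate = {
  name: 'Birds except penguins, ostriches, emus fly',
  ruleTokens: 5,
  exceptionTokens: 0
};

// Equal MDL (5 vs 5); break the tie toward the shorter rule, which
// generalizes better to unseen examples.
const best = [h1, h2].sort((a, b) => mdl(a) - mdl(b) || a.ruleTokens - b.ruleTokens)[0];
console.log(best.name); // "Birds fly"
```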
import { InductionEngine } from 'agisystem2/reasoning';
const engine = new InductionEngine(session);
// Learn from examples
const examples = {
  positives: [
    { operator: 'canFly', args: ['Robin'] },
    { operator: 'canFly', args: ['Eagle'] }
  ],
  negatives: [
    { operator: 'canFly', args: ['Penguin'] }
  ],
  background: session.kb
};
const rules = await engine.induce(examples);

// With options (reusing the same examples)
const tunedRules = await engine.induce(examples, {
  maxRules: 5,           // Max rules to learn
  minCoverage: 0.8,      // Minimum coverage required
  maxComplexity: 3,      // Max conditions per rule
  allowExceptions: true, // Learn exception lists
  noiseThreshold: 0.05   // Tolerate 5% noise
});
// Validate learned rules
const validation = await engine.validate(rules, testExamples);
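The shape of the validation result is not documented above. Assuming it reports per-rule precision and coverage on the held-out examples (an assumed field, `perRule`), a plausible follow-up is to keep only rules that clear a quality bar:

```typescript
// Hypothetical: `validation.perRule` is an assumed shape, not a documented field.
const accepted = rules.filter((_rule, i) =>
  validation.perRule[i].precision >= 0.9 && validation.perRule[i].coverage >= 0.8
);
```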