Epistemic guard that hooks into transformer hidden states to classify whether the model is recalling training data, using context, or confabulating. It exposes a probe trained on Fisher geometry (AUROC 0.9944, validated on Qwen2.5-7B with TriviaQA, n=100) that fires before token commitment, and it blocks file writes and code commits when confidence drops below a threshold. The symbolic layer tracks uncertain values across inference steps: if a variable assignment is flagged as confabulation risk, downstream operations that use it stay blocked until the value is verified. Reach for this when an agent writes code or modifies files and you need circuit-breaker semantics tied to the model's actual epistemic state rather than to hedging language parsed from output text.
A model-agnostic inference-time layer that reads transformer hidden states to classify epistemic state before tokens are committed.
from esm import EpistemicProbe
probe = EpistemicProbe.from_checkpoint("checkpoints/eql_head_best.pt")
probe.register(model) # one call — hooks into forward pass
output = model(input_ids=ids, use_cache=True)
print(probe.last)
# EpistemicState(PARAMETRIC, conf=0.97) → model knows this
# EpistemicState(CONTEXT_DEPENDENT, conf=0.89) → model using context
# EpistemicState(CONFABULATION_RISK, conf=0.74) → model making this up
LLMs cannot tell when they are hallucinating. A model generating a confident, fluent, wrong answer looks identical from the inside to a model generating a confident, fluent, correct answer. Every downstream system — agents, tools, citation engines, medical assistants — inherits this epistemic blindness.
Transformer hidden states at late-middle layers encode a discriminant signal that separates what the model knows from training from what the model is assembling from context or pattern-completing from nothing.
Validated: Fisher geometry AUROC 0.9944 at layer 27 (Qwen2.5-7B, TriviaQA, n=100). The discriminant is latent in the residual stream. It's not in the logits. It's not in attention weights. It's in the hidden state geometry, and it fires before the token is committed.
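To make the idea of a Fisher discriminant scored by AUROC concrete, here is a toy sketch. It is not the project's implementation: the 2-D synthetic points stand in for high-dimensional hidden states, and every function name and the data itself are assumptions for illustration only.

```python
import random

def mean_vec(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, mu):
    """2x2 scatter matrix: sum of outer products of mean-centered points."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mu[0], p[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    """Fisher discriminant w = S_w^{-1} (mu_a - mu_b) for 2-D inputs."""
    mu_a, mu_b = mean_vec(class_a), mean_vec(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def auroc(pos_scores, neg_scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Synthetic stand-ins for hidden states of two epistemic classes (assumed data).
rng = random.Random(0)
parametric = [[rng.gauss(2.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(50)]
confabulated = [[rng.gauss(-2.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(50)]

w = fisher_direction(parametric, confabulated)

def project(point):
    """Scalar score of a point along the Fisher direction."""
    return w[0] * point[0] + w[1] * point[1]

score = auroc([project(p) for p in parametric],
              [project(p) for p in confabulated])
print(f"toy AUROC: {score:.4f}")
```

The toy scores the same points it was fitted on, which is optimistic; a real evaluation would project held-out examples.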
Three layers of epistemic monitoring:
| Layer | Signal | Status |
|---|---|---|
| Geometric (ESM probe) | Fisher LDA on hidden states | AUROC 0.9944, validated |
| Positional (K-norm) | Mean key-norm per context position | rho +0.794, validated |
| Symbolic (Credence) | Claim-level constraint tracking | FCR study validated |
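The symbolic layer's blocking behavior can be pictured as taint propagation through derived values. The sketch below is hypothetical and does not use the `credence` package's API, which this README does not show: `TrackedValue`, `derive`, and `is_blocked` are illustrative names, and the host and port values are made up.

```python
from dataclasses import dataclass, field

@dataclass
class TrackedValue:
    """A value plus the upstream values it was computed from."""
    name: str
    value: object
    flagged: bool = False    # True if the assignment was flagged as confabulation risk
    parents: list = field(default_factory=list)
    verified: bool = False   # Set to True once the value has been checked externally

    def is_blocked(self):
        """Blocked if this value, or anything upstream, is flagged and unverified."""
        if self.flagged and not self.verified:
            return True
        return any(parent.is_blocked() for parent in self.parents)

def derive(name, fn, *inputs):
    """Compute a new value from tracked inputs, recording them as parents."""
    result = fn(*(item.value for item in inputs))
    return TrackedValue(name, result, parents=list(inputs))

port = TrackedValue("port", 8080)
host = TrackedValue("host", "db.internal.example", flagged=True)
url = derive("url", lambda h, p: f"{h}:{p}", host, port)

print(url.is_blocked())   # the flagged host taints everything built from it
host.verified = True      # verification of the source lifts the block downstream
print(url.is_blocked())
```

Because the block is computed by walking parents on demand, verifying one source value releases every value derived from it without any bookkeeping on the derived side.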
pip install -e .
from transformers import AutoModelForCausalLM, AutoTokenizer
from esm import EpistemicProbe
import torch
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
probe = EpistemicProbe.from_checkpoint("checkpoints/eql_head_best.pt")
probe.register(model)
ids = tokenizer("What is the melting point of osmium?", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(input_ids=ids, use_cache=True)
print(probe.last)
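One way to turn a probe reading into a gate on file writes is sketched below. It is a minimal illustration under assumptions: it uses a stand-in `State` tuple with `label` and `conf` fields instead of the real `probe.last` object, whose attribute names this README does not specify, and the `guarded_write` helper, the `BlockedWrite` exception, and the 0.8 threshold are all hypothetical.

```python
import os
import tempfile
from collections import namedtuple

# Stand-in for the probe's reading; the real object's fields may differ.
State = namedtuple("State", ["label", "conf"])

class BlockedWrite(Exception):
    """Raised when the epistemic state does not permit a write."""

def guarded_write(path, text, state, threshold=0.8):
    """Write text to path only if the state is not a confabulation risk
    and its confidence is at or above the threshold."""
    if state.label == "CONFABULATION_RISK" or state.conf < threshold:
        raise BlockedWrite(
            f"write to {path} blocked: {state.label} conf={state.conf:.2f}"
        )
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)

target = os.path.join(tempfile.mkdtemp(), "out.txt")

# A confident parametric reading passes the gate.
guarded_write(target, "ok\n", State("PARAMETRIC", 0.97))

# A confabulation-risk reading trips the breaker and leaves the file untouched.
try:
    guarded_write(target, "made up\n", State("CONFABULATION_RISK", 0.74))
except BlockedWrite as exc:
    print(exc)
```

Raising an exception rather than returning a flag means an agent loop cannot silently ignore the block; the caller has to handle it explicitly.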
esm/ — Epistemic Self-Monitoring package (Layer 1 + 2)
credence/ — Symbolic constraint tracking (Layer 3)
evals/
cross_model/ — Cross-model Fisher probe validation (WEEK 1)
t4_results/ — Validated T4 experimental results
checkpoints/ — Trained model artifacts
archive/ — Prior experimental work (CAMS eviction v1, Kaggle P1-P9)
Three questions. Live terminal. Any transformer.
PARAMETRIC 0.97 ✓
CONTEXT_DEPENDENT 0.89 ✓
CONFABULATION_RISK 0.74 flagged before output