Credence — Epistemic Guard

lakshmi-chakradhar-vijayarao/credence-ai

Transport: STDIO · Registry: active

Summary

Epistemic guard that hooks into transformer hidden states to classify whether the model is recalling training data, using context, or confabulating. It exposes a probe layer that fires before token commitment. The underlying Fisher-geometry signal reached AUROC 0.9944 on Qwen2.5-7B, while the trained EQL probe head currently reports AUROC 0.736. The guard blocks file writes and code commits when confidence drops below a threshold. The symbolic layer tracks uncertain values across inference steps. If a variable assignment is flagged as a confabulation risk, downstream operations that use it stay blocked until it is verified. Reach for this when you let an agent write code or modify files and need circuit-breaker semantics tied to the model's actual epistemic state, rather than to parsing output text for hedging language.
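The circuit-breaker behavior described above can be sketched as taint propagation over a dependency graph. This is a minimal, hypothetical illustration in plain Python, not the package's API. `TaintTracker`, `State`, the method names, and the 0.8 threshold are all assumptions. Each value records whether the probe flagged it and which values it was computed from. A write is allowed only if nothing it reads is still tainted.

```python
from enum import Enum


class State(Enum):
    # Mirrors the three labels the probe reports; this enum itself is hypothetical.
    PARAMETRIC = "PARAMETRIC"
    CONTEXT_DEPENDENT = "CONTEXT_DEPENDENT"
    CONFABULATION_RISK = "CONFABULATION_RISK"


class TaintTracker:
    """Hypothetical circuit breaker that blocks actions reading unverified values.

    Assumes SSA-style naming: each value name is assigned once.
    """

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed confidence cut-off, not from the source
        self.flagged = {}  # name -> True if the probe flagged this value itself
        self.deps = {}     # name -> names of the values it was computed from

    def assign(self, name, state, conf, depends_on=()):
        # Flag the value if the probe reports confabulation risk or low confidence.
        self.flagged[name] = state is State.CONFABULATION_RISK or conf < self.threshold
        self.deps[name] = tuple(depends_on)

    def verify(self, name):
        # External verification clears this value's own flag; dependents are
        # re-evaluated lazily in is_tainted().
        self.flagged[name] = False

    def is_tainted(self, name, _seen=None):
        # Tainted if flagged directly or if any upstream value is still tainted.
        seen = set() if _seen is None else _seen
        if name in seen:
            return False  # guard against accidental dependency cycles
        seen.add(name)
        if self.flagged.get(name, False):
            return True
        return any(self.is_tainted(dep, seen) for dep in self.deps.get(name, ()))

    def allow(self, uses):
        # Permit a write/commit only if none of the values it reads are tainted.
        return not any(self.is_tainted(u) for u in uses)


tracker = TaintTracker(threshold=0.8)
tracker.assign("port", State.PARAMETRIC, 0.97)
tracker.assign("api_url", State.CONFABULATION_RISK, 0.74)
tracker.assign("config", State.PARAMETRIC, 0.95, depends_on=["port", "api_url"])
print(tracker.allow(["port"]))    # True
print(tracker.allow(["config"]))  # False: inherits taint from api_url
tracker.verify("api_url")
print(tracker.allow(["config"]))  # True once api_url is verified
```

Evaluating taint lazily means that verifying one upstream value automatically unblocks every dependent whose other inputs are clean, without re-running the probe.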

Epistemic Self-Monitoring (ESM)

A model-agnostic inference-time layer that reads transformer hidden states to classify epistemic state before tokens are committed.

from esm import EpistemicProbe

# Assumes `model` and `ids` are already defined (see Quick Start).
probe = EpistemicProbe.from_checkpoint("checkpoints/eql_head_best.pt")
probe.register(model)  # one call: hooks into the forward pass

output = model(input_ids=ids, use_cache=True)
print(probe.last)  # most recent classification recorded by the probe
# EpistemicState(PARAMETRIC, conf=0.97)         → model knows this
# EpistemicState(CONTEXT_DEPENDENT, conf=0.89)  → model is using context
# EpistemicState(CONFABULATION_RISK, conf=0.74) → model is making this up
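The printed `EpistemicState` pairs a label with a confidence. One plausible way to produce such a pair is a softmax followed by an argmax, assuming the probe head emits one logit per class. The sketch below is hypothetical. `state_from_logits`, `LABELS`, and this `EpistemicState` dataclass are illustrative stand-ins, not the `esm` package's types.

```python
import math
from dataclasses import dataclass

# Hypothetical class order for a three-way probe head.
LABELS = ("PARAMETRIC", "CONTEXT_DEPENDENT", "CONFABULATION_RISK")


@dataclass(frozen=True)
class EpistemicState:
    label: str
    conf: float

    def __repr__(self):
        return f"EpistemicState({self.label}, conf={self.conf:.2f})"


def state_from_logits(logits):
    """Softmax over the three class logits; return the top class and its probability."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return EpistemicState(LABELS[best], probs[best])


print(state_from_logits([4.0, 0.5, 0.2]))  # EpistemicState(PARAMETRIC, conf=0.95)
```

Under this reading, `conf` is the probability of the winning class. A downstream guard could therefore threshold either on the label or on the confidence.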

The Problem

LLMs cannot tell when they are hallucinating. A model generating a confident, fluent, wrong answer looks identical from the inside to a model generating a confident, fluent, correct answer. Every downstream system — agents, tools, citation engines, medical assistants — inherits this epistemic blindness.

The Signal

Transformer hidden states at late-middle layers encode a discriminant signal that separates what the model knows from training from what the model is assembling from context or pattern-completing from nothing.

Validated: Fisher geometry AUROC 0.9944 at layer 27 (Qwen2.5-7B, TriviaQA, n=100). The discriminant is latent in the residual stream. It's not in the logits. It's not in attention weights. It's in the hidden state geometry, and it fires before the token is committed.
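The architecture table names the method as Fisher LDA on hidden states. The sketch below illustrates that technique; it is not the repository's code. It fits Fisher's linear discriminant on synthetic "activations" and scores the one-dimensional projection with AUROC on a held-out split. The hidden size, class shift, sample counts, and ridge value are arbitrary assumptions. Real use would substitute layer-27 hidden states and their labels.

```python
import numpy as np


def fisher_lda_direction(X0, X1, ridge=1e-3):
    """Fisher's linear discriminant: w proportional to S_w^{-1} (mu1 - mu0)."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter; the ridge term keeps it invertible
    # when the hidden size is large relative to the number of examples.
    Sw = np.cov(X0, rowvar=False) * (len(X0) - 1) + np.cov(X1, rowvar=False) * (len(X1) - 1)
    Sw += ridge * np.eye(Sw.shape[0])
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)


def auroc(neg_scores, pos_scores):
    """P(random positive scores higher than random negative), ties counted as 1/2."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())


# Synthetic stand-in for layer activations: two classes separated along one axis.
rng = np.random.default_rng(0)
d = 64  # toy hidden size (a real 7B model is much wider)
shift = np.zeros(d)
shift[0] = 3.0


def sample(n, offset):
    return rng.normal(size=(n, d)) + offset


train0, train1 = sample(200, 0.0), sample(200, shift)
test0, test1 = sample(200, 0.0), sample(200, shift)

w = fisher_lda_direction(train0, train1)  # fit on the training split only
held_out = auroc(test0 @ w, test1 @ w)    # score the held-out split
print(f"held-out AUROC: {held_out:.3f}")
```

Scoring on data the discriminant was not fit on matters here. With hidden sizes in the thousands and small labeled sets, an in-sample AUROC can look far better than the direction actually generalizes.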

The Architecture

Three layers of epistemic monitoring:

Layer	Signal	Status
Geometric (ESM probe)	Fisher LDA on hidden states	AUROC 0.9944, validated
Positional (K-norm)	Mean key-norm per context position	rho +0.794, validated
Symbolic (Credence)	Claim-level constraint tracking	FCR study validated
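The positional layer's signal, mean key-norm per context position, is simple to compute once key vectors are available. The sketch below rests on several assumptions. It takes one layer's keys for a single sequence as a NumPy array of shape (num_heads, seq_len, head_dim) and averages the per-head L2 norms. The source reports rho +0.794 without stating which per-position quantity the norms are correlated against. Assuming rho is a Spearman rank correlation, `target` is only a placeholder, and `spearman_rho` is a plain helper rather than the repository's evaluation code.

```python
import numpy as np


def mean_key_norm_per_position(keys):
    """Mean L2 norm of key vectors at each context position.

    keys: array of shape (num_heads, seq_len, head_dim), e.g. one layer's cached
    keys for a single sequence. Returns an array of shape (seq_len,).
    """
    keys = np.asarray(keys, dtype=np.float64)
    per_head = np.linalg.norm(keys, axis=-1)  # (num_heads, seq_len)
    return per_head.mean(axis=0)              # average over heads


def spearman_rho(a, b):
    """Spearman correlation as the Pearson correlation of ranks (assumes no ties)."""
    rank_a = np.argsort(np.argsort(a)).astype(np.float64)
    rank_b = np.argsort(np.argsort(b)).astype(np.float64)
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


rng = np.random.default_rng(0)
keys = rng.normal(size=(8, 16, 32))  # toy: 8 heads, 16 positions, head_dim 32
knorm = mean_key_norm_per_position(keys)
target = rng.normal(size=16)         # placeholder per-position target signal
print(knorm.shape, spearman_rho(knorm, target))
```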

Installation

pip install -e .

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
from esm import EpistemicProbe
import torch

# Llama 3.1 is a gated model on the Hugging Face Hub (license acceptance required).
model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the trained probe head and hook it into the model's forward pass.
probe = EpistemicProbe.from_checkpoint("checkpoints/eql_head_best.pt")
probe.register(model)

ids = tokenizer("What is the melting point of osmium?", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(input_ids=ids, use_cache=True)

# The probe records its classification during the forward pass.
print(probe.last)

Status

Fisher geometry validated (AUROC 0.9944, Qwen2.5-7B)
K-norm signal validated (rho +0.794)
EQL head trained (AUROC 0.736, undertrained)
Credence symbolic layer working (FCR validated)
Cross-model validation (Week 1 — in progress)
Hallucination-labeled training data (Week 3)
Production demo (Week 5)

Repository Structure

esm/           — Epistemic Self-Monitoring package (Layer 1 + 2)
credence/      — Symbolic constraint tracking (Layer 3)
evals/
  cross_model/ — Cross-model Fisher probe validation (WEEK 1)
  t4_results/  — Validated T4 experimental results
checkpoints/   — Trained model artifacts
archive/       — Prior experimental work (CAMS eviction v1, Kaggle P1-P9)

The Demo

Three questions. Live terminal. Any transformer.

Known fact → PARAMETRIC 0.97 ✓
Document Q&A → CONTEXT_DEPENDENT 0.89 ✓
Plausible fabrication → CONFABULATION_RISK 0.74 flagged before output

Package: credence-guard

Updated: May 6, 2026