InferMap

benseverndev-oss/infermap
Transport: STDIO, HTTP · Registry: active
Summary

Solves the "which CSV column maps to which schema field" problem using seven different scoring strategies: exact name match, known aliases, initialism detection, pattern recognition for emails/UUIDs/dates, statistical profiling, fuzzy string matching, and optional LLM scoring. Combines them with the Hungarian algorithm for optimal one-to-one assignment, calibrates confidence scores, and spits out mappings with explainable reasoning. Useful when you're ingesting messy customer exports, vendor CSVs, or legacy database dumps and need to programmatically align them to your canonical schema without writing brittle manual rules. The MCP exposes map operations over stdio or HTTP, handling DataFrames, CSVs, databases, or in-memory records. Python and TypeScript implementations share a golden test suite for bit-identical decisions. Originally built by Ben Severn, now maintained in the goldenmatch monorepo.


Moved. This repo has moved into the benzsevern/goldenmatch monorepo at packages/python/infermap (and packages/typescript/infermap). This repo is archived; new development happens in the monorepo.

infermap

Inference-driven schema mapping engine.
Map messy source columns to a known target schema — accurately, explainably, with zero config.
Built by Ben Severn.


Python 3.11+ · Node 20+ · TypeScript · Edge runtime · Parity · License: MIT

📖 Wiki · 🌐 Docs · 🧪 Examples · 💬 Discussions · 🐛 Issues


infermap is a schema-mapping engine. Give it any two field collections (CSVs, DataFrames, database tables, in-memory records) and it figures out which source field corresponds to which target field, with confidence scores and human-readable reasoning. Available as a Python package on PyPI and a TypeScript package on npm, with mapping decisions verified bit-for-bit by a shared golden-test parity suite.

Table of contents

  • Install
  • Quick start
  • How it works
  • Features
  • Which package should I use?
  • Custom scorers
  • CLI examples
  • Config reference
  • Documentation
  • License

Install

Python

pip install infermap

Optional database extras:

pip install infermap[postgres]   # psycopg2-binary
pip install infermap[mysql]      # mysql-connector-python
pip install infermap[duckdb]     # duckdb
pip install infermap[all]        # all extras

TypeScript / Next.js

npm install infermap

Zero runtime dependencies in the core entrypoint. Compatible with Next.js Server Components, Route Handlers, Server Actions, and the Edge Runtime out of the box. See the package README for the full reference.

Quick start

Python

import infermap

# Map a CRM export CSV to a canonical customer schema
result = infermap.map("crm_export.csv", "canonical_customers.csv")

for m in result.mappings:
    print(f"{m.source} -> {m.target}  ({m.confidence:.0%})")
# fname -> first_name  (97%)
# lname -> last_name   (95%)
# email_addr -> email  (91%)

# Apply mappings to rename DataFrame columns
import polars as pl
df = pl.read_csv("crm_export.csv")
renamed = result.apply(df)

# Save mappings to a reusable config file
result.to_config("my_mapping.yaml")

# Reload later — no re-inference needed
saved = infermap.from_config("my_mapping.yaml")

TypeScript

import { map } from "infermap";

const crm = [
  { fname: "John", lname: "Doe", email_addr: "j@d.co" },
  { fname: "Jane", lname: "Smith", email_addr: "j@s.co" },
];

const canonical = [
  { first_name: "", last_name: "", email: "" },
];

const result = map({ records: crm }, { records: canonical });

for (const m of result.mappings) {
  console.log(`${m.source} → ${m.target}  (${m.confidence.toFixed(2)})`);
}
// fname       → first_name  (0.44)
// lname       → last_name   (0.48)
// email_addr  → email       (0.69)

For Next.js, drop it directly into a Route Handler — works on Edge Runtime with zero config:

// app/api/infer/route.ts
import { map } from "infermap";
export const runtime = "edge";

export async function POST(req: Request) {
  const { sourceCsv, targetCsv } = await req.json();
  const result = map({ csvText: sourceCsv }, { csvText: targetCsv });
  return Response.json(result);
}

How it works

Each field pair runs through a pipeline of 7 scorers. Each scorer returns a score in [0.0, 1.0] or abstains (None/null). The engine combines scores via weighted average (requiring at least 2 contributors), then uses the Hungarian algorithm for optimal one-to-one assignment.

| Scorer | Weight | What it detects |
|---|---|---|
| ExactScorer | 1.0 | Case-insensitive exact name match |
| AliasScorer | 0.95 | Known field aliases (fname ↔ first_name, tel ↔ phone) + domain dictionaries |
| InitialismScorer | 0.75 | Abbreviation-style names (assay_id ↔ ASSI, confidence_score ↔ CONSC) |
| PatternTypeScorer | 0.7 | Semantic type from sample values: email, date_iso, phone, uuid, url, zip, currency |
| ProfileScorer | 0.5 | Statistical profile similarity: dtype, null rate, unique rate, length, cardinality |
| FuzzyNameScorer | 0.4 | Jaro-Winkler similarity on normalized field names (with common-prefix canonicalization) |
| LLMScorer | 0.8 | Pluggable LLM-backed scorer (stubbed by default) |
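The combination step described above can be sketched in a few lines. This is an illustration only, not infermap's actual implementation: the `combine` helper and the choice to return 0.0 when fewer than two scorers contribute are assumptions; the weights are the defaults from the table.

```python
from typing import Optional

# Default weights as listed in the scorer table.
WEIGHTS = {
    "ExactScorer": 1.0,
    "AliasScorer": 0.95,
    "InitialismScorer": 0.75,
    "PatternTypeScorer": 0.7,
    "ProfileScorer": 0.5,
    "FuzzyNameScorer": 0.4,
    "LLMScorer": 0.8,
}


def combine(scores: dict[str, Optional[float]], min_contributors: int = 2) -> float:
    """Weighted average over the scorers that did not abstain (hypothetical helper)."""
    voted = {name: s for name, s in scores.items() if s is not None}
    if len(voted) < min_contributors:
        # Assumption: too few contributors is treated as "no evidence".
        return 0.0
    total_weight = sum(WEIGHTS[name] for name in voted)
    return sum(WEIGHTS[name] * s for name, s in voted.items()) / total_weight


# One source/target pair: two scorers vote, two abstain (None).
pair_scores = {
    "ExactScorer": None,
    "AliasScorer": 1.0,
    "FuzzyNameScorer": 0.5,
    "ProfileScorer": None,
}
combined = combine(pair_scores)
print(round(combined, 3))  # 0.852 = (0.95*1.0 + 0.4*0.5) / (0.95 + 0.4)
```

Abstaining scorers are left out of both the numerator and the denominator, so a scorer with no opinion does not drag the average toward zero.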

The engine also applies common-prefix canonicalization, automatically stripping schema-wide prefixes like prospect_ so that City vs prospect_City is compared as City vs City. Optional confidence calibration then transforms raw scores into calibrated probabilities after assignment (ECE from 0.46 to 0.005 on real-world data).
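As a rough picture of what prefix canonicalization involves, the sketch below strips a prefix shared by every field in a schema. The function name and the rule of cutting only at an underscore boundary are assumptions made for illustration; infermap's own rules may differ.

```python
import os


def strip_common_prefix(names: list[str]) -> list[str]:
    """Drop a prefix shared by every name, cutting only at an underscore boundary."""
    if len(names) < 2:
        return list(names)
    # Character-wise common prefix, compared case-insensitively.
    prefix = os.path.commonprefix([n.lower() for n in names])
    cut = prefix.rfind("_") + 1  # 0 when the shared part contains no underscore
    if cut == 0 or any(len(n) == cut for n in names):
        return list(names)  # nothing to strip, or stripping would empty a name
    return [n[cut:] for n in names]


source_fields = ["prospect_City", "prospect_State", "prospect_Zip"]
canonical = strip_common_prefix(source_fields)
print(canonical)  # ['City', 'State', 'Zip']
```

Cutting at the last shared underscore keeps names like prospect_City and prospect_Country from collapsing to "ity" and "ountry" when they happen to share extra leading letters.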

Read the full architecture →

Features

| Feature | Python | TypeScript |
|---|---|---|
| 7 built-in scorers | ✅ | ✅ |
| Hungarian assignment | ✅ (scipy) | ✅ (vendored) |
| Custom scorers | @infermap.scorer | defineScorer() |
| Domain dictionaries | ✅ (YAML) | ✅ (inlined) |
| Confidence calibration | ✅ (Identity/Isotonic/Platt) | ✅ |
| Score matrix inspection | ✅ | ✅ |
| In-memory data | Polars, Pandas, list[dict] | Array<Record> |
| File providers | CSV, Parquet, XLSX | CSV, JSON |
| Schema definition files | YAML + JSON | JSON |
| Database providers | SQLite, Postgres, DuckDB | SQLite, Postgres, DuckDB |
| Engine config | YAML | JSON |
| Saved mapping format | YAML | JSON |
| CLI | ✅ (Typer) | ✅ (node:util) |
| Apply to DataFrame | ✅ | ❌ (CSV rewrite via CLI) |
| Edge-runtime compatible | ❌ | ✅ |
| Zero runtime deps | n/a | ✅ |
| Accuracy benchmark | ✅ (162 cases, F1 0.84) | ✅ (parity within 0.0005) |

Full feature parity matrix →

Which package should I use?

| If you are… | Use |
|---|---|
| Building a Python data pipeline or notebook | Python |
| Building a Next.js app, Node service, or browser tool | TypeScript |
| Running mapping in a serverless edge function | TypeScript (zero Node built-ins) |
| Doing ad-hoc CSV exploration on the command line | Python CLI has more features; TS CLI is leaner |
| Both: Python backend + Next.js admin UI | Both; outputs are interoperable via the JSON config format |

What's new in v0.3

+18.3pp F1 on real-world data from four compounding improvements:

v0.2 baseline    F1 0.657
+ min_conf 0.2   F1 0.765  (+10.8pp — empirically tuned threshold)
+ prefix-strip   F1 0.821  (+5.6pp  — City vs prospect_City now works)
+ InitialismScorer F1 0.840 (+1.9pp  — ASSI, CONSC, RELATIT now work)

New features:

  • Domain dictionaries — MapEngine(domains=["healthcare"]) loads curated aliases for your domain. Ships: generic (default), healthcare, finance, ecommerce. See examples/09_domain_dictionaries.py.
  • Confidence calibration — MapEngine(calibrator=cal) transforms raw scores into calibrated probabilities. Ships: IsotonicCalibrator, PlattCalibrator. Valentine ECE: 0.46 → 0.005. See examples/10_calibration.py.
  • InitialismScorer — matches abbreviation-style column names (assay_id ↔ ASSI). ChEMBL F1: 0.524 → 0.819.
  • Common-prefix canonicalization — automatically strips prospect_, assays_, etc. before fuzzy matching.
  • Valentine corpus — 82 real-world schema-matching cases from the Valentine benchmark for accuracy testing.
  • Full TypeScript parity — all new features ported. 186 TS tests. Benchmark F1 within 0.0005 of Python.
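For readers unfamiliar with the ECE figures quoted above, here is a minimal sketch of the standard binned expected calibration error. The binning scheme (10 equal-width bins) and the function name are assumptions; the source does not say how infermap's benchmark computes the metric, and the sample data is made up for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |average confidence - accuracy| per bin."""
    if not confidences:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece


# Made-up example: four predicted mappings, three of them correct.
confs = [0.95, 0.95, 0.55, 0.55]
hits = [True, True, True, False]
ece = expected_calibration_error(confs, hits)
print(round(ece, 3))  # 0.05
```

A lower value means the reported confidence tracks the observed hit rate more closely, which is what a calibrator is meant to improve.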

Custom scorers

Python

import infermap
from infermap.types import FieldInfo, ScorerResult

@infermap.scorer("prefix_scorer", weight=0.8)
def prefix_scorer(source: FieldInfo, target: FieldInfo) -> ScorerResult | None:
    if source.name[:3].lower() != target.name[:3].lower():
        return None
    return ScorerResult(score=0.85, reasoning=f"Shared prefix '{source.name[:3]}'")

from infermap.engine import MapEngine
from infermap.scorers import default_scorers

engine = MapEngine(scorers=[*default_scorers(), prefix_scorer])

TypeScript

import { MapEngine, defaultScorers, defineScorer, makeScorerResult } from "infermap";

const prefixScorer = defineScorer(
  "prefix_scorer",
  (source, target) => {
    if (source.name.slice(0, 3).toLowerCase() !== target.name.slice(0, 3).toLowerCase()) {
      return null;
    }
    return makeScorerResult(0.85, `Shared prefix '${source.name.slice(0, 3)}'`);
  },
  0.8 // weight
);

const engine = new MapEngine({
  scorers: [...defaultScorers(), prefixScorer],
});

CLI examples

The CLI commands are the same in both packages; flags differ only where noted:

# Map two files and print a report
infermap map crm_export.csv canonical_customers.csv

# Map and save the config (Python: --save, TS: -o)
infermap map crm_export.csv canonical_customers.csv -o mapping.json

# Apply a saved mapping to rename columns
infermap apply crm_export.csv --config mapping.json --output renamed.csv

# Inspect the schema of a file or DB table
infermap inspect crm_export.csv
infermap inspect "sqlite:///mydb.db" --table customers

# Validate a saved config against a source
infermap validate crm_export.csv --config mapping.json --required email,id --strict

Config reference

Both packages accept an engine config (scorer weight overrides + alias extensions). Python uses YAML, TypeScript uses JSON; the shape is identical.

# Python: infermap.yaml
domains:
  - healthcare
  - finance
scorers:
  LLMScorer:
    enabled: false
  FuzzyNameScorer:
    weight: 0.3
aliases:
  order_id:
    - order_num
    - ord_no

// TypeScript: infermap.config.json
{
  "scorers": {
    "LLMScorer":       { "enabled": false },
    "FuzzyNameScorer": { "weight": 0.3 }
  },
  "aliases": {
    "order_id": ["order_num", "ord_no"]
  }
}

See infermap.yaml.example for a full annotated reference.
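To make the override semantics concrete, the sketch below merges a JSON engine config of the shape shown above into the default weights from the scorer table. It illustrates the intended effect only: `apply_overrides` is a hypothetical helper, and the assumption that a disabled scorer is simply dropped from the weight table is not taken from infermap's code.

```python
import json

# Default weights as listed in the scorer table.
DEFAULT_WEIGHTS = {
    "ExactScorer": 1.0,
    "AliasScorer": 0.95,
    "InitialismScorer": 0.75,
    "PatternTypeScorer": 0.7,
    "ProfileScorer": 0.5,
    "FuzzyNameScorer": 0.4,
    "LLMScorer": 0.8,
}


def apply_overrides(config_text: str) -> dict[str, float]:
    """Return effective scorer weights after applying a config (hypothetical helper)."""
    config = json.loads(config_text)
    weights = dict(DEFAULT_WEIGHTS)
    for name, opts in config.get("scorers", {}).items():
        if opts.get("enabled", True) is False:
            weights.pop(name, None)  # assumption: disabled scorers are removed
        elif "weight" in opts:
            weights[name] = float(opts["weight"])
    return weights


config_text = """
{
  "scorers": {
    "LLMScorer":       { "enabled": false },
    "FuzzyNameScorer": { "weight": 0.3 }
  },
  "aliases": {
    "order_id": ["order_num", "ord_no"]
  }
}
"""
effective = apply_overrides(config_text)
print(effective["FuzzyNameScorer"])  # 0.3
print("LLMScorer" in effective)      # False
```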

Documentation

  • 📖 Wiki — full reference for both languages
    • Getting Started
    • Python API
    • TypeScript API
    • Python vs TypeScript — migration guide
    • Scorers
    • Architecture
    • FAQ
  • 🌐 Documentation site
  • 🧪 Examples
    • Python examples — 10 numbered scripts covering basic mapping, databases, custom scorers, config, domain dictionaries, calibration, and score-matrix introspection
    • TypeScript examples — basic mapping, Next.js Edge Runtime, custom scorer, databases, domain dictionaries, save/reuse
  • 📓 Open in Colab — Python notebook
  • 💬 GitHub Discussions
  • 🐛 Issue tracker

Author

Ben Severn

License

MIT

Registry: active
Package: infermap
Transport: STDIO, HTTP
Updated: Jun 1, 2026