This is a quality assurance tool for MCP server authors. It launches your server over stdio, inventories tools and resources, then runs ten automated checks covering schema validity, naming conventions, privacy mode support, mutation gating, and agent discoverability patterns. You get a 0-100 score with itemized pass/fail/warn results and concrete remediation steps. Use it in CI with the min-score flag to gate deploys, or run it locally during development to catch missing annotations, inconsistent naming, or incomplete manifest tools before agents hit your server in production. The demo output shows it auditing the official reference server and surfacing twelve snake_case violations and missing readOnlyHint annotations.
# Audit a published npm package
npx -y mcp-scorecard whoop-mcp-unofficial
# Audit a GitHub repo (auto-resolves to the published npm package, or local dist)
npx -y mcp-scorecard https://github.com/davidmosiah/whoop-mcp
# Audit a local build
npx -y mcp-scorecard /Users/you/Desktop/my-mcp/dist/index.js
# CI gate: fail the build if the score drops
npx -y mcp-scorecard my-mcp --min-score 80
# Structured JSON for piping into your own tooling
npx -y mcp-scorecard my-mcp --json
Real captured run auditing the official MCP reference server, `@modelcontextprotocol/server-everything`. Nothing here is hand-edited; this is exactly what the CLI printed:
$ npx -y mcp-scorecard @modelcontextprotocol/server-everything
# mcp-scorecard - @modelcontextprotocol/server-everything @2026.1.26
**Agent-readiness score:** 44/100
- [PASS] Schema validity (13/13 tools have valid input schema)
- [FAIL] Tool naming convention (12/13 tools violate snake_case)
- [FAIL] Privacy modes documented (only 1 tool(s) mention privacy modes)
- [PASS] Mutation gating (no write tools — n/a)
- [FAIL] Agent manifest (no agent_manifest tool)
- [FAIL] Smoke test (no smoke script and no test script)
- [PASS] Resources advertised (7 resources registered)
- [PASS] Tool descriptions (avg 88 chars across 13 tools)
- [FAIL] Annotations (0/13 read tools annotated)
- [FAIL] Manifest discoverability (no discovery tools)
## Details
### Tool naming convention
- Non-snake_case names: get-annotated-message, get-env, get-resource-links, get-resource-reference, get-structured-content, get-sum, get-tiny-image, gzip-file-as-resource, toggle-simulated-logging, toggle-subscriber-updates
### Annotations
- Missing readOnlyHint: echo, get-annotated-message, get-env, get-resource-links, get-resource-reference, get-structured-content, get-sum, get-tiny-image, gzip-file-as-resource, toggle-simulated-logging
## Suggested fixes
- Rename tools to lowercase snake_case (a-z, 0-9, _).
- Add a `privacy_mode` parameter (summary | structured | raw) on read tools so agents can request only what they need.
- Expose a `<prefix>_agent_manifest` tool that returns { recommended_first_calls, standard_tools, ... } so agents can self-onboard.
- Add `scripts/smoke-tools.mjs` that boots the server via StdioClientTransport and asserts the tool list.
- Add `annotations: { readOnlyHint: true, openWorldHint: false }` to every read tool definition.
- Expose discovery tools so agents can self-onboard: `*_agent_manifest`, `*_data_inventory`, `*_capabilities`, `*_connection_status`.
The reference server is a feature showcase, not a production integration, so a 44 is expected, and it is exactly why the convention checks exist. Agent-oriented servers that adopt snake_case naming, a manifest tool, and read-only annotations land in the 80s and 90s.
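As a rough illustration of the suggested fixes above, the sketch below shows a tool definition with a snake_case name, a `privacy_mode` parameter, and read-only annotations, together with a simple naming check. The `ToolDefinition` shape, the `acme_` prefix, and the exact regular expression are assumptions made for illustration; they are not the scorecard's actual implementation or an SDK type.

```typescript
// Hypothetical shape for illustration; field names mirror the suggested fixes.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, unknown>;
    required?: string[];
  };
  annotations?: { readOnlyHint?: boolean; openWorldHint?: boolean };
}

// Lowercase snake_case using only a-z, 0-9, and _.
// The exact pattern the CLI applies is an assumption.
const SNAKE_CASE = /^[a-z][a-z0-9]*(_[a-z0-9]+)*$/;

function isSnakeCase(name: string): boolean {
  return SNAKE_CASE.test(name);
}

// "acme_" is a made-up prefix standing in for your server's own prefix.
const exampleTool: ToolDefinition = {
  name: "acme_get_sleep",
  description:
    "Returns sleep data. Supports privacy_mode: summary | structured | raw.",
  inputSchema: {
    type: "object",
    properties: {
      privacy_mode: { type: "string", enum: ["summary", "structured", "raw"] },
    },
  },
  annotations: { readOnlyHint: true, openWorldHint: false },
};

console.log(isSnakeCase(exampleTool.name)); // true
console.log(isSnakeCase("get-env")); // false
```

Under this pattern a hyphenated name such as `get-env` fails while a single lowercase word such as `echo` passes.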
Ten quality dimensions, each scored 0-10. Final score is the sum, capped at 100.
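- Schema validity: validates each tool's `inputSchema` with ajv. Missing or non-object schemas score zero per tool.
- Tool naming convention: lowercase snake_case with a consistent prefix (e.g. `whoop_get_sleep`). Mixed case, hyphens, and missing prefixes lose points.
- Privacy modes documented: a `privacy_mode` parameter on any tool, or descriptions that mention `summary | structured | raw`.
- Mutation gating: any tool whose name matches `(set|update|delete|create|pause|resume|enable|disable|cancel|publish|send)` must document a gate in its description (`Gated by ALLOW_MUTATIONS`, `requires explicit user intent`, `dry-run`, `confirm`).
- Agent manifest: calls `<prefix>_agent_manifest` and checks the response object has `recommended_first_calls` (non-empty array) AND `standard_tools` (non-empty array). The probe NEVER persists the payload.
- Smoke test: `scripts/smoke*.{mjs,js,ts}` in the package, or a real `test` script in `package.json` (not the npm default echo-and-fail).
- Resources advertised: counts the resources `listResources()` returns. Zero scores zero; one or two scores 5; three or more scores 10.
- Tool descriptions: reports the average description length across tools (for example, `avg 88 chars across 13 tools`).
- Annotations: share of read tools with `annotations.readOnlyHint = true`. Score scales linearly.
- Manifest discoverability: looks for `*_agent_manifest`, `*_data_inventory`, `*_capabilities`, `*_connection_status`. Two or more scores 10; exactly one scores 7; none scores 0.

# mcp-scorecard - whoop-mcp-unofficial @0.4.3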
**Agent-readiness score:** 88/100
- [PASS] Schema validity (28/28 tools have valid input schema)
- [PASS] Tool naming convention (consistent `whoop_` prefix, snake_case)
- [PASS] Privacy modes documented (privacy_mode parameter on 6 tool(s))
- [PASS] Mutation gating (no write tools - n/a)
- [PASS] Agent manifest (recommended_first_calls present, 5 entries)
- [PASS] Smoke test (scripts/smoke-tools.mjs found)
- [PASS] Resources advertised (8 resources registered)
- [PASS] Tool descriptions (avg 142 chars across 28 tools)
- [WARN] Annotations (20/28 read tools annotated)
- [PASS] Manifest discoverability (4/4 discovery tools present (agent_manifest, data_inventory, capabilities, connection_status))
## Suggested fixes
- Add `annotations: { readOnlyHint: true, openWorldHint: false }` to every read tool definition.
_Generated by mcp-scorecard v0.1.0 at 2026-05-23T15:42:11.000Z_
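The scoring rules stated for the checks (the resource and discoverability tiers, the mutation verb list, and the capped sum) can be sketched as small pure functions. This is a minimal sketch under stated assumptions, not the CLI's source: the function names are hypothetical, and matching verbs on `_`-separated name tokens and gate phrases by case-insensitive substring are guesses about how the rules are applied.

```typescript
// Verbs taken from the mutation-gating rule. Matching them against
// "_"-separated tokens of the tool name is an assumption.
const MUTATION_VERBS = [
  "set", "update", "delete", "create", "pause", "resume",
  "enable", "disable", "cancel", "publish", "send",
];

// Gate phrases from the rule text; case-insensitive substring matching is an assumption.
const GATE_HINTS = ["allow_mutations", "explicit user intent", "dry-run", "confirm"];

function looksLikeMutation(toolName: string): boolean {
  return toolName
    .split("_")
    .some((token) => MUTATION_VERBS.indexOf(token) !== -1);
}

function documentsGate(description: string): boolean {
  const text = description.toLowerCase();
  return GATE_HINTS.some((hint) => text.indexOf(hint) !== -1);
}

// Resources advertised: zero scores 0; one or two scores 5; three or more scores 10.
function scoreResources(count: number): number {
  if (count <= 0) return 0;
  return count <= 2 ? 5 : 10;
}

// Manifest discoverability: two or more scores 10; exactly one scores 7; none scores 0.
function scoreDiscoverability(found: number): number {
  if (found >= 2) return 10;
  return found === 1 ? 7 : 0;
}

// Final score: sum of the per-check scores, capped at 100.
function totalScore(checkScores: number[]): number {
  const sum = checkScores.reduce((acc, score) => acc + score, 0);
  return Math.min(100, sum);
}
```

With these definitions, `looksLikeMutation("set_thing")` is true while `looksLikeMutation("prepare_thing")` is false, which mirrors the name-based behaviour the README describes.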
JSON output (`--json`):
{
"target": {
"displayName": "whoop-mcp-unofficial",
"version": "0.4.3",
"serverName": "whoop-mcp",
"serverVersion": "0.4.3"
},
"totalScore": 88,
"checks": [
{
"id": "schema_validity",
"label": "Schema validity",
"score": 10,
"status": "pass",
"summary": "28/28 tools have valid input schema",
"details": [],
"fixes": []
}
],
"generatedAt": "2026-05-23T15:42:11.000Z",
"scorecardVersion": "0.1.0"
}
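If you pipe the JSON into your own tooling, a consumer might look roughly like the sketch below. It relies only on fields visible in the sample above (`totalScore`, `checks[].label`, `checks[].status`, `checks[].summary`). The `evaluateReport` helper is hypothetical, and treating every status other than `"pass"` as needing attention is an assumption, since the sample shows only `"pass"`.

```typescript
// Subset of the report shape shown in the sample output.
interface CheckResult {
  id: string;
  label: string;
  score: number;
  status: string; // "pass" appears in the sample; other values are assumed
  summary: string;
  details: string[];
  fixes: string[];
}

interface ScorecardReport {
  totalScore: number;
  checks: CheckResult[];
}

// Hypothetical helper: gate on a minimum score and list checks that did not pass.
function evaluateReport(
  json: string,
  minScore: number
): { ok: boolean; attention: string[] } {
  const report = JSON.parse(json) as ScorecardReport;
  const attention = report.checks
    .filter((check) => check.status !== "pass")
    .map((check) => check.label + ": " + check.summary);
  return { ok: report.totalScore >= minScore, attention };
}

const sample = JSON.stringify({
  totalScore: 88,
  checks: [
    {
      id: "schema_validity",
      label: "Schema validity",
      score: 10,
      status: "pass",
      summary: "28/28 tools have valid input schema",
      details: [],
      fixes: [],
    },
  ],
});

const result = evaluateReport(sample, 80);
console.log(result.ok, result.attention.length); // true 0
```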
The scorecard launches the target MCP server over stdio with
MCP_PROBE=1 set on the child's env. Author hook: if your MCP needs
OAuth or other credentials to even list tools, detect this env var and
return your tool/resource/prompt manifests anyway. The scorecard expects
to be able to read your contract without making any auth-requiring API
calls.
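On the server side, the author hook described above might be sketched as follows. Only the `MCP_PROBE=1` variable comes from this README; the `listingMode` helper and its mode names are hypothetical, and a real server would decide for itself what a manifest-only startup means.

```typescript
type Env = Record<string, string | undefined>;

// True when the scorecard launched this process (it sets MCP_PROBE=1).
function isProbeRun(env: Env): boolean {
  return env["MCP_PROBE"] === "1";
}

// Hypothetical startup modes for a server that normally needs credentials.
type ListingMode = "manifest_only" | "full" | "auth_required";

function listingMode(env: Env, hasCredentials: boolean): ListingMode {
  // Under the probe, expose tool/resource/prompt manifests without
  // making any auth-requiring API calls.
  if (isProbeRun(env)) return "manifest_only";
  return hasCredentials ? "full" : "auth_required";
}
```

In a Node server you would typically pass `process.env` as `env` and pick the mode before registering handlers.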
- Replaces `customer_id`, `email`, `phone`, `access_token`, `refresh_token`, `client_secret`, `developer_token`, and `api_key` with `[REDACTED]`.
- Uses `npm pack` (when auditing a package by name) and `gh repo clone` (when auditing a GitHub URL).
- Gate CI with `npx -y mcp-scorecard my-mcp --min-score 85`.
- Run it in `prepublishOnly` so you never ship a regressed contract by accident.
- Some servers need credentials before `listTools()` returns. If your server is one of these, support the `MCP_PROBE` env hook so the scorecard can still read your contract.
- If a write tool is named `prepare_thing` instead of `set_thing`, the check will miss it. This is intentional: we reward clear naming.
- `listTools()` latency), and an OAuth probe with a mock token.

This is a quality audit tool. It does not certify security, privacy, or correctness; it only measures whether a server follows agent-friendly conventions. Always do your own review before plugging an MCP into a production agent.
MIT - see LICENSE.
{ "mcpServers": { "mcp-scorecard": { "command": "npx", "args": ["-y", "mcp-scorecard", "serve"] } } }
Then your agent can `audit("some-mcp-server")` before installing it. An MCP that scores MCPs.
- `npx -y mcp-scorecard https://your-server`
- `npx -y mcp-scorecard my-mcp --badge` → a shields.io markdown badge.

- uses: davidmosiah/mcp-scorecard@v0.3.0
  with:
    target: dist/index.js
    min-score: 80

- `mcp-scorecard compare a b c`: side-by-side ranking.
- `--profile security|quality|agent-ready`: score one category.
- `--baseline old.json`: regression diff.
- `--html`: shareable scorecard.