Test Genie

muse-code-space/test-genie-mcp
STDIO · registry active
Summary

Connects Claude to test-genie's static analysis and self-healing test automation toolchain. Exposes 23 tools including diagnose_project for whole-project vulnerability and logic scanning (race conditions, hardcoded secrets, SQL injection, memory leaks), run_iterative_fix_loop for test-driven fix application with automatic rollback on regression, and platform-specific analyzers for iOS, Android, Flutter, React Native, and web. The diagnose_project call runs in about 30 seconds and returns prioritized findings with confidence scores and estimated fix times. The autoFix flag handles mechanical replacements like weak hash upgrades, while the full iterate-fix loop does syntax validation, backup, test rerun, and rollback. Built for catching the issues that slip past linters but break production.


test-genie-mcp

Built for vibe coders: one command, and you get a prioritized list of what's actually broken in your project.

Self-healing test automation for iOS, Android, Flutter, React Native and Web apps — as an MCP server.


v3.1.1 — vibe-check + honest auto-fix. One MCP call, ~30 seconds: race conditions + security issues + memory leaks + logic errors + perf smells, prioritized. Stays on your machine, no telemetry. Pass autoFix: true for the small, safe mechanical fixes (weak-hash, simple Math.random assignment) — backup + syntax-validate + rollback-on-syntax-fail. For test-verified application of harder fixes, use v3.0.0's iterate-fix loop.


Vibe coders quickstart

You don't read the docs. You open the project, talk to Claude, and want a verdict. Here it is:

In Claude (with test-genie-mcp installed; see the 5-minute Quickstart for setup):

/vibe-check /Users/me/my-app

Claude calls diagnose_project under the hood. ~30 seconds later you see:

# vibe-check report

- Project: /Users/me/my-app
- Platform: web
- Findings: 11 total — 4 critical, 4 high, 1 medium, 1 low
- Estimated fix time: ~85 min

## Top 5 issues

### 1. [CRIT] Hardcoded AWS access key id found in source
- File: `server.js:7`
- Category: security / secret (CWE-798)
- Confidence: 95%
- Fix: Move the value to an env var, gitignore the config, rotate the leaked key.

### 2. [CRIT] SQL string built by concatenating user input
- File: `server.js:21`
- Category: security / injection (CWE-89)
- Fix: Use parameterized queries (`db.query("... WHERE id = ?", [id])`).

### 3. [HIGH] useState setter called after await without mount guard
- File: `UserProfile.tsx:16`
- Category: race-condition / react-setstate-after-await (CWE-362)
- Confidence: 78%
- Fix: Use AbortController and check signal.aborted before calling setters.

… (top 5 shown — full list at output: "detailed")

## Next steps
1. Address the critical / high findings above.
2. Re-run diagnose_project after fixing to confirm convergence.
3. Use run_iterative_fix_loop for test-driven verification of each fix.

If a finding is marked autoFixable: true and has high or critical severity, calling diagnose_project with autoFix: true applies the mechanical replacement directly, with backup and syntax validation (see SAFETY.md for the exact guards). The v3.1.1 scope is deliberately narrow: weak hashes (createHash('md5'|'sha1') → createHash('sha256')) and standalone Math.random() assignments in security-sensitive files. For broader or structural fixes (race conditions, eval, exec injection), run run_iterative_fix_loop separately; it re-runs tests and rolls back automatically on regression.
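
The eligibility rule above can be pictured as a simple filter. This is a minimal sketch, not test-genie's actual code: the Finding shape, its id field, and the selectAutoFixCandidates helper are hypothetical, and only the autoFixable flag and the severity names come from the text.

```typescript
// Hypothetical finding shape; only `autoFixable` and the severity names
// are taken from the README. The `id` field is illustrative.
type Severity = "critical" | "high" | "medium" | "low";

interface Finding {
  id: string;
  severity: Severity;
  autoFixable: boolean;
}

// Keep only findings that are flagged auto-fixable AND sit at or above
// the high-severity floor described in the text.
function selectAutoFixCandidates(findings: Finding[]): Finding[] {
  return findings.filter(
    (f) => f.autoFixable && (f.severity === "critical" || f.severity === "high"),
  );
}

// Made-up sample data to exercise the filter.
const sample: Finding[] = [
  { id: "weak-hash-in-auth", severity: "high", autoFixable: true },
  { id: "weak-hash-in-util", severity: "medium", autoFixable: true },
  { id: "sql-concat", severity: "critical", autoFixable: false },
];

const candidates = selectAutoFixCandidates(sample);
console.log(candidates.map((f) => f.id));
```

Only the first sample entry passes both conditions; the other two stay report-only.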


Why test-genie?

The bottleneck in mobile + cross-platform test automation isn't writing tests — it's the loop between a failing test and a passing test. test-genie closes that loop:

failing test → analyzer flags issue → fix proposed → dry-run + syntax check →
applied with backup → affected tests re-run → regression check → loop or stop

This full loop is the run_iterative_fix_loop tool. The diagnose_project autoFix: true path in v3.1.1 covers a strict subset — backup + dry-run + syntax-validate + apply, without re-running tests (so no test-regression rollback in that path). Use the right tool for the job — and see SAFETY.md for the exact guards on each.

Other tools (Detox, Maestro, Playwright, xcodebuild test) run tests. test-genie runs tests and drives the fix until the bar is met or it can no longer make progress — without you scrubbing through stack traces.


5-minute Quickstart

# 1. Install
npm install -g test-genie-mcp

# 2. Add to the Claude Desktop config (macOS: ~/Library/Application Support/Claude/claude_desktop_config.json; Windows: %APPDATA%\Claude\claude_desktop_config.json)
{
  "mcpServers": {
    "test-genie": {
      "command": "npx",
      "args": ["test-genie-mcp"],
      "env": {
        "TEST_GENIE_ALLOWED_ROOT": "/path/to/your/project"
      }
    }
  }
}

# 3. Restart Claude Desktop. From a chat:
#    "Run the iterate-fix loop on /Users/me/my-rn-app with autoApply=false"

Expected output (truncated):

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Iterative fix loop f8b3… — PAUSED-FOR-CONFIRMATION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Iterations completed: 1
Fixes applied:        0
Regressions rolled back: 0
Final tests:          7/10 passing (3 failing)

Pending confirmations (3):
  - 71fbe…: Fix: useEffect missing cleanup for setInterval (confidence: 85)
  - 92ad1…: Fix: Force-unwrap on possibly-undefined name (confidence: 85)
  - …

Resume token: f8b3…

Re-call with autoApply: true (or resumeToken: "f8b3…") to actually patch the files.


Real use cases

The flows below describe the run_iterative_fix_loop path (v3.0 headline) — full detect → propose → dry-run → apply-with-backup → re-run-tests → rollback-on-regression. The diagnose_project autoFix path in v3.1.1 is the narrower mechanical-replacement-only path; see SAFETY.md §4 for what that one actually touches.

1. React Native memory-leak self-healing

A team adds setInterval(...) in a useEffect and forgets cleanup. test-genie's detect_memory_leaks flags it, suggest_fixes proposes return () => clearInterval(id) (src/tools/fixing/suggestFixes.ts:169-179), the loop dry-runs the patch through the TS compiler, applies with backup, re-runs only the affected snapshot test, confirms 100% pass, stops. Before: 1 failing snapshot. After: 0 failing, 1 fix applied, 1 backup at .test-genie-backups/.
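
The cleanup pattern behind that fix, shown without React so it runs on its own. This is a sketch under assumptions: startTicker is a hypothetical stand-in for the body of a useEffect, and in a component the function it returns would be the effect's cleanup.

```typescript
// Hypothetical stand-in for a useEffect body: start an interval and
// return the cleanup function that the effect would return.
function startTicker(onTick: () => void, everyMs: number): () => void {
  const id = setInterval(onTick, everyMs);
  // Without this cleanup the interval keeps firing after unmount,
  // which is the leak the detector flags.
  return () => clearInterval(id);
}

let ticks = 0;
const stopTicker = startTicker(() => {
  ticks += 1;
}, 1000);

// Simulate an unmount: once the cleanup has run, no tick can fire.
stopTicker();
```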

2. Flutter widget dispose() automation

AnimationController left undisposed. test-genie sees the missing dispose() override, generates a Dart @override dispose() { controller.dispose(); super.dispose(); } block (suggestFixes.ts:214-217), runs dart analyze on the patched file, applies, re-runs flutter test, converges.

3. iOS retain-cycle (closure capture)

self.timer = Timer.scheduledTimer(...) { _ in self.tick() } — rule-based detector flags closure self-capture, fixer rewrites to [weak self] _ in guard let self = self else { return }; self.tick() (suggestFixes.ts:239-242). If swiftc is on PATH the syntax check is real; otherwise test-genie reports "downgraded validation" so you know.


How the iterate-fix loop works

┌────────────────────┐
│   collect tests    │  (run_scenario_test / supplied list)
└─────────┬──────────┘
          │
   pass-rate ≥ threshold? ── yes ──▶  SUCCESS
          │ no
          ▼
┌────────────────────┐
│  detect issues     │   memory + logic analyzers
└─────────┬──────────┘
          │
┌────────────────────┐
│  suggest fixes     │   rule-based (default) → LLM (hybrid, optional)
└─────────┬──────────┘
          │
┌────────────────────┐
│  dry-run + syntax  │   TS compiler API / platform compiler / brace check
└─────────┬──────────┘
          │
┌────────────────────┐
│  apply with backup │   per-file `.test-genie-backups/`
└─────────┬──────────┘
          │
┌────────────────────┐
│  re-run tests      │   regression?  yes → auto-rollback
└─────────┬──────────┘
          │
          ▼
   loop (≤ maxIterations, ≤ totalTimeout)

See docs/ITERATE_FIX_LOOP.md for a sequence diagram and the full safety-guard list.
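
The control flow in the diagram can be sketched roughly as follows. Everything here is an assumption made for illustration: the LoopDeps hooks, the LoopResult shape, and the iterateFix function are hypothetical, the sketch is synchronous, and it leaves out the totalTimeout guard and the confirmation pause.

```typescript
// Hypothetical hooks; the real server wires these steps to its own tools.
interface LoopDeps {
  runTests: () => { passed: number; total: number };
  proposeFix: () => string | null; // null = nothing left to try
  validateSyntax: (fixId: string) => boolean;
  applyWithBackup: (fixId: string) => void;
  rollback: (fixId: string) => void;
}

interface LoopResult {
  status: "success" | "stalled" | "max-iterations";
  iterations: number;
  applied: string[];
  rolledBack: string[];
}

function iterateFix(deps: LoopDeps, threshold: number, maxIterations: number): LoopResult {
  const applied: string[] = [];
  const rolledBack: string[] = [];
  let before = deps.runTests();

  for (let i = 0; i < maxIterations; i++) {
    if (before.passed / before.total >= threshold) {
      return { status: "success", iterations: i, applied, rolledBack };
    }
    const fixId = deps.proposeFix();
    if (fixId === null) {
      return { status: "stalled", iterations: i, applied, rolledBack };
    }
    if (!deps.validateSyntax(fixId)) continue; // dry-run failed: never applied
    deps.applyWithBackup(fixId);
    const after = deps.runTests();
    if (after.passed < before.passed) {
      deps.rollback(fixId); // regression: restore from backup
      rolledBack.push(fixId);
    } else {
      applied.push(fixId);
      before = after;
    }
  }
  const status = before.passed / before.total >= threshold ? "success" : "max-iterations";
  return { status, iterations: maxIterations, applied, rolledBack };
}

// Tiny simulation: 7/10 passing; "fix-a" repairs two tests, "fix-b" breaks one.
let passing = 7;
const queue = ["fix-a", "fix-b"];
const effects: Record<string, number> = { "fix-a": 2, "fix-b": -1 };

const result = iterateFix(
  {
    runTests: () => ({ passed: passing, total: 10 }),
    proposeFix: () => queue.shift() ?? null,
    validateSyntax: () => true,
    applyWithBackup: (id) => { passing += effects[id] ?? 0; },
    rollback: (id) => { passing -= effects[id] ?? 0; },
  },
  1.0,
  5,
);
console.log(result);
```

In this simulation the first fix is kept, the second is rolled back because it lowers the pass count, and the loop then stops because no proposals remain.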


Tools (23)

| # | Tool | Mode |
| --- | --- | --- |
| 1 | analyze_app_structure | real |
| 2 | generate_scenarios | real |
| 3 | create_test_plan | real |
| 4 | run_scenario_test | hybrid |
| 5 | run_simulation | simulated |
| 6 | run_stress_test | hybrid |
| 7 | detect_memory_leaks | real |
| 8 | detect_logic_errors | real |
| 9 | suggest_fixes | real |
| 10 | confirm_fix | real |
| 11 | apply_fix | real |
| 12 | rollback_fix | real |
| 13 | run_full_automation | hybrid |
| 14 | run_iterative_fix_loop (v3.0 headline) | hybrid |
| 15 | generate_report | real |
| 16 | get_pending_fixes | real |
| 17 | get_test_history | real |
| 18 | analyze_performance | real |
| 19 | analyze_code_deep | real |
| 20 | generate_cicd_config | real |
| 21 | diagnose_project (v3.1 headline, vibe-check) | real |
| 22 | detect_race_conditions | real |
| 23 | detect_security_issues | real |

Mode legend: see docs/SIMULATION_VS_REAL.md.

Plus 4 resources (test-genie://iteration-logs, …/test-history/{path}, …/iteration-logs/{loopId}, …/applied-fixes/{path}) and 3 prompts (full-test-pipeline, diagnose-failure, vibe-check).


What vibe-check catches

Race conditions (detect_race_conditions / diagnose_project):

| Pattern | Language | Severity | Auto-fixable (v3.1.1) |
| --- | --- | --- | --- |
| useState setter called after await without mount guard | TS/JS/React | high | no (structural) |
| useEffect with async fetch, no AbortController/cleanup | TS/JS/React | high | no (structural) |
| arr.forEach(async ...) (silent fire-and-forget) | TS/JS | medium | no (ordering-sensitive) |
| Adjacent fetches without Promise.all / sequencing | TS/JS | medium | no |
| TOCTOU: existsSync then readFileSync without lock | TS/JS Node | medium | no |
| Non-atomic counter increment in async context | TS/JS | low | no |
| @Published mutation outside @MainActor | Swift | medium | no |
| Concurrent DispatchQueue writes without .barrier | Swift | medium | no |
| MutableStateFlow mutated off Dispatchers.Main | Kotlin | medium | no |
| Flow collected without flowOn | Kotlin | low | no |
| Goroutine + shared map without sync.Mutex | Go | high | no |

v3.1.1 honesty audit: useEffect-no-abort and forEach-await were previously advertised as auto-fixable. They are not — wrapping with AbortController or rewriting to Promise.all(arr.map(...)) changes behavior we can't verify statically. They are now report-only. See SAFETY.md.
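
A small illustration of why the forEach(async ...) rewrite is not mechanical. The names here (seen, doubleAll) are made up for the example, and Promise.resolve() stands in for a real async call.

```typescript
// forEach ignores the promises returned by an async callback, so the
// caller continues before any of the awaited work has finished.
const seen: number[] = [];
[1, 2, 3].forEach(async (n) => {
  await Promise.resolve(); // stand-in for a real async call
  seen.push(n);
});
const seenRightAfterForEach = seen.length; // still 0 at this point

// Promise.all(arr.map(...)) gives the caller something to await, but it
// also starts all callbacks concurrently and rejects on the first
// failure, which is a behavior change rather than a mechanical rewrite.
async function doubleAll(values: number[]): Promise<number[]> {
  return Promise.all(
    values.map(async (n) => {
      await Promise.resolve();
      return n * 2;
    }),
  );
}

doubleAll([1, 2, 3]).then((out) => console.log(out)); // logs [ 2, 4, 6 ]
```

Whether the original code relied on the fire-and-forget timing or on sequential ordering is something only its tests or its author can answer, which fits the report-only classification.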

Security (detect_security_issues / diagnose_project):

| Pattern | Severity | CWE | Auto-fixable (v3.1.1) |
| --- | --- | --- | --- |
| Hardcoded AWS / Stripe / GitHub / Google / Slack token | critical / high | CWE-798 | no (rotate) |
| Hardcoded JWT secret literal | high | CWE-798 | no |
| API token in URL query string | high | CWE-200 | no |
| .env file present but not gitignored | high | CWE-538 | no (rotation must follow) |
| SQL string concat with req.params / req.body | critical | CWE-89 | no |
| innerHTML / dangerouslySetInnerHTML with dynamic value | high | CWE-79 | no |
| eval() / new Function() with non-literal | critical | CWE-95 | no |
| Math.random() in security-sensitive file, standalone assignment | high | CWE-338 | yes (crypto.randomInt) |
| Math.random() mixed into arithmetic | high | CWE-338 | no (semantic) |
| createHash('md5'\|'sha1') in security-keyword file | high | CWE-327 | yes ('sha256') |
| createHash('md5'\|'sha1') elsewhere | medium | CWE-327 | no (below severity floor) |
| child_process.exec with user-input template literal | critical | CWE-78 | no |
| fetch(req.query.url) (SSRF) | high | CWE-918 | no |
| CORS * origin + Allow-Credentials: true | high | CWE-942 | no |
| Cookie set without httpOnly / secure / sameSite | low | CWE-1004 | no |
| yaml.load without safe schema | medium | CWE-502 | no |

v3.1.1 honesty audit: .env/Math.random (general)/yaml.load were previously advertised as auto-fixable. They were either too risky to rewrite blindly or no strategy shipped — flipped to report-only. See SAFETY.md §5.
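
To make "mechanical replacement" concrete, here is a minimal sketch of a weak-hash upgrade done as a text rewrite. It is an assumption about how such a rewrite could look, not test-genie's implementation: the WEAK_HASH_CALL pattern and the upgradeWeakHashes helper are hypothetical.

```typescript
// Hypothetical rewrite helper: upgrade createHash('md5' | 'sha1') call
// sites to createHash('sha256'). Not test-genie's actual implementation.
const WEAK_HASH_CALL = /createHash\(\s*(['"])(?:md5|sha1)\1\s*\)/g;

function upgradeWeakHashes(source: string): { output: string; replacements: number } {
  let replacements = 0;
  const output = source.replace(WEAK_HASH_CALL, (_match: string, quote: string) => {
    replacements += 1;
    // Preserve whichever quote style the original call used.
    return `createHash(${quote}sha256${quote})`;
  });
  return { output, replacements };
}

const original = "const digest = crypto.createHash('md5').update(password).digest('hex');";
const upgraded = upgradeWeakHashes(original);
console.log(upgraded.output);
```

A rewrite this narrow is easy to validate syntactically, which is presumably why it fits the autoFix path while structural changes do not.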


What vibe-check misses (honest list)

This is a "catch the obvious stuff in 30s" filter, not Snyk / Semgrep / a full SAST tool. We don't catch:

  • Cross-file data-flow. If user input flows through three files before reaching a db.query, the regex won't connect the dots. A real SAST traces taint across the call graph. Roadmap: ts-morph reference walking for top-N entry points.
  • Vulnerable transitive deps. We don't query npm advisories — that's npm audit's job, and bundling a stale advisory list would lie. Run npm audit --json in parallel if you want dep-CVE coverage.
  • Race conditions across processes. We catch in-process JS / Swift / Kotlin / Go races. Distributed races (lock ordering across services, DB transactions) need different tooling.
  • Type-correct but logic-broken code. The analyzer is syntactic, not semantic. A Math.random() named getNonce won't fool us; a properly-named crypto.randomBytes used with a tiny entropy budget will.
  • Custom secret formats. Internal company tokens with unique prefixes need a regex you can add to securityAnalyzer.SECRET_PATTERNS. PR welcome.
  • Real-time / dynamic issues. Memory leaks under load, network timeouts, slow renders mid-interaction — those need run_stress_test / run_simulation, not static analysis.

If you want deeper coverage on top of vibe-check: feed the findings into run_iterative_fix_loop for test-verified application, or escalate to Snyk / Semgrep / GitHub Advanced Security for compliance use cases.
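
For the custom secret formats mentioned in the list above, a pattern entry might look something like this. The SecretPattern shape, the scanLine helper, and the acme_live_ token format are all hypothetical; the real structure of securityAnalyzer.SECRET_PATTERNS may differ, so check the source before copying this.

```typescript
// Hypothetical pattern entry; the real shape of SECRET_PATTERNS may differ.
interface SecretPattern {
  name: string;
  regex: RegExp;
  severity: "critical" | "high";
}

const patterns: SecretPattern[] = [
  // Widely documented AWS access key id format.
  { name: "aws-access-key-id", regex: /\bAKIA[0-9A-Z]{16}\b/, severity: "critical" },
  // Made-up internal token format, for illustration only.
  { name: "acme-internal-token", regex: /\bacme_live_[a-f0-9]{32}\b/, severity: "high" },
];

// Return the names of all patterns that match anywhere in the line.
function scanLine(line: string): string[] {
  return patterns.filter((p) => p.regex.test(line)).map((p) => p.name);
}

console.log(scanLine('const key = "AKIAIOSFODNN7EXAMPLE";'));
```

The regexes deliberately omit the g flag so that repeated test() calls do not carry state between lines.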


vibe-check vs alternatives

| | vibe-check (test-genie) | Snyk | Semgrep | GitHub Advanced Security |
| --- | --- | --- | --- | --- |
| Runs locally | yes | hybrid (cloud) | yes | no (cloud) |
| Telemetry-free | yes (zero network calls) | no | partial | no |
| Fix loop integration | yes (run_iterative_fix_loop) | no | no | no |
| Race-condition detection | yes (JS/Swift/Kotlin/Go) | no | partial | partial |
| Cross-file taint flow | no (roadmap) | yes | yes | yes |
| Setup time | none (already installed if test-genie is installed) | account + auth | install + ruleset | repo-level enable |

If your goal is "before I commit, what's broken?", vibe-check wins on latency. If your goal is "compliance + supply chain audit", use the dedicated tools.


When NOT to use test-genie

  • Production-gate test runs. test-genie is built for the development feedback loop. For shipping decisions, use a proper CI that you control end-to-end.
  • Code your team must hand-review every line of. The loop's job is to propose and apply fixes; if every fix needs a human eye, leave autoApply: false (the default) and use it as a fix-proposal generator only.
  • No backup / no version control situations. test-genie's auto-rollback is best-effort and requires the per-file backup to exist. Always run inside a git working tree.

Comparison

| | test-genie | Detox | Maestro | xcodebuild test |
| --- | --- | --- | --- | --- |
| Runs E2E / unit tests | ✅ (via Jest/Detox/etc.) | ✅ | ✅ | ✅ |
| Detects code issues | ✅ rule + LLM | ❌ | ❌ | ❌ |
| Iterative fix loop | ✅ (run_iterative_fix_loop) | ❌ | ❌ | ❌ |
| Auto-rollback on test regression | ✅ inside run_iterative_fix_loop only | ❌ | ❌ | ❌ |
| Auto-rollback on syntax failure | ✅ all apply paths | ❌ | ❌ | ❌ |
| MCP-native (talks to Claude / agents) | ✅ | ❌ | ❌ | ❌ |
| Multi-platform | iOS+Android+Web+Flutter+RN | iOS+Android | iOS+Android | iOS only |

Scope note: diagnose_project autoFix: true rolls back on syntax-validate failure (applyFix.ts:185-202) but does not re-run tests, so it cannot detect test regressions. For test-driven rollback use run_iterative_fix_loop. See SAFETY.md §2.4.

test-genie uses tools like Jest, Detox, and xcodebuild test under the hood — it sits at the orchestration layer, not the test-runner layer.


Known limitations

  • Platform syntax check downgrade. For Swift/Kotlin/Java/Dart we try the platform compiler in -typecheck mode. If the compiler isn't on PATH, we fall back to brace-balance validation and surface downgraded: true in the result. Install swiftc / kotlinc / javac / dart for real validation.
  • LLM is optional and gated. strategy: 'hybrid' only kicks LLM in when rule-based confidence is below threshold. Without an API key the loop is rule-based-only — no failure.
  • Storage is per-machine. Test history / iteration logs live under $TEST_GENIE_STORAGE_DIR (defaults to ~/.test-genie-mcp). Not synced across machines.
  • Simulated mode is "simulation," not magic. run_simulation returns plausible anomalies, not real ones. Use run_scenario_test (hybrid) for real-device runs.
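
To show what the brace-balance fallback in the first item can and cannot tell you, here is a minimal sketch of such a check. The bracesBalanced function is hypothetical and simpler than a production check would need to be: it does not skip string literals or comments.

```typescript
// Minimal brace-balance check: every opener must be closed by the
// matching closer in the right order. String literals and comments are
// NOT skipped here, which a real fallback would need to handle.
function bracesBalanced(source: string): boolean {
  const closerFor: Record<string, string> = { "(": ")", "[": "]", "{": "}" };
  const expected: string[] = [];

  for (let i = 0; i < source.length; i++) {
    const ch = source.charAt(i);
    const closer = closerFor[ch];
    if (closer !== undefined) {
      expected.push(closer); // remember which closer must come next
    } else if (")]}".includes(ch)) {
      if (expected.pop() !== ch) return false; // wrong or unexpected closer
    }
  }
  return expected.length === 0; // leftover openers mean an unclosed block
}

console.log(bracesBalanced("func tick() { values.append([1, 2]) }"));
console.log(bracesBalanced("func tick() { values.append([1, 2) }"));
```

A check like this catches a truncated or mangled patch, but it accepts any well-nested nonsense, which is why the downgraded flag matters.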

Configuration

| Env var | Default | Purpose |
| --- | --- | --- |
| TEST_GENIE_ALLOWED_ROOT | cwd | Capability-based path safety: the server refuses to read/write outside this root. |
| TEST_GENIE_STORAGE_DIR | ~/.test-genie-mcp | Where scenarios / results / iteration logs live. |
| TEST_GENIE_LLM_PROVIDER | auto-detect | anthropic / openai / none. |
| ANTHROPIC_API_KEY | (none) | Used when provider = anthropic. |
| OPENAI_API_KEY | (none) | Used when provider = openai. |
| TEST_GENIE_ANTHROPIC_MODEL | claude-haiku-4-5 | Override Anthropic model. |
| TEST_GENIE_OPENAI_MODEL | gpt-4o-mini | Override OpenAI model. |
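
The kind of check that TEST_GENIE_ALLOWED_ROOT implies can be sketched as a path-containment test. This is a hypothetical illustration using Node's path module, not the server's actual guard: isInsideRoot is a made-up name, and the sketch does not resolve symlinks, which a hardened version would likely need to do.

```typescript
import * as path from "node:path";

// Hypothetical containment check, not test-genie's actual code.
function isInsideRoot(allowedRoot: string, candidate: string): boolean {
  const root = path.resolve(allowedRoot);
  // Relative candidates are interpreted against the root; absolute ones stay as-is.
  const target = path.resolve(root, candidate);
  const rel = path.relative(root, target);
  // Outside if the relative path climbs out of the root (or is on another drive).
  return rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel);
}

console.log(isInsideRoot("/srv/app", "src/index.ts"));
console.log(isInsideRoot("/srv/app", "../etc/passwd"));
console.log(isInsideRoot("/srv/app", "/srv/application/x.ts"));
```

Comparing via path.relative rather than a string prefix avoids treating a sibling directory such as /srv/application as if it were inside /srv/app.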

Migrating from v2.x

  • run_full_automation still works. The confirmMode / autoFix options are kept for compatibility but autoApply: boolean is the new way — autoApply: true is equivalent to confirmMode: 'auto'.
  • Subprocess hardening means platform tools now reject scheme / device / package-name arguments that contain shell metacharacters. If your CI was passing weird-looking values, sanitize them first.
  • See CHANGELOG.md for the full breaking-change list + migration recipes.

Roadmap

  • LLM-based fix-proposal voting (multiple proposals → pick the best by syntax + retest delta)
  • Multi-repo sync (run the loop across N repos in parallel from one MCP call)
  • A "watch mode" that runs the loop on file save
  • Better Detox / Maestro artifact ingestion (link videos into iteration logs)

Contributing

Issues, PRs, and ideas welcome — see CONTRIBUTING.md (TODO). Code lives under src/, tests under tests/. Run npm test before sending a PR.

Maintainer

@MUSE-CODE-SPACE — Yoonkyoung Gong.

License

MIT — see LICENSE.

Categories: Automation & Workflows, Mobile Development
Registry: active
Package: test-genie-mcp
Transport: STDIO
Updated: Dec 27, 2025
