This is a PostgreSQL- and MongoDB-backed toolkit for analyzing ESG reports using semantic search and LLM extraction. It exposes 31 tools across six servers: a metrics extractor that pulls ESRS-aligned KPIs for the emissions, energy, water, waste, social, and governance domains; a PDF processor with table extraction and embedding generation; a pgvector store for similarity search and query caching; a regulation server that downloads and ingests EU directives from EUR-Lex; a web scraper for discovering company reports; and a combined server that runs all of the tools in a single process. You'll need an Anthropic API key for RAG queries and metric extraction. Reach for this if you're building compliance workflows or ESG data pipelines and want structured outputs from unstructured sustainability documents.
Open-source Model Context Protocol servers for ESG (Environmental, Social, and Governance) data extraction, analysis, and regulation management.
31 tools across 6 servers — install once, run only what you need.
Author: Ioannis Michos (johnmichos.tf@gmail.com)
pip install esg-mcp-servers
| Service | Required for |
|---|---|
| PostgreSQL 16 + pgvector | Vector storage, metrics, regulations |
| MongoDB 7 | PDF binary storage (GridFS) |
| Anthropic API key | RAG queries, metric extraction |
Spin up local databases with Docker:
docker compose up -d
Run the database migration:
esg-mcp-migrate
Copy .env.example and fill in your values:
cp .env.example .env
Key variables:
| Variable | Default | Description |
|---|---|---|
| POSTGRES_DSN | postgresql://esg:esg@localhost/esg_platform | PostgreSQL connection string |
| MONGODB_URI | mongodb://localhost:27017 | MongoDB connection string |
| ANTHROPIC_API_KEY | (none) | Required for RAG and LLM extraction |
| EMBEDDING_MODEL | Snowflake/snowflake-arctic-embed-l-v2.0 | Sentence-transformer model |
| EMBEDDING_DIMENSIONS | 1024 | Embedding vector size |
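As a rough illustration of how these variables and their defaults fit together, the sketch below reads them from an environment mapping. The `Settings` class and `load_settings` function are hypothetical names made up for this example, not part of the package; only the variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in the table."""
    postgres_dsn: str
    mongodb_uri: str
    anthropic_api_key: Optional[str]
    embedding_model: str
    embedding_dimensions: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    return Settings(
        postgres_dsn=env.get(
            "POSTGRES_DSN", "postgresql://esg:esg@localhost/esg_platform"
        ),
        mongodb_uri=env.get("MONGODB_URI", "mongodb://localhost:27017"),
        # No default exists for the API key, so an unset or empty value becomes None.
        anthropic_api_key=env.get("ANTHROPIC_API_KEY") or None,
        embedding_model=env.get(
            "EMBEDDING_MODEL", "Snowflake/snowflake-arctic-embed-l-v2.0"
        ),
        embedding_dimensions=int(env.get("EMBEDDING_DIMENSIONS", "1024")),
    )
```

Calling `load_settings()` with no arguments reads the real process environment. Passing a plain dict makes it easy to check a configuration before starting a server.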
ESRS-aligned KPI extraction for emissions, energy, water, waste, social, and governance domains.
| Tool | Description |
|---|---|
| extract_emissions_data | GHG Scope 1, 2, 3 extraction |
| extract_energy_data | Energy consumption & renewables (ESRS E1) |
| extract_water_data | Water withdrawal, discharge, consumption (ESRS E3) |
| extract_waste_data | Waste generation, recycling, landfill (ESRS E5) |
| extract_social_data | Workforce, diversity, H&S (ESRS S1) |
| extract_governance_data | Board composition & governance (ESRS G1) |
| answer_esg_query | Free-text RAG Q&A over documents |
| keyword_similarity_search | Semantic keyword search in documents |
| detect_query_domain | Classify query into ESG domain |
| detect_emissions_query_type | Detect emissions scope/year from query |
| batch_extract_metrics | Extract all KPIs and persist to DB |
PDF validation, text/table extraction, and embedding generation.
| Tool | Description |
|---|---|
| verify_esg_report | RandomForest ESG report classifier |
| extract_text_chunks | Extract and chunk PDF text |
| extract_tables | Table extraction with OCR fallback |
| generate_embeddings | Batch embedding generation |
| process_pdf_full_pipeline | End-to-end: extract, embed, store |
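To give a feel for what "chunking" means before embedding, here is a minimal sketch of word-based chunking with overlap. It is an assumption about the general technique, not the strategy `extract_text_chunks` actually uses; the function name `chunk_text` and the `chunk_size` and `overlap` parameters are hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some words twice.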
pgvector CRUD operations for document chunks and query cache.
| Tool | Description |
|---|---|
| upsert_document_chunks | Insert/update chunks with embeddings |
| similarity_search | Cosine similarity search |
| get_cached_query_response | Retrieve cached LLM responses |
| cache_query_response | Store LLM response in cache |
| list_documents | List indexed documents |
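For readers new to vector search, the following sketch shows what a cosine similarity ranking computes, using plain Python lists in place of stored embeddings. In a real deployment this work would happen inside PostgreSQL (pgvector provides a cosine distance operator, `<=>`); how the server builds its queries is not shown in this README, so treat the helper names `cosine_similarity` and `top_k` as illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    chunks: Sequence[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (chunk_id, score) pairs most similar to the query, best first."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Because cosine similarity ignores vector length, two chunks pointing in the same direction score the same regardless of magnitude. This is one reason it is a common choice for sentence embeddings.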
EU ESG regulation download, ingestion, and semantic search.
| Tool | Description |
|---|---|
| download_regulation | Download regulation PDF from EUR-Lex |
| download_all_regulations | Batch download all configured regulations |
| ingest_regulation | Extract, parse articles, embed, store |
| search_regulation_text | Semantic search across regulation articles |
| list_regulations | List ingested regulations |
ESG report discovery and download from the web.
| Tool | Description |
|---|---|
| search_esg_reports | Multi-engine search for ESG PDFs |
| crawl_company_website | Deep-crawl website for PDF links |
| download_pdf | Download PDF and store in GridFS |
| get_scraping_status | Check scrape job status |
| cancel_scraping_job | Cancel running scrape job |
The combined server (esg-mcp-all) exposes all tools from all servers in a single process.
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "esg-metrics": {
      "command": "esg-metrics-extractor"
    },
    "esg-pdf": {
      "command": "esg-pdf-processor"
    },
    "esg-vectors": {
      "command": "esg-vector-store"
    },
    "esg-regulations": {
      "command": "esg-regulations"
    },
    "esg-scraper": {
      "command": "esg-scraper"
    }
  }
}
Or use the combined server for all tools:
{
  "mcpServers": {
    "esg": {
      "command": "esg-mcp-all"
    }
  }
}
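If you only want some of the servers, the configuration can be generated rather than edited by hand. The sketch below builds the `mcpServers` block for a chosen subset. The server keys and commands are copied from the configurations above, while `SERVER_COMMANDS` and `build_config` are hypothetical helpers, not something the package provides.

```python
import json
from typing import Dict, Iterable

# Server keys and entry-point commands as listed in the configurations above.
SERVER_COMMANDS: Dict[str, str] = {
    "esg-metrics": "esg-metrics-extractor",
    "esg-pdf": "esg-pdf-processor",
    "esg-vectors": "esg-vector-store",
    "esg-regulations": "esg-regulations",
    "esg-scraper": "esg-scraper",
    "esg": "esg-mcp-all",
}


def build_config(selected: Iterable[str]) -> dict:
    """Return a claude_desktop_config.json-style dict for the selected servers."""
    names = list(selected)
    unknown = [name for name in names if name not in SERVER_COMMANDS]
    if unknown:
        raise ValueError(f"unknown server(s): {', '.join(unknown)}")
    return {
        "mcpServers": {name: {"command": SERVER_COMMANDS[name]} for name in names}
    }


if __name__ == "__main__":
    # Example: enable only the PDF processor and the vector store.
    print(json.dumps(build_config(["esg-pdf", "esg-vectors"]), indent=2))
```

The printed JSON can be merged into an existing claude_desktop_config.json. Selecting "esg" alongside the individual servers would register the same tools twice, so pick one approach or the other.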
The scraper server requires Selenium for JavaScript-heavy sites:
pip install esg-mcp-servers[scraper]
git clone https://github.com/freminder/esg-mcp-servers.git
cd esg-mcp-servers
pip install -e ".[scraper,dev]"
docker compose up -d
esg-mcp-migrate
MIT — see LICENSE.