Wraps the alembica structured extraction library as an MCP server, giving you tools to validate extraction schemas, estimate token costs across LLM providers, run semantic extraction jobs on unstructured text, and query available schemas. Built for research workflows where you need to transform documents, articles, or corpora into structured datasets using OpenAI, Anthropic, Google, Cohere, DeepSeek, or self-hosted models. The cost estimation tool is handy for budgeting before running large extraction jobs, though it returns zero for Self-Hosted, AWS Bedrock, Azure AI, and Vertex AI providers. Runs via stdio transport from the Go binary or GHCR container. Reach for this when you're building data pipelines that need repeatable LLM extraction with cost visibility.
alembica
Open Science Software for Semantic Synthesis and Extraction of Information from Unstructured Sources.
alembica simplifies the use of Large Language Models (LLMs) to extract structured datasets from unstructured corpora of text.
It provides a flexible and scalable framework to process, synthesize, and transform textual information into structured formats suitable for analysis and further processing.
Supports OpenAI, Google AI, Anthropic, Cohere, DeepSeek, Perplexity, AWS Bedrock, Azure AI, Vertex AI, and Self-Hosted OpenAI-compatible providers.
To install alembica in Go, run:
go get github.com/open-and-sustainable/alembica
If you want to use alembica in other programming languages, check out the C-Shared Library in the User Guide.
User Guide – Learn how to use alembica in different programming languages.
API Reference – Explore the Go package documentation.
alembica includes an optional MCP server for agent tool access.
The MCP server can be used from a locally built Go binary, the GHCR container image, or the MCP Registry.
Available in the official MCP Registry:
io.github.open-and-sustainable/alembica-mcp
See the User Guide MCP page for installation options, run commands, and tool schemas.
The MCP server is published to the MCP Registry by GitHub Actions when a version tag such as v1.2.3 is pushed.
The workflow uses GitHub OIDC for authentication, and each new registry version is published explicitly by CI rather than implicitly from GitHub Releases.
Note: Cost estimation is not supported for Self-Hosted, AWS Bedrock, Azure AI, or Vertex AI providers; for these providers it returns zero.
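Because a zero estimate for these providers means "not supported" rather than "free", a caller may want to check the provider before trusting the number. The following is a minimal sketch of such a guard, not part of alembica: the function names and the normalized provider spellings are assumptions made for illustration, and only the four unsupported providers come from the note above.

```go
package main

import (
	"fmt"
	"strings"
)

// unsupportedCostProviders holds the four providers the note names as
// lacking cost estimation. The key spellings are an assumption of this
// sketch, not identifiers taken from alembica.
var unsupportedCostProviders = map[string]bool{
	"selfhosted": true,
	"awsbedrock": true,
	"azureai":    true,
	"vertexai":   true,
}

// normalize lowercases a provider name and strips spaces, hyphens, and
// underscores so that "AWS Bedrock" and "aws-bedrock" compare equal.
func normalize(provider string) string {
	r := strings.NewReplacer(" ", "", "-", "", "_", "")
	return r.Replace(strings.ToLower(provider))
}

// SupportsCostEstimation (hypothetical helper) reports whether a zero
// estimate for the given provider can be taken at face value. Any name
// not in the unsupported set is treated as supported.
func SupportsCostEstimation(provider string) bool {
	return !unsupportedCostProviders[normalize(provider)]
}

func main() {
	for _, p := range []string{"OpenAI", "AWS Bedrock", "Self-Hosted"} {
		fmt.Printf("%s: estimate is meaningful = %t\n", p, SupportsCostEstimation(p))
	}
}
```

Note that this sketch treats unknown provider names as supported, so a stricter pipeline might prefer an explicit allow-list instead.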
Optional model fields for cloud/local providers:
- base_url and api_version for Azure/OpenAI-compatible endpoints
- region for AWS Bedrock
- project_id and location for Vertex AI
Use schemaVersion: "v2" when you need these cloud/local provider fields or non-enumerated model IDs.
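To make the optional fields concrete, here is a small Go sketch that serializes them to JSON. Only the JSON key names (base_url, api_version, region, project_id, location) come from the list above; the ModelOptions type, the OptionsJSON helper, and all example values are hypothetical, and where these keys sit inside a full alembica input document is not shown here (see the User Guide for the actual schema).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ModelOptions is a hypothetical container for the optional provider
// fields. Empty fields are omitted so each provider only carries the
// keys it needs.
type ModelOptions struct {
	BaseURL    string `json:"base_url,omitempty"`    // Azure/OpenAI-compatible endpoints
	APIVersion string `json:"api_version,omitempty"` // Azure/OpenAI-compatible endpoints
	Region     string `json:"region,omitempty"`      // AWS Bedrock
	ProjectID  string `json:"project_id,omitempty"`  // Vertex AI
	Location   string `json:"location,omitempty"`    // Vertex AI
}

// OptionsJSON (hypothetical helper) renders the options as compact JSON.
func OptionsJSON(o ModelOptions) (string, error) {
	b, err := json.Marshal(o)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Placeholder values for illustration only.
	examples := []ModelOptions{
		{Region: "eu-west-1"},
		{ProjectID: "my-project", Location: "europe-west4"},
	}
	for _, o := range examples {
		s, err := OptionsJSON(o)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(s)
	}
}
```

Using omitempty keeps provider-specific keys out of configurations that do not need them, which matches the fields being optional.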
Author: Riccardo Boero - ribo@nilu.no
Contributions are welcome!
alembica is licensed under the GNU AFFERO GENERAL PUBLIC LICENSE, Version 3.

Boero, R. (2025). alembica - Open Science Software for Semantic Synthesis and Extraction of Information from Unstructured Sources. Zenodo. https://doi.org/10.5281/zenodo.14899666