This gets you working with Ollama's local AI runtime through its REST API and OpenAI-compatible endpoints. You'll want it when you're building chat completions, embeddings, or streaming responses against locally running models like Llama or Gemma. The skill covers both the native API at localhost:11434 and the v1 endpoints that work with OpenAI's Python and JavaScript libraries. It includes real examples for vision models, structured JSON outputs with Pydantic, and cloud model authentication. The reference docs are thorough on environment variables, GPU loading checks with ollama ps, and platform-specific server configuration. Honest take: if you're running models locally instead of hitting external APIs, this covers the practical integration patterns you'll actually use.
npx -y skills add rawveg/skillsforge-marketplace --skill ollama --agent claude-code
Installs into .claude/skills of the current project.
Comprehensive assistance with Ollama development - the local AI model runtime for running and interacting with large language models programmatically.
This skill should be triggered when:
Generate a simple chat response:
curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue?"
    }
  ]
}'
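The same call can be made from Python with only the standard library. This is a minimal sketch, assuming the server is at the default local address and that a non-streaming reply carries the text under message.content; the helper names (build_chat_payload, chat, ask) are hypothetical and exist only for this illustration.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # assumed default local address


def build_chat_payload(model, messages):
    """Build a non-streaming /api/chat request body."""
    return {"model": model, "messages": messages, "stream": False}


def chat(model, messages, url=OLLAMA_CHAT_URL):
    """Send the conversation and return the assistant's reply text."""
    body = json.dumps(build_chat_payload(model, messages)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["message"]["content"]


def ask(history, model, question, send=chat):
    """Append a user turn, fetch the reply, and record it in the history."""
    history.append({"role": "user", "content": question})
    answer = send(model, history)
    history.append({"role": "assistant", "content": answer})
    return answer
```

Calling ask(history, "gemma3", "Why is the sky blue?") repeatedly with the same history list keeps earlier turns in the messages array, which is how the model sees the conversation so far.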
Generate a text response from a prompt:
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Why is the sky blue?"
}'
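Without "stream": false, this request comes back as a stream rather than one object. The sketch below shows one way to consume it in Python, assuming each line of the stream is a JSON object with a "response" text fragment and that the last one has "done" set to true. The function names are hypothetical.

```python
import json
import urllib.request


def iter_stream_text(lines):
    """Yield the text fragment from each newline-delimited JSON chunk."""
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        if "error" in chunk:
            raise RuntimeError(chunk["error"])
        yield chunk.get("response", "")
        if chunk.get("done"):
            break


def stream_generate(model, prompt, url="http://localhost:11434/api/generate"):
    """POST a prompt and yield text fragments as the server produces them."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        # An HTTP response object iterates line by line, one chunk per line.
        yield from iter_stream_text(response)
```

A caller would typically print each fragment as it arrives, for example: for text in stream_generate("gemma3", "Why is the sky blue?"): print(text, end="", flush=True).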
Use Ollama with the OpenAI Python library:
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='llama3.2',
)
Ask questions about images:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")

response = client.chat.completions.create(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "data:image/png;base64,iVBORw0KG...",
                },
            ],
        }
    ],
    max_tokens=300,
)
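The image_url value above is a base64 data URL. A minimal sketch of building one from a local file follows; image_to_data_url and image_message are hypothetical helper names, and the MIME type is guessed from the file extension, which is an assumption that may not suit every file.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path):
    """Encode an image file as a data URL such as data:image/png;base64,..."""
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"not a recognized image file: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_message(question, path):
    """Build a user message in the same shape as the example above."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": image_to_data_url(path)},
        ],
    }
```

The result of image_message("What's in this image?", "photo.png") can be passed as the single entry in the messages list.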
Create vector embeddings for text:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

embeddings = client.embeddings.create(
    model="all-minilm",
    input=["why is the sky blue?", "why is the grass green?"],
)
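Embeddings are usually compared with cosine similarity. The sketch below ranks texts against a query vector in plain Python; it is an illustration, not part of Ollama, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, documents):
    """documents is a list of (text, vector) pairs; returns texts, best first."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]
```

With the OpenAI library, the vectors are available as [item.embedding for item in embeddings.data], in the same order as the input list.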
Get structured JSON responses:
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")


class FriendInfo(BaseModel):
    name: str
    age: int
    is_available: bool


class FriendList(BaseModel):
    friends: list[FriendInfo]


completion = client.beta.chat.completions.parse(
    temperature=0,
    model="llama3.1:8b",
    messages=[
        {"role": "user", "content": "Return a list of friends in JSON format"}
    ],
    response_format=FriendList,
)

friends_response = completion.choices[0].message
if friends_response.parsed:
    print(friends_response.parsed)
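The example above goes through the OpenAI-compatible endpoint. As a rough sketch of the same idea on the native /api/chat endpoint, the request below passes a JSON schema in the "format" field and checks the reply by hand. The schema is written out manually to mirror FriendList, and the helper names are hypothetical; treat the exact request shape as an assumption to verify against the API reference.

```python
import json

# Hand-written JSON schema mirroring the FriendList model above.
FRIEND_LIST_SCHEMA = {
    "type": "object",
    "properties": {
        "friends": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "is_available": {"type": "boolean"},
                },
                "required": ["name", "age", "is_available"],
            },
        }
    },
    "required": ["friends"],
}


def build_structured_request(model, prompt, schema):
    """Build an /api/chat body that asks for output matching the schema."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "format": schema,
        "stream": False,
        "options": {"temperature": 0},
    }


def parse_friends(content):
    """Parse the reply text and check each friend's field types."""
    friends = json.loads(content)["friends"]
    for friend in friends:
        age = friend["age"]
        if not isinstance(friend["name"], str):
            raise ValueError("name must be a string")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(age, bool) or not isinstance(age, int):
            raise ValueError("age must be an integer")
        if not isinstance(friend["is_available"], bool):
            raise ValueError("is_available must be a boolean")
    return friends
```

The body from build_structured_request would be POSTed to http://localhost:11434/api/chat, and the message content of the reply handed to parse_friends.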
Use Ollama with the OpenAI JavaScript library:
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "ollama", // required but ignored
});

const chatCompletion = await openai.chat.completions.create({
  messages: [{ role: "user", content: "Say this is a test" }],
  model: "llama3.2",
});
Sign in to use cloud models:
# Sign in from CLI
ollama signin
# Then use cloud models
ollama run gpt-oss:120b-cloud
Or use API keys for direct cloud access:
export OLLAMA_API_KEY=your_api_key
curl https://ollama.com/api/generate \
  -H "Authorization: Bearer $OLLAMA_API_KEY" \
  -d '{
    "model": "gpt-oss:120b",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'
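The curl call translates directly to Python. This sketch reads the key from the OLLAMA_API_KEY environment variable and sends the same Bearer header; build_cloud_request and generate_in_cloud are hypothetical names, and reading the text from a "response" field assumes the non-streaming reply shape.

```python
import json
import os
import urllib.request

CLOUD_GENERATE_URL = "https://ollama.com/api/generate"


def build_cloud_request(model, prompt, api_key=None):
    """Build an authenticated, non-streaming request for the cloud API."""
    api_key = api_key or os.environ.get("OLLAMA_API_KEY")
    if not api_key:
        raise RuntimeError("OLLAMA_API_KEY is not set")
    body = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    return urllib.request.Request(
        CLOUD_GENERATE_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_in_cloud(model, prompt, api_key=None):
    """Send the request and return the generated text."""
    request = build_cloud_request(model, prompt, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["response"]
```

Keeping request construction separate from sending makes it easy to inspect the headers without touching the network.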
Set environment variables for server configuration:
macOS:
# Set environment variable
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"
# Restart Ollama application
Linux (systemd):
# Edit service
systemctl edit ollama.service
# Add under [Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
# Reload and restart
systemctl daemon-reload
systemctl restart ollama
Windows:
1. Quit Ollama from task bar
2. Search "environment variables" in Settings
3. Edit or create OLLAMA_HOST variable
4. Set value: 0.0.0.0:11434
5. Restart Ollama from Start menu
Verify if your model is using GPU:
ollama ps
Output shows:
100% GPU - Fully loaded on GPU
100% CPU - Fully loaded in system memory
48%/52% CPU/GPU - Split between both

Local API: http://localhost:11434/api
Cloud API: https://ollama.com/api
/v1/ endpoints for OpenAI libraries

Local: http://localhost:11434
Cloud: sign in (ollama signin) or API key
Cloud API: https://ollama.com/api

Local models (e.g., gemma3, llama3.2, qwen3)
Cloud models end in -cloud (e.g., gpt-oss:120b-cloud, qwen3-coder:480b-cloud)
Vision models (e.g., llava)

OLLAMA_HOST - Change bind address (default: 127.0.0.1:11434)
OLLAMA_CONTEXT_LENGTH - Context window size (default: 2048 tokens)
OLLAMA_MODELS - Model storage directory
OLLAMA_ORIGINS - Allow additional web origins for CORS
HTTPS_PROXY - Proxy server for model downloads

Status Codes:
200 - Success
400 - Bad Request (invalid parameters)
404 - Not Found (model doesn't exist)
429 - Too Many Requests (rate limit)
500 - Internal Server Error
502 - Bad Gateway (cloud model unreachable)

Error Format:
{
"error": "the model failed to generate a response"
}
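A client can combine the status code with the "error" field to produce a readable message. The sketch below uses the status meanings listed in this document; describe_error is a hypothetical helper, and falling back to the bare status when the body is not JSON is a design choice for this illustration.

```python
import json

# Meanings copied from the status code list in this document.
STATUS_MEANINGS = {
    400: "Bad Request (invalid parameters)",
    404: "Not Found (model doesn't exist)",
    429: "Too Many Requests (rate limit)",
    500: "Internal Server Error",
    502: "Bad Gateway (cloud model unreachable)",
}


def describe_error(status, body):
    """Combine an HTTP status with the "error" field of a JSON error body."""
    try:
        detail = json.loads(body).get("error")
    except (ValueError, AttributeError):
        detail = None  # body was not a JSON object
    meaning = STATUS_MEANINGS.get(status, "Unexpected status")
    if detail:
        return f"{status} {meaning}: {detail}"
    return f"{status} {meaning}"
```

With urllib, this fits inside an except urllib.error.HTTPError as exc: block as describe_error(exc.code, exc.read()).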
Set "stream": false to get the complete response in one object.
This skill includes comprehensive documentation in references/:
llms-txt.md - Complete API reference covering:
Endpoints (/api/generate, /api/chat, /api/embed, etc.)
llms.md - Documentation index listing all available topics:
Use the reference files when you need:
Start with these common patterns:
Use the /api/generate endpoint with a prompt
Use /api/chat with a messages array
Point OpenAI libraries at base_url='http://localhost:11434/v1/'
Run ollama ps to verify model loading
Read llms-txt.md section on "Introduction" and "Quickstart" for foundational concepts.
Focus on:
Check the specific API endpoints in llms-txt.md for detailed parameter options.
Explore:
Refer to platform-specific sections in llms.md and configuration details in llms-txt.md.
Building a chatbot:
Use the /api/chat endpoint
Creating embeddings for search:
Use the /api/embed endpoint
Running behind a firewall:
Set the HTTPS_PROXY environment variable
Using cloud models:
Run ollama signin once
Use model names with the -cloud suffix
Check:
ollama ps
Solutions:
Problem: Ollama only accessible from localhost
Solution:
# Set OLLAMA_HOST to bind to all interfaces
export OLLAMA_HOST="0.0.0.0:11434"
See "How do I configure Ollama server?" in llms-txt.md for platform-specific instructions.
Problem: Cannot download models behind proxy
Solution:
# Set proxy (HTTPS only, not HTTP)
export HTTPS_PROXY=https://proxy.example.com
# Restart Ollama
See "How do I use Ollama behind a proxy?" in llms-txt.md.
Problem: Browser extension or web app cannot access Ollama
Solution:
# Allow specific origins
export OLLAMA_ORIGINS="chrome-extension://*,moz-extension://*"
See "How can I allow additional web origins?" in llms-txt.md.
# CLI Commands
ollama signin # Sign in to ollama.com
ollama run gemma3 # Run a model interactively
ollama pull gemma3 # Download a model
ollama ps # List running models
ollama list # List installed models
# Check API Status
curl http://localhost:11434/api/version
# Environment Variables (Common)
export OLLAMA_HOST="0.0.0.0:11434"
export OLLAMA_CONTEXT_LENGTH=8192
export OLLAMA_ORIGINS="*"
export HTTPS_PROXY="https://proxy.example.com"
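The API status check can also be scripted. This sketch turns an OLLAMA_HOST style value such as "0.0.0.0:11434" into a URL and queries /api/version, assuming the reply is a JSON object with a "version" field. base_url and server_version are hypothetical helper names. Note that 0.0.0.0 is a bind address for the server, so a client on another machine would pass the server's reachable hostname instead.

```python
import json
import os
import urllib.request


def base_url(host=None):
    """Turn a host:port value into an http URL, defaulting to the local server."""
    host = host or os.environ.get("OLLAMA_HOST") or "127.0.0.1:11434"
    if "://" not in host:
        host = "http://" + host
    return host.rstrip("/")


def server_version(host=None, timeout=5):
    """Return the version string reported by /api/version."""
    url = base_url(host) + "/api/version"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)["version"]
```

If server_version() raises a connection error, the server is either not running or not listening on the address being checked.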