AI Models

Connect Any AI Model — From Cloud to Self-Hosted

Anthropic Claude, OpenAI, Google Gemini, AWS Bedrock, Ollama, and 200+ more. Assign models per agent, configure automatic fallback, and track cost per run.

Model Providers

Every Model, One Platform

Direct providers, cloud platforms, model routers, and self-hosted options. Connect the right model for every agent and every task.

Direct Providers

Anthropic Claude: Claude 4, Sonnet, Haiku
OpenAI: GPT-4o, o1, GPT-4 Turbo
Google Gemini: Gemini 2.5, Flash
Meta Llama: Llama 3.3, Llama 4

Cloud AI Platforms

AWS Bedrock: Claude, Titan, Llama
Azure OpenAI: GPT-4o, Embeddings
Google Vertex AI: Gemini, PaLM

Model Routers

OpenRouter: 200+ models, one API
Together AI: Open-source models
Groq: Ultra-fast inference

Self-Hosted

Ollama: Local models, zero cloud
vLLM: High-throughput serving
OpenAI-Compatible: Any compatible endpoint
Custom Fine-Tuned: Your trained models

Per-Agent Models

Right Model, Right Agent

Not every task needs the most expensive model. Assign models based on complexity, cost, and quality requirements — per agent, not per platform.

Task-based model selection
Complex reasoning gets Claude Opus. Routine triage gets Haiku. You control the tradeoff.
Switch models without redeploying
Change an agent's model in the dashboard. No code changes, no downtime.
Compare model performance
Run the same agent on multiple models and compare quality, speed, and cost side-by-side.
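A per-agent assignment like this boils down to a lookup from agent to model. A minimal sketch in Python, assuming illustrative agent IDs and model names (not OrchStack's actual API):

```python
# Hypothetical agent-to-model map; IDs and model names are illustrative.
AGENT_MODELS = {
    "cora-booking": "claude-sonnet",   # nuanced conversation
    "rex-support": "gpt-4o",           # broad knowledge
    "sage-analytics": "claude-haiku",  # cost efficient
    "lila-discovery": "gemini-flash",  # fast responses
}

def model_for(agent_id: str, default: str = "claude-haiku") -> str:
    """Resolve the model assigned to an agent, falling back to a cheap default."""
    return AGENT_MODELS.get(agent_id, default)
```

Because the mapping lives outside agent logic, swapping a model is a data change, not a code change.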
Agent Model Configuration

Cora — Booking: Claude Sonnet (nuanced conversation)
Rex — Support: GPT-4o (broad knowledge)
Sage — Analytics: Claude Haiku (cost efficient)
Lila — Discovery: Gemini Flash (fast responses)

Reliability

Automatic Fallback — Zero Downtime

Configure fallback chains so your agents never go offline. If the primary model fails, the next one picks up instantly.
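The core of a fallback chain is "try each model in order until one succeeds." A minimal sketch, assuming a generic `invoke(model, prompt)` provider call (a hypothetical stand-in, not OrchStack's API):

```python
def call_with_fallback(prompt, chain, invoke):
    """Try each model in order; return (model, response) from the first success."""
    last_err = None
    for model in chain:
        try:
            return model, invoke(model, prompt)
        except Exception as err:  # provider down, rate limited, timed out...
            last_err = err        # fall through to the next model in the chain
    raise RuntimeError("all models in the fallback chain failed") from last_err
```

A production failover would also enforce per-call timeouts and track error rates, rather than waiting on a hard failure before switching.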

Fallback Chain — Cora (Booking Agent)

Claude Sonnet 4 (primary) → GPT-4o (fallback 1) → Claude Haiku (fallback 2)
Each fallback engages only if the previous model is unavailable. Automatic failover in under 200ms.

Cost Intelligence

Know What Every Model Costs

Track token usage and cost per model, per agent, and per run. Optimize spend with data, not guesswork.
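Per-run cost is simply tokens consumed times a per-model rate. A sketch with illustrative blended rates back-derived from the sample breakdown (real provider pricing differs, and splits input vs. output tokens):

```python
# Illustrative blended $/1K-token rates, NOT real provider pricing.
RATES_PER_1K = {
    "claude-sonnet": 0.0040,
    "gpt-4o": 0.0050,
    "claude-haiku": 0.00025,
    "gemini-flash": 0.00040,
}

def run_cost(model: str, tokens: int) -> float:
    """Cost of a run: tokens consumed times the model's per-1K-token rate."""
    return tokens / 1000 * RATES_PER_1K[model]
```

At these rates, 3.4M tokens on GPT-4o comes to $17.00 and 2.7M tokens on Gemini Flash to $1.08, which is why routing routine traffic to cheaper models dominates the total.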

Model Cost Breakdown — March 2026

Total Spend: $24.90 · Total Tokens: 15.4M · Avg $/1K tokens: $0.0016

Cora (Booking), Claude Sonnet: 1.2M tokens, $4.80
Rex (Support), GPT-4o: 3.4M tokens, $17.00
Sage (Analytics), Claude Haiku: 8.1M tokens, $2.02
Echo (Retention), Gemini Flash: 2.7M tokens, $1.08

Capabilities

Complete Model Management

Everything you need to connect, configure, and optimize AI models across your entire agent fleet.

Any Model, Any Provider

Connect to Anthropic Claude, OpenAI GPT-4o, Google Gemini, Meta Llama, and hundreds more. Switch models without changing agent logic.

Per-Agent Assignment

Assign different models to different agents based on task complexity, cost sensitivity, and quality requirements. One platform, many models.

Automatic Fallback

If the primary model is unavailable or slow, OrchStack automatically routes to a fallback model. Zero downtime, seamless transition.

Cost Tracking Per Model

Track token usage and cost per model, per agent, and per run. Optimize spend by routing routine tasks to cheaper models.

Latency-Aware Routing

Route requests to the fastest available model endpoint. Automatic load balancing across multiple API keys and regions.
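The selection step can be illustrated as picking the endpoint with the lowest recent latency. A sketch with hypothetical endpoint names; real routing would weigh a rolling window of measurements, key quotas, and regions:

```python
from statistics import mean

def fastest_endpoint(samples_ms: dict) -> str:
    """Pick the endpoint with the lowest average recent latency (milliseconds)."""
    return min(samples_ms, key=lambda ep: mean(samples_ms[ep]))
```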

Secure Key Management

Centralized API key vault with per-workspace isolation. Keys are never exposed to agent code or logged in traces.
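The contract this implies, keys resolved by name at call time and never embedded in agent code, can be sketched with an environment-based lookup. The naming scheme is illustrative; a real deployment would use a secrets vault rather than raw environment variables:

```python
import os

def provider_key(workspace: str, provider: str) -> str:
    """Fetch an API key by workspace and provider; never hardcode keys in agent code.
    The env-var naming convention here is a hypothetical example."""
    var = f"{workspace}_{provider}_API_KEY".upper().replace("-", "_")
    key = os.environ.get(var)
    if key is None:
        raise KeyError(f"missing credential {var}")
    return key
```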

Connect Your Preferred AI Models

Any model, any provider. One platform to manage them all.

Free tier available · Bring your own API keys