AI Models

Connect Any AI Model — From Cloud to Self-Hosted

Anthropic Claude, OpenAI, Google Gemini, AWS Bedrock, Ollama, and 200+ more. Assign models per agent, configure automatic fallback, and track cost per run.

Model Providers

Every Model, One Platform

Direct providers, cloud platforms, model routers, and self-hosted options. Connect the right model for every agent and every task.

Direct Providers

Anthropic Claude: Claude 4, Sonnet, Haiku
OpenAI: GPT-4o, o1, GPT-4 Turbo
Google Gemini: Gemini 2.5, Flash
Meta Llama: Llama 3.3, Llama 4

Cloud AI Platforms

AWS Bedrock: Claude, Titan, Llama
Azure OpenAI: GPT-4o, Embeddings
Google Vertex AI: Gemini, PaLM

Model Routers

OpenRouter: 200+ models, one API
Together AI: Open-source models
Groq: Ultra-fast inference

Self-Hosted

Ollama: Local models, zero cloud
vLLM: High-throughput serving
OpenAI-Compatible: Any compatible endpoint
Custom Fine-Tuned: Your trained models

Per-Agent Models

Right Model, Right Agent

Not every task needs the most expensive model. Assign models based on complexity, cost, and quality requirements — per agent, not per platform.

Task-based model selection
Complex reasoning gets Claude Opus. Routine triage gets Haiku. You control the tradeoff.
Switch models without redeploying
Change an agent's model in the dashboard. No code changes, no downtime.
Compare model performance
Run the same agent on multiple models and compare quality, speed, and cost side-by-side.
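A per-agent assignment like this boils down to a lookup from agent to model. A minimal sketch in Python, assuming illustrative agent IDs and model names (not OrchStack's actual API):

```python
# Hypothetical agent-to-model map; IDs and model names are illustrative.
AGENT_MODELS = {
    "cora-booking": "claude-sonnet",   # nuanced conversation
    "rex-support": "gpt-4o",           # broad knowledge
    "sage-analytics": "claude-haiku",  # cost efficient
    "lila-discovery": "gemini-flash",  # fast responses
}

def model_for(agent_id: str, default: str = "claude-haiku") -> str:
    """Resolve the model assigned to an agent, falling back to a cheap default."""
    return AGENT_MODELS.get(agent_id, default)
```

Because the mapping lives outside agent logic, swapping a model is a data change, not a code change.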
Agent Model Configuration

Cora — Booking: Claude Sonnet (nuanced conversation)
Rex — Support: GPT-4o (broad knowledge)
Sage — Analytics: Claude Haiku (cost efficient)
Lila — Discovery: Gemini Flash (fast responses)

Reliability

Automatic Fallback — Zero Downtime

Configure fallback chains so your agents never go offline. If the primary model fails, the next one picks up instantly.
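The core of a fallback chain is "try each model in order until one succeeds." A minimal sketch, assuming a generic `invoke(model, prompt)` provider call (a hypothetical stand-in, not OrchStack's API):

```python
def call_with_fallback(prompt, chain, invoke):
    """Try each model in order; return (model, response) from the first success."""
    last_err = None
    for model in chain:
        try:
            return model, invoke(model, prompt)
        except Exception as err:  # provider down, rate limited, timed out...
            last_err = err        # fall through to the next model in the chain
    raise RuntimeError("all models in the fallback chain failed") from last_err
```

A production failover would also enforce per-call timeouts and track error rates, rather than waiting on a hard failure before switching.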

Fallback Chain — Cora (Booking Agent)

Claude Sonnet 4 (primary) → GPT-4o (fallback 1) → Claude Haiku (fallback 2)
Each fallback engages only if the previous model is unavailable. Automatic failover in under 200ms.

Cost Intelligence

Know What Every Model Costs

Track token usage and cost per model, per agent, and per run. Optimize spend with data, not guesswork.
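Per-run cost is simply tokens consumed times a per-model rate. A sketch with illustrative blended rates back-derived from the sample breakdown (real provider pricing differs, and splits input vs. output tokens):

```python
# Illustrative blended $/1K-token rates, NOT real provider pricing.
RATES_PER_1K = {
    "claude-sonnet": 0.0040,
    "gpt-4o": 0.0050,
    "claude-haiku": 0.00025,
    "gemini-flash": 0.00040,
}

def run_cost(model: str, tokens: int) -> float:
    """Cost of a run: tokens consumed times the model's per-1K-token rate."""
    return tokens / 1000 * RATES_PER_1K[model]
```

At these rates, 3.4M tokens on GPT-4o comes to $17.00 and 2.7M tokens on Gemini Flash to $1.08, which is why routing routine traffic to cheaper models dominates the total.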

Model Cost Breakdown — March 2026

Total Spend: $24.90 · Total Tokens: 15.4M · Avg $/1K tokens: $0.0016

Cora (Booking), Claude Sonnet: 1.2M tokens, $4.80
Rex (Support), GPT-4o: 3.4M tokens, $17.00
Sage (Analytics), Claude Haiku: 8.1M tokens, $2.02
Echo (Retention), Gemini Flash: 2.7M tokens, $1.08

Capabilities

Complete Model Management

Everything you need to connect, configure, and optimize AI models across your entire agent fleet.

Any Model, Any Provider

Connect to Anthropic Claude, OpenAI GPT-4o, Google Gemini, Meta Llama, and hundreds more. Switch models without changing agent logic.

Per-Agent Assignment

Assign different models to different agents based on task complexity, cost sensitivity, and quality requirements. One platform, many models.

Automatic Fallback

If the primary model is unavailable or slow, OrchStack automatically routes to a fallback model. Zero downtime, seamless transition.

Cost Tracking Per Model

Track token usage and cost per model, per agent, and per run. Optimize spend by routing routine tasks to cheaper models.

Latency-Aware Routing

Route requests to the fastest available model endpoint. Automatic load balancing across multiple API keys and regions.
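The selection step can be illustrated as picking the endpoint with the lowest recent latency. A sketch with hypothetical endpoint names; real routing would weigh a rolling window of measurements, key quotas, and regions:

```python
from statistics import mean

def fastest_endpoint(samples_ms: dict) -> str:
    """Pick the endpoint with the lowest average recent latency (milliseconds)."""
    return min(samples_ms, key=lambda ep: mean(samples_ms[ep]))
```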

Secure Key Management

Centralized API key vault with per-workspace isolation. Keys are never exposed to agent code or logged in traces.
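The contract this implies, keys resolved by name at call time and never embedded in agent code, can be sketched with an environment-based lookup. The naming scheme is illustrative; a real deployment would use a secrets vault rather than raw environment variables:

```python
import os

def provider_key(workspace: str, provider: str) -> str:
    """Fetch an API key by workspace and provider; never hardcode keys in agent code.
    The env-var naming convention here is a hypothetical example."""
    var = f"{workspace}_{provider}_API_KEY".upper().replace("-", "_")
    key = os.environ.get(var)
    if key is None:
        raise KeyError(f"missing credential {var}")
    return key
```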

Connect Your Preferred AI Models

Any model, any provider. One platform to manage them all.

Free tier available · Bring your own API keys