architecture
api compatibility
TOVA exposes an OpenAI-compatible HTTP surface. Existing applications integrate by changing the base URL and the API key, with optional route hints to control cost and latency.
supported endpoints
- chat completions
- completions
- embeddings
- model discovery
- streaming responses
drop-in with the OpenAI SDK
Set the base URL and the API key on the client. Request construction, streaming, and response handling in the SDK stay the same.
import OpenAI from "openai";
// Point the stock OpenAI client at TOVA; only the base URL and key differ.
const client = new OpenAI({
  baseURL: "https://api.tova.fyi/v1",
  apiKey: process.env.TOVA_API_KEY,
});
// With stream: true the SDK returns an async iterable of chunks.
const res = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "hello" }],
  stream: true,
});
// Print each content delta as it arrives.
for await (const chunk of res) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
model aliasing
Requests can either pin a specific model or use an auto model class. Auto classes let the router pick the best concrete model for the request given the active policy.
model: "auto" // default policy — balanced cost / latency / quality
model: "auto:cheap" // prioritize lowest cost
model: "auto:fast" // prioritize lowest latency
model: "gpt-4o" // pinned model — router still selects the cheapest healthy provider
model: "claude-3.5-sonnet"
alias behavior
- "auto": TOVA selects the best model based on the default routing policy
- "auto:cheap": TOVA prioritizes the lowest credit cost for the request
- "auto:fast": TOVA prioritizes the lowest latency to first token
- pinned models still allow provider routing when multiple providers support the same model or an equivalent endpoint
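The alias rules above can be sketched as a small client-side classifier, which may be useful for validating configuration before a request is sent. This is only an illustration: `parseModel` and `ModelSelection` are hypothetical names, the policy labels are made up for this sketch, and TOVA resolves aliases on the server, so nothing here reflects its internal implementation.

```typescript
// Hypothetical client-side helper; TOVA itself resolves aliases server-side.
type ModelSelection =
  | { kind: "auto"; policy: "default" | "cheap" | "fast" }
  | { kind: "pinned"; model: string };

// Classify a model string as an auto class or a pinned model name.
function parseModel(model: string): ModelSelection {
  if (model === "auto") {
    return { kind: "auto", policy: "default" };
  }
  if (model.startsWith("auto:")) {
    const policy = model.slice("auto:".length);
    if (policy === "cheap" || policy === "fast") {
      return { kind: "auto", policy };
    }
    // Only the auto classes named in this document are accepted here.
    throw new Error(`unknown auto class: ${model}`);
  }
  if (model.length === 0) {
    throw new Error("model must not be empty");
  }
  // Anything else is treated as a pinned model; provider routing still applies.
  return { kind: "pinned", model };
}

console.log(parseModel("auto:cheap"));
console.log(parseModel("gpt-4o"));
```

Rejecting unknown `auto:` suffixes locally is a design choice for this sketch; an application could instead pass them through and let the API decide.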
response metadata
Every response includes routing and accounting metadata so applications can log spend, monitor provider selection, and surface cost back to end users.
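One way an application might use this metadata for spend logging is sketched here. The field names mirror the sample response in this section; the TypeScript types, the `TovaMetadata` and `ProviderSpend` interfaces, and the `spendByProvider` helper are assumptions introduced for illustration and are not part of any TOVA SDK.

```typescript
// Field names follow the sample response in this section; the types are assumptions.
interface TovaMetadata {
  model: string;
  provider: string;
  usage: { input_tokens: number; output_tokens: number; credits_spent: number };
  route: { objective: string; fallback_used: boolean };
}

interface ProviderSpend {
  requests: number;
  credits: number;
  fallbacks: number;
}

// Aggregate request count, credits, and fallback count per provider.
function spendByProvider(responses: TovaMetadata[]): Map<string, ProviderSpend> {
  const totals = new Map<string, ProviderSpend>();
  for (const r of responses) {
    const entry = totals.get(r.provider) ?? { requests: 0, credits: 0, fallbacks: 0 };
    entry.requests += 1;
    entry.credits += r.usage.credits_spent;
    if (r.route.fallback_used) {
      entry.fallbacks += 1;
    }
    totals.set(r.provider, entry);
  }
  return totals;
}

const sampleMetadata: TovaMetadata = {
  model: "auto",
  provider: "groq",
  usage: { input_tokens: 842, output_tokens: 219, credits_spent: 0.0184 },
  route: { objective: "cheapest", fallback_used: false },
};

const demoTotals = spendByProvider([sampleMetadata, sampleMetadata]);
console.log(demoTotals.get("groq"));
```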
{
  "model": "auto",
  "provider": "groq",
  "usage": {
    "input_tokens": 842,
    "output_tokens": 219,
    "credits_spent": 0.0184
  },
  "route": {
    "objective": "cheapest",
    "fallback_used": false
  }
}
raw HTTP
curl https://api.tova.fyi/v1/chat/completions \
  -H "Authorization: Bearer $TOVA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3.5-sonnet",
    "messages": [{"role":"user","content":"explain MoE routing"}],
    "route": { "objective": "cheapest", "max_latency_ms": 1200 }
  }'
info
The route object is a TOVA-specific extension. Omit it to use the default policy, or set it per request to shift the cost / latency trade-off.
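A minimal sketch of building the same request in code, with the route object attached only when hints are supplied. The URL, headers, and the `objective` and `max_latency_ms` fields come from the curl example; `buildChatRequest`, `RouteHints`, `ChatMessage`, and `HttpRequest` are hypothetical names, and whether TOVA accepts other route fields is not assumed here.

```typescript
// `objective` and `max_latency_ms` are taken from the curl example; nothing else is assumed.
interface RouteHints {
  objective?: string;
  max_latency_ms?: number;
}

interface ChatMessage {
  role: string;
  content: string;
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build a chat completions request; omit `route` so the default policy applies.
function buildChatRequest(
  apiKey: string,
  model: string,
  messages: ChatMessage[],
  route?: RouteHints,
): HttpRequest {
  const payload: Record<string, unknown> = { model, messages };
  if (route !== undefined && Object.keys(route).length > 0) {
    payload["route"] = route;
  }
  return {
    url: "https://api.tova.fyi/v1/chat/completions",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  };
}

const demoRequest = buildChatRequest(
  "example-key",
  "claude-3.5-sonnet",
  [{ role: "user", content: "explain MoE routing" }],
  { objective: "cheapest", max_latency_ms: 1200 },
);
console.log(demoRequest.body);
```

In a runtime with a global `fetch`, the result could be sent with `fetch(demoRequest.url, demoRequest)`. Keeping request construction separate from sending makes the route hints easy to test without network access.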