api compatibility

TOVA exposes an OpenAI-compatible HTTP surface. Existing applications integrate by changing the base URL and the API key, with optional route hints to control cost and latency.

supported endpoints

  • chat completions
  • completions
  • embeddings
  • model discovery
  • streaming responses
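
As a rough illustration of how the endpoints above might map to URLs, the sketch below builds request URLs from the base URL. Only the chat completions path is confirmed by the curl example on this page; the other paths are assumptions that mirror the standard OpenAI layout, and `ENDPOINT_PATHS` and `endpointUrl` are hypothetical names.

```typescript
// Base URL taken from the SDK example on this page.
const BASE_URL = "https://api.tova.fyi/v1";

// Path per endpoint. Only chatCompletions is confirmed by this page;
// the remaining paths are ASSUMED to follow the standard OpenAI layout.
const ENDPOINT_PATHS = {
  chatCompletions: "/chat/completions",
  completions: "/completions", // assumption
  embeddings: "/embeddings", // assumption
  modelDiscovery: "/models", // assumption
} as const;

type EndpointName = keyof typeof ENDPOINT_PATHS;

// Join a base URL and an endpoint path, tolerating trailing slashes on the base.
function endpointUrl(name: EndpointName, baseUrl: string = BASE_URL): string {
  return baseUrl.replace(/\/+$/, "") + ENDPOINT_PATHS[name];
}
```

Streaming is not a separate path in this sketch: it is requested with `stream: true` on a chat completions call, as in the SDK example.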

drop-in with the OpenAI SDK

Point the SDK at the TOVA base URL and pass a TOVA API key. The request and streaming code is otherwise the same as in a standard OpenAI integration.

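import OpenAI from "openai";

// Point the SDK at TOVA instead of the default OpenAI host.
const client = new OpenAI({
  baseURL: "https://api.tova.fyi/v1",
  apiKey: process.env.TOVA_API_KEY, // TOVA key, not an OpenAI key
});

// "auto" lets the router choose a model under the default policy.
const res = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "hello" }],
  stream: true,
});

// Print each streamed delta as it arrives.
for await (const chunk of res) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}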
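
When the full reply is needed as one string instead of being printed incrementally, the loop above can be wrapped in a small helper. This is a minimal sketch: `ChatChunk`, `deltaText`, and `collectText` are hypothetical names, and the chunk type models only the fields the example above reads.

```typescript
// Minimal view of a streamed chunk: only the fields read by the example above.
interface ChatChunk {
  choices: { delta?: { content?: string | null } }[];
}

// Text carried by one chunk, or "" when the chunk has no content delta.
function deltaText(chunk: ChatChunk): string {
  return chunk.choices[0]?.delta?.content ?? "";
}

// Drain a stream of chunks and return the concatenated text.
async function collectText(stream: AsyncIterable<ChatChunk>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += deltaText(chunk);
  }
  return text;
}
```

With the client from the example, `const text = await collectText(res);` should be able to stand in for the `for await` loop.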

model aliasing

Requests can either pin a specific model or use an auto model class. Auto classes let the router pick the best concrete model for the request given the active policy.

model: "auto"               // default policy: balanced cost / latency / quality
model: "auto:cheap"         // prioritize lowest cost
model: "auto:fast"          // prioritize lowest latency
model: "gpt-4o"             // pinned model: router still selects the cheapest healthy provider
model: "claude-3.5-sonnet"  // pinned model: same provider routing applies

alias behavior

  • auto — TOVA selects the best model based on the default routing policy
  • auto:cheap — TOVA prioritizes the lowest credit cost for the request
  • auto:fast — TOVA prioritizes the lowest latency to first token
  • pinned models still allow provider routing when multiple providers support the same model or an equivalent endpoint
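
The alias rules above can be read as a small classification of the `model` string. The sketch below is a client-side illustration only, not TOVA's router logic: `ModelChoice` and `parseModel` are hypothetical names, and the objective labels `"balanced"` and `"fastest"` are assumptions (only `"cheapest"` appears on this page).

```typescript
// How a model string is interpreted under the alias rules above.
// "balanced" and "fastest" are ASSUMED labels; only "cheapest" appears in the docs.
type ModelChoice =
  | { kind: "auto"; objective: "balanced" | "cheapest" | "fastest" }
  | { kind: "pinned"; model: string };

function parseModel(model: string): ModelChoice {
  switch (model) {
    case "auto":
      return { kind: "auto", objective: "balanced" };
    case "auto:cheap":
      return { kind: "auto", objective: "cheapest" };
    case "auto:fast":
      return { kind: "auto", objective: "fastest" };
  }
  // Reject auto classes this page does not document instead of guessing.
  if (model.startsWith("auto:")) {
    throw new Error(`unknown auto class: ${model}`);
  }
  // Anything else is a pinned model; provider routing may still apply.
  return { kind: "pinned", model };
}
```

A check like this can be useful in application code that wants to log whether a request left model selection to the router or pinned a model explicitly.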

response metadata

Every response includes routing and accounting metadata so applications can log spend, monitor provider selection, and surface cost back to end users.

{
  "model": "auto",
  "provider": "groq",
  "usage": {
    "input_tokens": 842,
    "output_tokens": 219,
    "credits_spent": 0.0184
  },
  "route": {
    "objective": "cheapest",
    "fallback_used": false
  }
}
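
To make the logging use case concrete, the sketch below types the metadata shown above and derives a log line and a running credit total from it. The interface mirrors only the fields in the sample response; `TovaMetadata`, `summarize`, and `totalCredits` are hypothetical names, and the OpenAI SDK's own types do not declare the TOVA-specific fields, so a cast would likely be needed when reading them from an SDK response.

```typescript
// Shape of the routing and accounting metadata in the sample response above.
interface TovaMetadata {
  model: string;
  provider: string;
  usage: { input_tokens: number; output_tokens: number; credits_spent: number };
  route: { objective: string; fallback_used: boolean };
}

// One-line summary suitable for application logs.
function summarize(meta: TovaMetadata): string {
  const { usage, route } = meta;
  const fallback = route.fallback_used ? " (fallback)" : "";
  return (
    `${meta.provider}${fallback}: ${usage.input_tokens} in / ` +
    `${usage.output_tokens} out, ${usage.credits_spent.toFixed(4)} credits`
  );
}

// Total credits spent across a batch of responses.
function totalCredits(responses: TovaMetadata[]): number {
  return responses.reduce((sum, r) => sum + r.usage.credits_spent, 0);
}
```

Applied to the sample response, `summarize` yields `groq: 842 in / 219 out, 0.0184 credits`.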

raw HTTP

curl https://api.tova.fyi/v1/chat/completions \
  -H "Authorization: Bearer $TOVA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3.5-sonnet",
    "messages": [{"role":"user","content":"explain MoE routing"}],
    "route": { "objective": "cheapest", "max_latency_ms": 1200 }
  }'

info: The route object is a TOVA-specific extension. Omit it to use the default policy, or set it per request to shift the cost / latency trade-off.
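
Because the route object is optional, a request builder can attach it only when a caller supplies one. The sketch below assumes the two route fields shown in the curl example (`objective`, `max_latency_ms`) and nothing more; `RouteHint`, `ChatMessage`, and `buildChatBody` are hypothetical names.

```typescript
// Route hints as they appear in the curl example; both fields treated as optional.
interface RouteHint {
  objective?: string;
  max_latency_ms?: number;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Build a chat completions request body. The route key is left out entirely
// when no hint is given, so the default policy applies.
function buildChatBody(model: string, messages: ChatMessage[], route?: RouteHint) {
  return route ? { model, messages, route } : { model, messages };
}
```

Passing the result through `JSON.stringify` gives a payload equivalent to the `-d` argument in the curl example.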

$ per 1M tokens

  • openai: $7.50
  • anthropic: $9.00
  • google: $6.25
  • meta: $3.50
  • big4 avg: $6.56