architecture

api compatibility

TOVA exposes an OpenAI-compatible HTTP surface. Existing applications integrate by changing the base URL and the API key, with optional route hints to control cost and latency.

supported endpoints

chat completions
completions
embeddings
model discovery
streaming responses
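Only the chat completions path appears on this page, in the raw HTTP example. This sketch assumes that TOVA mirrors OpenAI's path layout for the other endpoints; that assumption is not documented here. In the OpenAI API, streaming does not have its own path. It is `stream: true` on a chat or completions request. `endpointUrl` and `ENDPOINT_PATHS` are hypothetical names.

```typescript
// Hypothetical helper. It assumes TOVA follows OpenAI's path layout under /v1.
const TOVA_BASE_URL = "https://api.tova.fyi/v1";

type Endpoint = "chat" | "completions" | "embeddings" | "models";

const ENDPOINT_PATHS: Record<Endpoint, string> = {
  chat: "/chat/completions", // confirmed by the raw HTTP example
  completions: "/completions", // assumed, mirrors OpenAI
  embeddings: "/embeddings", // assumed, mirrors OpenAI
  models: "/models", // model discovery; assumed, mirrors OpenAI
};

// Streaming reuses the chat/completions paths with `stream: true`
// in the JSON body, so it gets no entry of its own.
function endpointUrl(endpoint: Endpoint, baseUrl: string = TOVA_BASE_URL): string {
  // Strip trailing slashes so "…/v1/" and "…/v1" behave the same.
  return baseUrl.replace(/\/+$/, "") + ENDPOINT_PATHS[endpoint];
}

console.log(endpointUrl("embeddings")); // https://api.tova.fyi/v1/embeddings
```

The helper strips trailing slashes before appending the path. A base URL such as `https://api.tova.fyi/v1/` therefore resolves the same way as `https://api.tova.fyi/v1`.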

drop-in with the OpenAI SDK

Change the base URL and the API key. The rest of your SDK code stays the same.

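import OpenAI from "openai";

// Same SDK, pointed at TOVA: only baseURL and apiKey change.
const client = new OpenAI({
  baseURL: "https://api.tova.fyi/v1",
  apiKey: process.env.TOVA_API_KEY,
});

// "auto" lets the router choose the concrete model (see model aliasing).
// Top-level await requires running this file as an ES module.
const res = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "hello" }],
  stream: true,
});

// With stream: true the SDK returns an async iterable of chunks.
// Print each text delta as it arrives.
for await (const chunk of res) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}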
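An application may also need the full reply, for example to log it. In that case it can collect the streamed deltas while printing them. This is a minimal sketch. `collectText` and `ChatChunkLike` are hypothetical names, and the chunk type declares only the fields that the SDK example above reads.

```typescript
// Minimal shape of a streamed chat chunk. It covers only the fields the
// SDK example reads, choices[0].delta.content. Other fields are ignored.
interface ChatChunkLike {
  choices: Array<{ delta?: { content?: string | null } }>;
}

// Reassemble a streamed reply into one string, optionally forwarding
// each non-empty delta to a callback as it arrives.
async function collectText(
  stream: AsyncIterable<ChatChunkLike>,
  onDelta: (text: string) => void = () => {},
): Promise<string> {
  const parts: string[] = [];
  for await (const chunk of stream) {
    // Chunks without choices or without content contribute nothing.
    const text = chunk.choices[0]?.delta?.content ?? "";
    if (text) {
      parts.push(text);
      onDelta(text);
    }
  }
  return parts.join("");
}
```

In the SDK example, the `for await` loop could become `const full = await collectText(res, (t) => process.stdout.write(t));`. That line prints the reply as it streams and also keeps the complete text.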

model aliasing

Requests can either pin a specific model or use an auto model class. Auto classes let the router pick the best concrete model for the request given the active policy.

model: "auto"              // default policy: balances cost, latency, and quality
model: "auto:cheap"        // prioritize lowest cost
model: "auto:fast"         // prioritize lowest latency
model: "gpt-4o"            // pinned model; the router still selects the cheapest healthy provider
model: "claude-3.5-sonnet" // pinned model
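The `model` field holds either an auto class or a pinned model name. Client code that logs or validates requests may need to tell the two apart. The helper below is hypothetical and is not part of TOVA or the SDK. The objective labels (`balanced`, `cheap`, `fast`) are our own names. Rejecting any other `auto:` value assumes that the three classes listed above are the only ones.

```typescript
// Hypothetical client-side helper that classifies the `model` field.
// Auto classes leave the model choice to the router. Anything else is
// treated as a pinned model name.
type ModelSpec =
  | { kind: "auto"; objective: "balanced" | "cheap" | "fast" }
  | { kind: "pinned"; model: string };

function parseModel(model: string): ModelSpec {
  switch (model) {
    case "auto":
      return { kind: "auto", objective: "balanced" };
    case "auto:cheap":
      return { kind: "auto", objective: "cheap" };
    case "auto:fast":
      return { kind: "auto", objective: "fast" };
    default:
      // Only the three auto classes above appear in these docs.
      if (model.startsWith("auto:")) {
        throw new Error(`unknown auto class: ${model}`);
      }
      return { kind: "pinned", model };
  }
}
```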

alias behavior

"auto": TOVA selects a model under the default routing policy, which balances cost, latency, and quality
"auto:cheap": TOVA prioritizes the lowest credit cost for the request
"auto:fast": TOVA prioritizes the lowest latency to first token
pinned models: TOVA can still route between providers when more than one provider serves the same model or an equivalent endpoint
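The toy selector below makes the alias rules concrete. It picks a healthy provider offer by lowest cost or lowest latency. It is an illustration under stated assumptions, not TOVA's router. The `ProviderOffer` fields, the pricing unit, and the sample numbers are hypothetical. The balanced default policy is not modeled because this page does not describe its weighting.

```typescript
// Toy model of provider selection. This is NOT TOVA's actual router.
// Each offer is one provider serving one model.
interface ProviderOffer {
  provider: string;
  model: string;
  creditsPer1kTokens: number; // hypothetical pricing unit
  latencyMs: number; // e.g. recent time to first token
  healthy: boolean;
}

type Objective = "cheap" | "fast";

// Return the healthy offer that minimizes cost ("cheap") or latency ("fast").
// Passing a model restricts the search to that pinned model. Omitting it
// considers every offer, which loosely mimics auto:cheap / auto:fast.
function pickOffer(
  offers: ProviderOffer[],
  objective: Objective,
  model?: string,
): ProviderOffer | undefined {
  const candidates = offers.filter(
    (o) => o.healthy && (model === undefined || o.model === model),
  );
  const key = (o: ProviderOffer): number =>
    objective === "cheap" ? o.creditsPer1kTokens : o.latencyMs;
  let best: ProviderOffer | undefined;
  for (const o of candidates) {
    if (best === undefined || key(o) < key(best)) best = o;
  }
  return best;
}
```

For a pinned model, `pickOffer(offers, "cheap", "gpt-4o")` returns the cheapest healthy provider serving `gpt-4o`, in line with the pinned-model comment above. An unhealthy offer is skipped even when it is cheaper.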

response metadata

Every response includes routing and accounting metadata so applications can log spend, monitor provider selection, and surface cost back to end users.

{
  "model": "auto",
  "provider": "groq",
  "usage": {
    "input_tokens": 842,
    "output_tokens": 219,
    "credits_spent": 0.0184
  },
  "route": {
    "objective": "cheapest",
    "fallback_used": false
  }
}
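Applications that log spend can type this metadata and fold it into running totals. This minimal sketch assumes that the fields appear on every response exactly as in the example. `TovaRouteMeta`, `SpendSummary`, and `summarizeSpend` are hypothetical names.

```typescript
// Shape of the metadata example. Field names come from the example;
// that every field is always present is an assumption.
interface TovaRouteMeta {
  model: string;
  provider: string;
  usage: { input_tokens: number; output_tokens: number; credits_spent: number };
  route: { objective: string; fallback_used: boolean };
}

interface SpendSummary {
  requests: number;
  credits: number; // total credits across all responses
  byProvider: Record<string, number>; // credits per provider
  fallbacks: number; // responses where route.fallback_used was true
}

// Fold a batch of response metadata into totals for logging or billing.
function summarizeSpend(responses: TovaRouteMeta[]): SpendSummary {
  const summary: SpendSummary = { requests: 0, credits: 0, byProvider: {}, fallbacks: 0 };
  for (const r of responses) {
    summary.requests += 1;
    summary.credits += r.usage.credits_spent;
    summary.byProvider[r.provider] =
      (summary.byProvider[r.provider] ?? 0) + r.usage.credits_spent;
    if (r.route.fallback_used) summary.fallbacks += 1;
  }
  return summary;
}
```

Credits are fractional, so totals accumulate floating-point error. Compare them with a tolerance, or round them for display, rather than testing for exact equality.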

raw HTTP

curl https://api.tova.fyi/v1/chat/completions \
  -H "Authorization: Bearer $TOVA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3.5-sonnet",
    "messages": [{"role":"user","content":"explain MoE routing"}],
    "route": { "objective": "cheapest", "max_latency_ms": 1200 }
  }'
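TypeScript can send the same request without the SDK. This sketch only builds the URL and fetch-style options; `buildChatRequest` is a hypothetical name. The body is serialized as-is, so the `route` extension passes through unchanged.

```typescript
// Builds the same request as the curl example. Nothing is sent here.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface BuiltRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildChatRequest(
  apiKey: string,
  // Extra keys such as `route` are allowed and serialized unchanged.
  body: { model: string; messages: ChatMessage[]; [extra: string]: unknown },
  baseUrl: string = "https://api.tova.fyi/v1",
): BuiltRequest {
  return {
    url: `${baseUrl}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}
```

To send it in any runtime that provides `fetch` (Node 18+ or a browser), call `const { url, init } = buildChatRequest(process.env.TOVA_API_KEY ?? "", body);` and then `await fetch(url, init)`.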

info

The route object is a TOVA-specific extension. Omit it to use the default policy, or set it per request to shift the cost / latency trade-off.
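One way to keep the default policy in effect is to attach `route` only when the caller supplies hints. In this sketch, `RouteHints` lists just the two fields from the raw HTTP example, and `withRoute` is a hypothetical helper. Treating an empty hints object the same as an omitted one is a choice made here, not documented TOVA behavior.

```typescript
// Route fields taken from the raw HTTP example. Other keys may exist.
interface RouteHints {
  objective?: string; // e.g. "cheapest"
  max_latency_ms?: number;
}

// Return the body unchanged when no hints are given, so the request uses
// the default policy. Otherwise return a copy with `route` attached.
// The input object is never mutated.
function withRoute<T extends object>(
  body: T,
  route?: RouteHints,
): T | (T & { route: RouteHints }) {
  if (route === undefined || Object.keys(route).length === 0) return body;
  return { ...body, route };
}
```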
