tova
architecture

routing layer

The TOVA routing engine selects an inference provider for every request by scoring each eligible source against price, performance, and reliability signals.

what the router evaluates

  • price per input token and per output token
  • provider uptime over rolling windows
  • latency to first token and tokens per second
  • throughput and concurrent capacity
  • available model inventory and equivalence classes
  • recent request success rate
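The signals listed above can be held in a simple per-provider record. The sketch below is illustrative only: the `ProviderSignals` type, its field names, the min-max normalization, and the `penalties` helper are hypothetical, not TOVA's published schema. It assumes error rate is the complement of success rate and that the capacity penalty is the complement of free capacity.

```python
from dataclasses import dataclass

@dataclass
class ProviderSignals:
    """Hypothetical record of the signals the router evaluates for one provider."""
    name: str
    price_in_per_m: float       # USD per 1M input tokens
    price_out_per_m: float      # USD per 1M output tokens
    uptime: float               # fraction available over a rolling window, 0..1
    ttft_ms: float              # latency to first token, milliseconds
    tokens_per_s: float         # streaming throughput
    free_capacity: float        # fraction of concurrent capacity currently free, 0..1
    models: frozenset[str]      # served models or equivalence classes
    success_rate: float         # recent request success rate, 0..1

def min_max(values: list[float]) -> list[float]:
    """Scale raw values to [0, 1]; if all are equal, every provider gets 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def penalties(s: ProviderSignals) -> dict[str, float]:
    """Turn 'higher is better' signals into 'lower is better' penalties (assumed mapping)."""
    return {
        "recent_error_rate": 1 - s.success_rate,
        "uptime_penalty": 1 - s.uptime,
        "capacity_penalty": 1 - s.free_capacity,
    }
```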
Figure: default routing factor weights under the default policy (radar chart). Per-request overrides re-shape the radar at call time.

scoring model

Each eligible provider receives a composite score. Every term in the score is a cost or a penalty, so lower is better: the router selects the lowest-scoring provider that satisfies the request's hard constraints (model, max latency, max cost, region).

provider_score =
    price_weight    * normalized_price
  + latency_weight  * normalized_latency
  + error_weight    * recent_error_rate
  + capacity_weight * capacity_penalty
  + uptime_weight   * uptime_penalty
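The formula above is a plain weighted sum. As a minimal sketch, assuming each input has already been normalized to a penalty in [0, 1], it could be written as follows; the `Weights` type and parameter names are hypothetical and simply mirror the formula's terms.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Weights:
    """Hypothetical container for the five policy weights in the score formula."""
    price: float
    latency: float
    error: float
    capacity: float
    uptime: float

def provider_score(w: Weights, normalized_price: float, normalized_latency: float,
                   recent_error_rate: float, capacity_penalty: float,
                   uptime_penalty: float) -> float:
    """Composite score; every input is a cost or penalty, so lower is better."""
    return (w.price * normalized_price
            + w.latency * normalized_latency
            + w.error * recent_error_rate
            + w.capacity * capacity_penalty
            + w.uptime * uptime_penalty)
```

Raising one weight makes the matching term dominate, which is how a cost-first or speed-first policy shifts the winner.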

Weights are policy-driven. If the request prioritizes cost, the price weight increases. If the request prioritizes speed, the latency weight increases, along with the capacity weight that reflects throughput. Hard constraints filter the provider set before scoring; weights only determine the winner among providers that already qualify.
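The filter-then-score order described above can be sketched as follows. This is an illustration under assumptions: `Candidate`, `Constraints`, and their fields (including the `p95_latency_ms` estimate and blended `cost_per_m`) are hypothetical, and `score` is taken to be a precomputed composite score.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    name: str
    models: frozenset[str]
    region: str
    p95_latency_ms: float   # hypothetical latency estimate used for the max-latency check
    cost_per_m: float       # hypothetical blended USD per 1M tokens
    score: float            # precomputed composite score, lower is better

@dataclass
class Constraints:
    model: str
    max_latency_ms: Optional[float] = None
    max_cost_per_m: Optional[float] = None
    region: Optional[str] = None

def qualifies(c: Candidate, k: Constraints) -> bool:
    """Hard constraints: any single failure removes the provider outright."""
    if k.model not in c.models:
        return False
    if k.max_latency_ms is not None and c.p95_latency_ms > k.max_latency_ms:
        return False
    if k.max_cost_per_m is not None and c.cost_per_m > k.max_cost_per_m:
        return False
    if k.region is not None and c.region != k.region:
        return False
    return True

def select(cands: list[Candidate], k: Constraints) -> Optional[Candidate]:
    eligible = [c for c in cands if qualifies(c, k)]           # filter first
    return min(eligible, key=lambda c: c.score, default=None)  # then lowest score wins
```

Because filtering happens first, a provider with an excellent score can never win a request whose hard constraints it violates.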

route policy examples

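Cheapest eligible provider within a 1,200 ms latency bound, with fallback enabled:

{
  "route": {
    "objective": "cheapest",
    "max_latency_ms": 1200,
    "fallback": true
  }
}

Fastest eligible provider costing at most $2.50 per million tokens:

{
  "route": {
    "objective": "fastest",
    "max_cost_per_million_tokens": 2.50
  }
}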
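A route policy like the examples above splits naturally into scoring weights (from `objective`) and hard constraints (the `max_*` fields). The sketch below assumes hypothetical weight presets; TOVA's actual default values are not given here, and falling back to `"balanced"` when no objective is set is also an assumption.

```python
import json

# Hypothetical weight presets per objective; not TOVA's published values.
PRESETS = {
    "cheapest": {"price": 0.6, "latency": 0.1, "error": 0.1, "capacity": 0.1, "uptime": 0.1},
    "fastest":  {"price": 0.1, "latency": 0.5, "error": 0.1, "capacity": 0.2, "uptime": 0.1},
    "balanced": {"price": 0.3, "latency": 0.3, "error": 0.2, "capacity": 0.1, "uptime": 0.1},
}
CONSTRAINT_KEYS = ("max_latency_ms", "max_cost_per_million_tokens")

def parse_route(body: str) -> tuple[dict, dict]:
    """Return (weights, hard_constraints) for one request's route policy."""
    route = json.loads(body)["route"]
    # Copy the preset so per-request overrides never mutate the shared defaults.
    weights = dict(PRESETS[route.get("objective", "balanced")])
    constraints = {k: route[k] for k in CONSTRAINT_KEYS if k in route}
    return weights, constraints
```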

fallback and failover

If the selected provider fails, times out, or returns a rate-limit error before streaming begins, TOVA can retry the request against the next eligible provider. Developers configure fallback behavior per request.

fallback parameters

  • fallback — enable or disable automatic retry
  • max_retries — cap on the number of alternate providers attempted
  • timeout_ms — request timeout before failover is triggered
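
Failover picks the next-best provider using the same scoring model, with the failing provider removed from the candidate set. Once output streaming has started, the request cannot be safely failed over; the partial stream is surfaced to the caller instead.

Balanced routing with up to two alternate providers and a 15-second timeout:
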
{
  "route": {
    "objective": "balanced",
    "fallback": true,
    "max_retries": 2,
    "timeout_ms": 15000
  }
}
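The failover rules described above can be sketched as a loop over providers in score order. Assumptions: `ranked` is already sorted by the scoring model, and removing a failed provider leaves the other scores unchanged, so the next entry is the next-best choice. The `call` function is expected to raise the hypothetical `RetryableError` on failure, timeout (for example after `timeout_ms`), or rate limit before any output is streamed. Timing itself is not implemented here.

```python
from typing import Callable

class RetryableError(Exception):
    """Failure, timeout, or rate limit raised before any output was streamed."""

class StreamInterrupted(Exception):
    """Failure after streaming began; not safe to fail over."""

def route_with_fallback(ranked: list[str], call: Callable[[str], str],
                        fallback: bool = True, max_retries: int = 2) -> str:
    """Try the best provider, then up to max_retries alternates in score order."""
    attempts = 1 + (max_retries if fallback else 0)
    last_error = None
    for provider in ranked[:attempts]:
        try:
            return call(provider)
        except RetryableError as e:
            last_error = e  # drop this provider; the next entry is the next-best
        # StreamInterrupted is not caught: the partial stream surfaces to the caller.
    raise RuntimeError("all attempted providers failed") from last_error
```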

evolution of the layer

The routing layer is designed to extend over time toward:

  • permissioned third-party inference suppliers
  • regional routing and SLA tiers
  • capacity marketplace dynamics for surplus throughput
  • dynamic pricing optimization across providers
  • agent-driven inference budgets and task-level routing

Note: The router is policy-driven. Operators can publish custom routing policies (for example, EU-only providers, no-train clauses, or strict latency bounds), and TOVA enforces them across every call.
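One plausible way to model operator policies such as EU-only providers, no-train clauses, or strict latency bounds is as extra hard-constraint predicates applied before scoring. Everything below (`ProviderInfo`, its fields, the region label convention, and the policy functions) is hypothetical and only illustrates the idea.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ProviderInfo:
    name: str
    region: str            # hypothetical labels such as "eu-west" or "us-east"
    trains_on_data: bool   # hypothetical flag checked by a no-train clause
    ttft_ms: float         # latency to first token, milliseconds

Policy = Callable[[ProviderInfo], bool]

def eu_only(p: ProviderInfo) -> bool:
    return p.region.startswith("eu")

def no_train(p: ProviderInfo) -> bool:
    return not p.trains_on_data

def latency_bound(ms: float) -> Policy:
    """Build a policy that rejects providers slower than ms to first token."""
    return lambda p: p.ttft_ms <= ms

def enforce(providers: list[ProviderInfo], policies: list[Policy]) -> list[ProviderInfo]:
    """Keep only providers that satisfy every operator policy; runs before scoring."""
    return [p for p in providers if all(rule(p) for rule in policies)]
```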

Figure: price per 1M tokens (USD) by provider.

  • openai: $7.50
  • anthropic: $9.00
  • google: $6.25
  • meta: $3.50
  • big4 avg: $6.56
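The big4 avg figure of $6.56 per 1M tokens is consistent with a simple arithmetic mean of the four listed provider prices (openai, anthropic, google, meta), rounded to the cent:

```latex
\bar{p}_{\text{big4}} = \frac{7.50 + 9.00 + 6.25 + 3.50}{4} = \frac{26.25}{4} = 6.5625 \approx 6.56
```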