routing layer
The TOVA routing engine selects an inference provider for every request by scoring each eligible source against price, performance, and reliability signals.
what the router evaluates
- price per input token and per output token
- provider uptime over rolling windows
- latency to first token and tokens per second
- throughput and concurrent capacity
- available model inventory and equivalence classes
- recent request success rate
scoring model
Each eligible provider receives a composite score. Every term in the score is a cost or penalty, so lower is better: the router selects the lowest-scoring provider that satisfies the request's hard constraints (model, max latency, max cost, region).
provider_score =
    price_weight * normalized_price
    + latency_weight * normalized_latency
    + error_weight * recent_error_rate
    + capacity_weight * capacity_penalty
    + uptime_weight * uptime_penalty
Weights are policy-driven. If the request prioritizes cost, the price weight increases. If the request prioritizes speed, latency and throughput weights increase. Hard constraints filter the provider set before scoring; weights only determine the winner among providers that already qualify.
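The filter-then-score flow described above can be sketched in Python. This is a minimal illustration, not TOVA's implementation: the `Provider` fields, the `WEIGHTS` presets and their values, the min-max normalization over the eligible set, and the `select_provider` function are all assumptions made for the example. The capacity weight stands in for throughput in the "fastest" preset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Provider:
    # Hypothetical signal record; field names are illustrative only.
    name: str
    model: str
    region: str
    cost_per_million_tokens: float
    latency_ms: float          # latency to first token
    recent_error_rate: float   # 0.0 to 1.0
    capacity_penalty: float    # 0.0 (idle) to 1.0 (saturated)
    uptime_penalty: float      # 0.0 (no downtime) to 1.0


# Assumed weight presets per objective; the real values are policy-driven.
WEIGHTS = {
    "cheapest": {"price": 0.6, "latency": 0.1, "error": 0.1, "capacity": 0.1, "uptime": 0.1},
    "fastest":  {"price": 0.1, "latency": 0.5, "error": 0.1, "capacity": 0.2, "uptime": 0.1},
    "balanced": {"price": 0.2, "latency": 0.2, "error": 0.2, "capacity": 0.2, "uptime": 0.2},
}


def normalize(value: float, values: List[float]) -> float:
    """Min-max normalize to [0, 1]; returns 0.0 when all values are equal."""
    lo, hi = min(values), max(values)
    return 0.0 if hi == lo else (value - lo) / (hi - lo)


def select_provider(
    providers: List[Provider],
    model: str,
    objective: str = "balanced",
    max_latency_ms: Optional[float] = None,
    max_cost_per_million_tokens: Optional[float] = None,
    region: Optional[str] = None,
) -> Optional[Provider]:
    # Step 1: hard constraints filter the provider set before any scoring.
    eligible = [
        p for p in providers
        if p.model == model
        and (region is None or p.region == region)
        and (max_latency_ms is None or p.latency_ms <= max_latency_ms)
        and (max_cost_per_million_tokens is None
             or p.cost_per_million_tokens <= max_cost_per_million_tokens)
    ]
    if not eligible:
        return None

    # Step 2: weights only rank providers that already qualify.
    w = WEIGHTS[objective]
    prices = [p.cost_per_million_tokens for p in eligible]
    latencies = [p.latency_ms for p in eligible]

    def score(p: Provider) -> float:
        return (
            w["price"] * normalize(p.cost_per_million_tokens, prices)
            + w["latency"] * normalize(p.latency_ms, latencies)
            + w["error"] * p.recent_error_rate
            + w["capacity"] * p.capacity_penalty
            + w["uptime"] * p.uptime_penalty
        )

    # Lower score wins because every term is a cost or penalty.
    return min(eligible, key=score)
```

Keeping the constraint filter separate from the scoring step means a provider that violates a limit can never win on weights alone, which matches the ordering described above.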
route policy examples
{
"route": {
"objective": "cheapest",
"max_latency_ms": 1200,
"fallback": true
}
}
{
"route": {
"objective": "fastest",
"max_cost_per_million_tokens": 2.50
}
}
fallback and failover
If the selected provider fails, times out, or returns a rate-limit error before streaming begins, TOVA can retry the request against the next eligible provider. Developers configure fallback behavior per request.
fallback parameters
- fallback: enable or disable automatic retry
- max_retries: cap on the number of alternate providers attempted
- timeout_ms: request timeout before failover is triggered
- next-best-provider strategy uses the same scoring model with the failing provider removed
- once output streaming has already started, the request cannot be safely failed over and surfaces the partial stream
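The retry behavior in the list above can be sketched as a small loop. This is an illustrative sketch under stated assumptions, not TOVA's code: `ProviderError`, `route_with_fallback`, and the `select` and `send` callables are hypothetical names. `select` is assumed to apply the scoring model to whatever candidates remain, and `send` is assumed to raise `ProviderError` only for failures, timeouts, or rate limits that occur before any output is streamed.

```python
from typing import Any, Callable, List, Optional, Tuple


class ProviderError(Exception):
    """Hypothetical error: failure, timeout, or rate limit before streaming begins."""


def route_with_fallback(
    providers: List[Any],
    select: Callable[[List[Any]], Optional[Any]],
    send: Callable[[Any, int], Any],
    fallback: bool = True,
    max_retries: int = 2,
    timeout_ms: int = 15000,
) -> Tuple[Any, Any]:
    remaining = list(providers)
    # One initial attempt, plus up to max_retries alternates when fallback is on.
    attempts = 1 + (max_retries if fallback else 0)
    last_error: Optional[Exception] = None

    for _ in range(attempts):
        # Re-run selection on the remaining set: same scoring model,
        # with every provider that already failed removed.
        provider = select(remaining)
        if provider is None:
            break
        try:
            # If send() returns, streaming has started; any later error
            # surfaces to the caller with the partial stream, not retried here.
            return provider, send(provider, timeout_ms)
        except ProviderError as exc:
            last_error = exc
            remaining.remove(provider)

    raise RuntimeError("no eligible provider completed the request") from last_error
```

Re-running selection after each failure, rather than walking a list ranked once, keeps the sketch faithful to "the same scoring model with the failing provider removed" even when scores depend on which providers are in the candidate set.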
{
"route": {
"objective": "balanced",
"fallback": true,
"max_retries": 2,
"timeout_ms": 15000
}
}
evolution of the layer
The routing layer is designed to extend over time toward:
- permissioned third-party inference suppliers
- regional routing and SLA tiers
- capacity marketplace dynamics for surplus throughput
- dynamic pricing optimization across providers
- agent-driven inference budgets and task-level routing