architecture

how tova works

TOVA is an OpenAI-compatible inference gateway with tokenized credit settlement. Every request flows through authentication, the credit ledger, the routing engine, a provider adapter, and the usage metering pipeline.

end-to-end flow

1. user connects a wallet and purchases TOVA Credits using the TOVA token
2. tokens used for the credit purchase are burned on-chain
3. credits are issued to the user account
4. user creates an API key from the dashboard
5. application sends an OpenAI-compatible request to TOVA
6. router evaluates eligible providers and selects the best fit
7. request is sent to the selected provider through the adapter layer
8. response streams back to the user
9. token usage and credit cost are calculated
10. credits are deducted and usage metadata is returned with the response

system architecture

The gateway is composed of distinct services with explicit interfaces between them.

Client Application
        ↓
TOVA API Gateway
        ↓
Authentication and Credit Ledger
        ↓
Routing Engine
        ↓
Provider Adapter Layer
        ↓
Inference Providers
        ↓
Response Stream and Usage Metering

API gateway

Accepts OpenAI-compatible requests, validates API keys, applies rate limits, and manages streaming responses back to the client.
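
As a rough illustration of the key validation and rate limiting steps, the sketch below checks a bearer key and applies a token-bucket limiter. The `tova_sk_` prefix comes from the example request on this page; the `TokenBucket` class, its parameters, and the choice of a token bucket are assumptions for illustration, not a description of the actual gateway.

```python
import time
from dataclasses import dataclass, field


def extract_api_key(headers):
    """Return the bearer key if the header looks like a TOVA key, else None."""
    scheme, _, key = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or not key.startswith("tova_sk_"):
        return None
    return key


@dataclass
class TokenBucket:
    """Hypothetical per-key limiter: `rate` requests/second, bursts up to `capacity`."""
    rate: float
    capacity: float
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self):
        self.tokens = self.capacity
        self.updated = time.monotonic()

    def allow(self, now=None):
        """Refill by elapsed time, then spend one token if available."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A gateway built this way would keep one bucket per API key and reject a request with a rate-limit error whenever `allow()` returns `False`.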

authentication and credit ledger

Tracks per-account balances, estimates request cost before dispatch, deducts credits after usage, and ties each credit issuance back to its on-chain burn transaction.
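
The ledger responsibilities above can be sketched as a small in-memory class. Every name here (`CreditLedger`, `issue`, `check`, `deduct`, the account and transaction identifiers) is hypothetical, and a real ledger would need durable storage and on-chain verification of the burn.

```python
from dataclasses import dataclass, field
from decimal import Decimal


class InsufficientCredits(Exception):
    """Raised when the estimated cost exceeds the account balance."""


@dataclass
class CreditLedger:
    balances: dict = field(default_factory=dict)
    issuances: list = field(default_factory=list)  # (account, credits, burn_tx_hash)

    def issue(self, account, credits, burn_tx_hash):
        """Credit an account, recording the burn transaction that backs the issuance."""
        if any(tx == burn_tx_hash for _, _, tx in self.issuances):
            raise ValueError("burn transaction already credited")
        self.issuances.append((account, credits, burn_tx_hash))
        self.balances[account] = self.balances.get(account, Decimal(0)) + credits

    def check(self, account, estimated_cost):
        """Pre-dispatch check: refuse if the estimate exceeds the balance."""
        if self.balances.get(account, Decimal(0)) < estimated_cost:
            raise InsufficientCredits(account)

    def deduct(self, account, actual_cost):
        """Post-usage deduction; returns the new balance."""
        self.balances[account] = self.balances.get(account, Decimal(0)) - actual_cost
        return self.balances[account]
```

Rejecting a burn transaction hash that has already been credited keeps each issuance tied to exactly one burn. How the real system handles actual cost exceeding the estimate is not specified here; this sketch simply lets the balance go negative.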

routing engine

Scores eligible providers using price per input token, price per output token, latency, uptime, capacity, model support, and recent success rate. Route policy can be overridden per request.
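
One way to picture the scoring is a weighted sum over the listed signals, with the route policy choosing the weights. The sketch below is an assumption-heavy illustration: the `Provider` fields, the weight values, the per-million-token price unit, and the `"fastest"` objective are all hypothetical; only `"cheapest"` and `max_latency_ms` appear in the example request on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    models: frozenset
    input_price: float    # assumed unit: price per 1M input tokens
    output_price: float   # assumed unit: price per 1M output tokens
    latency_ms: float
    uptime: float         # 0..1
    free_capacity: float  # 0..1
    success_rate: float   # 0..1, recent window


# Hypothetical weight sets, one per route objective.
WEIGHTS = {
    "cheapest": {"cost": 1000.0, "latency": 0.1, "reliability": 1.0, "capacity": 0.1},
    "fastest": {"cost": 100.0, "latency": 10.0, "reliability": 1.0, "capacity": 0.1},
}


def score(p, weights, expected_in=1000, expected_out=500):
    """Lower is better: weighted sum of cost, latency, unreliability, and load."""
    cost = (p.input_price * expected_in + p.output_price * expected_out) / 1_000_000
    return (
        weights["cost"] * cost
        + weights["latency"] * (p.latency_ms / 1000)
        + weights["reliability"] * (1 - p.uptime * p.success_rate)
        + weights["capacity"] * (1 - p.free_capacity)
    )


def select(providers, model, objective="cheapest", max_latency_ms=None):
    """Filter by model support and latency cap, then pick the lowest score."""
    eligible = [
        p for p in providers
        if model in p.models
        and (max_latency_ms is None or p.latency_ms <= max_latency_ms)
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda p: score(p, WEIGHTS[objective]))
```

Model support acts as a hard filter here rather than a weighted term, so an unsupported provider can never win on price alone.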

provider adapter layer

Normalizes requests and responses across OpenAI, Anthropic, Groq, DeepSeek, Mistral, Together, Fireworks, and other supported providers. Provider-specific credentials never leave the gateway.
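
To make the normalization concrete, here is a minimal sketch of one adapter direction: turning an OpenAI-style chat request into an Anthropic-style payload (system prompt lifted to a top-level field, `max_tokens` always set), and folding both providers' usage fields into one shape. The function names and the default `max_tokens` value are assumptions; the actual adapters may map more fields and handle streaming.

```python
def to_anthropic_style(request, default_max_tokens=1024):
    """Map an OpenAI-style chat request to an Anthropic-style payload (sketch)."""
    system_parts = [m["content"] for m in request["messages"] if m["role"] == "system"]
    payload = {
        "model": request["model"],
        # Gateway-only fields such as "route" are intentionally not forwarded.
        "max_tokens": request.get("max_tokens", default_max_tokens),
        "messages": [m for m in request["messages"] if m["role"] != "system"],
    }
    if system_parts:
        payload["system"] = "\n\n".join(system_parts)
    return payload


def normalize_usage(provider_usage):
    """Fold OpenAI-style and Anthropic-style usage dicts into one shape."""
    if "prompt_tokens" in provider_usage:
        return {
            "input_tokens": provider_usage["prompt_tokens"],
            "output_tokens": provider_usage["completion_tokens"],
        }
    return {
        "input_tokens": provider_usage["input_tokens"],
        "output_tokens": provider_usage["output_tokens"],
    }
```

Provider credentials would be attached by the gateway when it sends the payload, so they never appear in the client-facing request or response.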

usage metering

Records input tokens, output tokens, provider used, credits spent, estimated underlying cost, and routing outcome. Metering data backs the dashboard, the response metadata, and the credit ledger.
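
A metering record with the fields listed above might look like the sketch below. The per-million-token pricing unit, the flat `credits_per_usd` conversion, and the `routing_outcome` labels are assumptions made for the example; the page does not specify how credits map to underlying cost.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UsageRecord:
    provider: str
    input_tokens: int
    output_tokens: int
    credits_spent: Decimal
    estimated_cost_usd: Decimal
    routing_outcome: str


def meter(provider, input_tokens, output_tokens, in_price, out_price,
          credits_per_usd, routing_outcome="primary"):
    """Build a usage record; prices are assumed to be USD per 1M tokens."""
    cost = (Decimal(input_tokens) * in_price
            + Decimal(output_tokens) * out_price) / Decimal(1_000_000)
    return UsageRecord(
        provider=provider,
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        credits_spent=cost * credits_per_usd,  # hypothetical flat conversion
        estimated_cost_usd=cost,
        routing_outcome=routing_outcome,
    )
```

Using `Decimal` rather than floats keeps the ledger arithmetic exact, which matters when many small deductions accumulate.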

request lifecycle

1. request enters the TOVA API gateway
2. API key and account balance are validated
3. router identifies providers that support the requested model or model class
4. providers are scored by cost, latency, uptime, capacity, and recent success rate
5. winning provider receives the normalized request from the adapter layer
6. response is streamed back through TOVA to the client
7. usage is calculated from input and output tokens
8. credits are deducted from the user balance
9. response metadata includes provider, route, token usage, and credit cost
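
The ordering of these steps can be sketched as a single function that takes each stage as a callable. This is only a shape for the control flow: the stage names, the error codes, and the `"tova"` metadata key are hypothetical, and streaming is collapsed into one return value for brevity.

```python
def handle_request(request, api_key, *, authenticate, balance_of,
                   select_provider, call_provider, price, deduct):
    """Run the lifecycle in order; each stage is injected as a callable."""
    account = authenticate(api_key)
    if account is None:
        return {"error": "invalid_api_key"}
    if balance_of(account) <= 0:
        return {"error": "insufficient_credits"}

    provider = select_provider(request)
    if provider is None:
        return {"error": "no_eligible_provider"}

    # The provider call returns the completion text and a normalized usage dict.
    content, usage = call_provider(provider, request)

    cost = price(provider, usage)
    deduct(account, cost)

    return {
        "content": content,
        "tova": {  # hypothetical key for the response metadata
            "provider": provider,
            "route": request.get("route", {}),
            "usage": usage,
            "credit_cost": cost,
        },
    }
```

Credits are deducted only after the provider has responded, which matches the order in the list above: a failed routing step costs the caller nothing.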

end-to-end request timeline (ms), modeled projection

Modeled lifecycle for a routed chat completion. Auth, ledger and route evaluation together add roughly 18 ms before the provider call.

example request

POST https://api.tova.fyi/v1/chat/completions
Authorization: Bearer tova_sk_***
Content-Type: application/json

{
  "model": "auto",
  "messages": [{ "role": "user", "content": "summarize this paper" }],
  "route": { "objective": "cheapest", "max_latency_ms": 800 }
}
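
The same request can be built from Python with only the standard library. This is a minimal sketch: the `build_request` helper is a hypothetical name, and the API key is passed in by the caller rather than hard-coded. The response shape is not shown on this page, so the sketch stops at constructing the request.

```python
import json
import urllib.request

TOVA_URL = "https://api.tova.fyi/v1/chat/completions"


def build_request(api_key, prompt, objective="cheapest", max_latency_ms=800):
    """Build (but do not send) the chat completion request shown above."""
    body = {
        "model": "auto",
        "messages": [{"role": "user", "content": prompt}],
        "route": {"objective": objective, "max_latency_ms": max_latency_ms},
    }
    return urllib.request.Request(
        TOVA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(...)` and decode the response body as JSON.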

info

Pass "model": "auto" to let TOVA select the cheapest eligible model for the task. If you pin a specific model instead, the router still picks the cheapest healthy provider that supports it.
