introduction

overview

TOVA — Tokenized Open Vector Access — is an OpenAI-compatible inference gateway with tokenized credit settlement. One endpoint, one credit balance, major frontier and open-source models through supported providers.

TOVA sits between applications and multiple AI inference providers. Developers send requests to one API endpoint, and TOVA handles provider selection, authentication, credit accounting, usage tracking, and response streaming. The same surface works for chat completions, embeddings, and streaming output.

info

One OpenAI-compatible endpoint. One credit balance. Credits are purchased with the TOVA token, and every token used for a credit purchase is permanently burned on-chain.

how it works

API capacity is initially sourced through pre-funded provider accounts and private infrastructure agreements, with future support for permissioned third-party inference suppliers
users connect a wallet and purchase TOVA Credits using the TOVA token
tokens spent on credit purchases are permanently burned from circulating supply
credits are issued to the user account and consumed per request based on token usage and routing cost
API keys are managed from the user dashboard and authenticate every request against the credit ledger

the fragmentation problem

A single model family today is sold by a dozen providers at radically different price points and latencies. The pain shows up on both sides of the market — for developers integrating inference, and for providers with capacity that is not evenly utilized.

demand-side fragmentation

multiple API keys, SDKs, and billing systems to maintain
per-provider rate limits and quota management
inconsistent model naming and parameter conventions
manual failover during outages and degraded performance
price and latency differences that change week to week

supply-side inefficiency

Provider capacity is not evenly used. Some providers have surplus throughput at off-peak hours, some offer lower per-token pricing for specific model classes, and some have better latency depending on region or workload. There is no shared routing layer matching this supply to live demand. TOVA is designed to become a unified routing layer for this market.

price vs latency across providers ($/M tokens, ms)● illustrative benchmark

Illustrative benchmark: same class of model across eight providers. Cost spreads exceed 50× while latency varies roughly 3×.

tova in one paragraph

TOVA sits between your application and the network of inference providers. You write to one OpenAI-compatible endpoint and spend one balance — TOVA Credits. The routing engine scores every eligible provider on price per input token, price per output token, latency to first token, tokens per second, recent error rate, provider uptime, available capacity, and model compatibility.

The selected provider is the lowest-cost source that satisfies the request's latency, reliability, and model constraints. Per-request route hints can shift the weighting toward cheapest or fastest without changing the underlying API.

what you get

one OpenAI-compatible endpoint across supported providers
one credit balance — no per-provider accounts or billing
cost and latency optimization with per-request routing policies
fallback logic for provider failures and rate limits
tokenized credit settlement: every credit purchase burns TOVA supply

projected cost per million tokens● modeled projection

Modeled projection: a stack manually pinned to one provider vs. TOVA routed across eligible providers.

why tova exists