

LLM Router Market Analysis

A full map of the LLM routing market as of March 2026: what the category is, every significant player, how they technically differ, where the money is going, and where the gaps still are.


1. What LLM Routers Are and Why They Exist

An LLM router sits between an application and a pool of language models. When a request comes in, the router decides which model to send it to -- based on cost, latency, quality requirements, provider availability, or some combination. The application sends one request; the router figures out the rest.

The core problem is a pricing arbitrage opportunity that's hard to ignore. GPT-4o costs $2.50 per million input tokens. GPT-4o-mini costs $0.15 -- nearly 17x cheaper. For a large share of real-world requests (simple Q&A, classification, summarization of short documents), the mini model is good enough. Without routing, companies pay frontier prices for commodity queries. With routing, you can cut LLM spend by 40-85% without users noticing.
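The arithmetic behind that claim can be made concrete. The sketch below uses the per-token prices above; the 70% "mini is good enough" traffic share is an illustrative assumption, not a measured figure:

```python
# Blended input-token cost when a fraction of traffic is routed to a
# cheaper model. Prices per 1M input tokens are from the text; the 70%
# routed-to-mini share is an illustrative assumption.
FRONTIER_PRICE = 2.50   # GPT-4o, $ per 1M input tokens
MINI_PRICE = 0.15       # GPT-4o-mini, $ per 1M input tokens

def blended_cost(mini_fraction: float) -> float:
    """Average $ per 1M input tokens for a given routing split."""
    return mini_fraction * MINI_PRICE + (1 - mini_fraction) * FRONTIER_PRICE

no_routing = blended_cost(0.0)    # everything on the frontier model
with_routing = blended_cost(0.7)  # 70% of queries routed to mini
savings = 1 - with_routing / no_routing

print(f"${with_routing:.3f}/1M tokens, {savings:.0%} saved")  # → $0.855/1M tokens, 66% saved
```

A 70% routing rate already lands a 66% saving, comfortably inside the 40-85% band vendors claim.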

But cost is just the entry pitch. The actual product is broader:

| Problem | What the Router Does | Claimed Improvement |
| --- | --- | --- |
| LLM spend too high | Routes easy queries to cheap models | 40-85% cost reduction |
| Provider outages | Auto-failover to secondary provider in under 20ms | Near-zero downtime |
| Latency too high | Routes to fastest responding provider in real time | ~60% reduction in time-to-first-token |
| Repeated identical queries | Semantic caching returns cached response without LLM call | 96.9% latency reduction on cache hit |
| Long context re-computation | Prompt caching reuses computed prefix state | Up to 90% cost reduction on long contexts |
| No visibility into LLM spend | Per-user, per-team, per-key cost tracking | Full cost attribution |
| Many providers, many APIs | Single OpenAI-compatible endpoint across 100-623 models | One integration, no provider lock-in |

The market is young but moving fast. OpenRouter went from $10M annual inference run-rate in October 2024 to $100M+ ARR in May 2025 -- a 10x increase in seven months. The category exists, has paying customers, and has attracted serious VC money. The question is who wins.


2. Players: Managed SaaS

OpenRouter

Funding: $40M (Series A at $500M valuation, June 2025). Investors: a16z, Sequoia, Menlo Ventures.
Revenue: $100M+ ARR (May 2025). Grew 10x in 7 months.
Users: 1M+ developers
Models: 623+
Pricing: 5% commission on inference spend. Pay-as-you-go. Free tier.
Positioning: "One-stop shop marketplace for all AI models"

OpenRouter is the market leader by every metric that matters: funding, revenue, user count, model coverage. The 5% commission model is simple and trust-building -- you know exactly what you're paying on top of model costs. The moat is ecosystem: 623 models means any developer can find what they need without going anywhere else.

The risk for OpenRouter is commoditization. If every cloud provider ships a native router (Cloudflare, Vercel, AWS already are), the aggregation story weakens. The counter-play is that OpenRouter has the most model coverage and the biggest developer community, which are hard to replicate quickly.

Portkey

Funding: Not disclosed
Models: 250+
Pricing: Free / $49 / $499 / $5K+/month
Positioning: "Enterprise-grade AI Gateway and Production Control Plane"
Differentiator: Strongest observability, SSO/SCIM/RBAC, hierarchical budget controls

Portkey is going after enterprise governance where OpenRouter goes after developer velocity. SSO, SCIM, RBAC, audit trails, virtual key management, canary deployments, circuit breakers -- this is the set of features a platform team at a 500-person company needs. The $5K+/month tier is a real enterprise play.

Portkey's core gateway is moving to open source with v2.0 in 2026. Classic open-core move: commoditize the gateway, monetize the governance layer.

Martian

Models: 200+
Pricing: Developer: $20/5,000 requests. Enterprise: custom.
Positioning: ML-based routing to the optimal model based on query complexity. Claims 20-96% cost reduction.
Differentiator: Learns which model is actually best for each type of query, not just cheapest

Not Diamond

Pricing: Contact sales (enterprise model)
Positioning: Ultra-low-latency routing (<100ms overhead). Pre-trained out-of-box router plus custom router training on your own data.
Distribution: Available on AWS Marketplace
Differentiator: Stack-agnostic, integrates with existing evaluation metrics, custom router training

Requesty

Models: 400+
Pricing: 5% markup on model costs. Enterprise volume discounts.
Positioning: Unified gateway with a strong compliance story: data residency (Frankfurt, Virginia, Singapore), PII detection and redaction.
Differentiator: Per-agent routing strategies, per-user spending caps, sub-20ms failover

Helicone

YC Batch: W23
License: MIT (open source)
Models: 100+ providers
Pricing: Free (10K requests/month). Paid tiers undisclosed.
Positioning: Observability-first, routing second. "See what's happening, then optimize it."

Helicone is interesting because it came in as an observability tool (like Langfuse) but added routing. The observability angle is a wedge: get teams to install it for visibility, then upsell routing and cost optimization. One-line integration (just change the base URL) keeps the friction close to zero.


3. Players: Open Source / Self-Hosted

LiteLLM

Stars: 33,000+
Language: Python
License: Open source (self-hosted)
Models: 100+ providers
Routing strategies: simple-shuffle, least-busy, usage-based, latency-based, complexity router (sub-millisecond, zero external calls)

LiteLLM is the most adopted open source option by a wide margin. The complexity router is the interesting technical contribution: rule-based scoring of query complexity that makes routing decisions in under a millisecond without any external API call. That's the right tradeoff for high-throughput, latency-sensitive environments.
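The general technique behind rule-based complexity routing is easy to sketch. This is not LiteLLM's actual implementation -- the heuristics, weights, threshold, and model names below are illustrative assumptions -- but it shows why the decision stays well under a millisecond: it's pure string arithmetic, no model inference, no network call.

```python
# Rule-based complexity scoring: pure heuristics, no model inference,
# so the routing decision costs far less than a millisecond.
# Keywords, weights, threshold, and model names are illustrative.
COMPLEX_HINTS = ("prove", "step by step", "analyze", "compare", "refactor")

def complexity_score(prompt: str) -> int:
    score = 0
    score += len(prompt) // 500             # long prompts score higher
    score += prompt.count("```") * 2        # code blocks suggest harder work
    score += sum(2 for kw in COMPLEX_HINTS if kw in prompt.lower())
    if prompt.count("?") > 1:               # multi-part questions
        score += prompt.count("?") - 1
    return score

def route(prompt: str, threshold: int = 3) -> str:
    """Send simple queries to a cheap model, complex ones to a frontier model."""
    return "frontier-model" if complexity_score(prompt) >= threshold else "cheap-model"

print(route("What's the capital of France?"))                      # cheap-model
print(route("Analyze and compare these two proofs step by step"))  # frontier-model
```

The tradeoff is accuracy: heuristics misclassify queries that are short but hard, which is exactly the gap ML classifiers and embedding routers target.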

The weakness: LiteLLM is Python, which creates performance constraints under serious production load. Bifrost was built specifically to address this.

Bifrost (by Maxim AI)

Language: Go
License: Open source (self-hosted)
Performance: 11 microseconds overhead at 5,000 RPS. Claims 50x faster than LiteLLM.
Models: 1,000+ (15+ providers including OpenAI, Anthropic, AWS Bedrock, Google Vertex)
Key features: Two-layer semantic caching (exact hash + vector similarity), adaptive load balancing, RBAC, zero external dependencies, cluster mode

Bifrost is the performance play. Go vs. Python at the proxy layer is not a subtle difference -- 11 microseconds vs. multiple milliseconds matters when you're handling millions of requests per day. No external dependencies is also important for enterprise ops teams who don't want surprise failure modes. If you need self-hosted and you care about throughput, Bifrost is the pick.

RouteLLM (LMSYS / UC Berkeley)

License: Open source (all code and datasets public on HuggingFace)
Origin: Academic research framework from LMSYS (the Chatbot Arena team)
Model pair: Optimized for GPT-4 Turbo and Mixtral 8x7B routing
Performance: 95% GPT-4 quality with only 26% GPT-4 calls (48% cheaper than random baseline). On MT Bench: 85% cost reduction.
Available routers: sw_ranking, BERT classifier, causal LLM classifier, matrix factorization

RouteLLM is a research framework, not a production product. Its value is benchmarking: it established the evaluation methodology that the whole field now uses. The limitation is that it's optimized for a single model pair and requires adaptation for any other scenario. It's the academic baseline, not the production deployment.

vLLM Semantic Router (v0.1 "Iris", January 2026)

License: Open source
Contributors: 50+ engineers from Red Hat, IBM Research, AMD, Hugging Face
Positioning: System-level intelligent router for Mixture-of-Models. Handles stateful conversation management, tool filtering for agentic workflows.
Features: OpenAI Responses API support, Signal-Decision Plugin Chain, HaluGate (hallucination detection), modular LoRA, Helm charts

The vLLM router is significant because it's coming from the inference layer up, not the API gateway layer down. This is mixture-of-models thinking: many small specialized models working together, routed intelligently, outperforming a single large model. As specialized models proliferate (reasoning models, math models, code models, vision models), this architecture becomes more interesting.

BricksLLM

YC Batch: Yes (BricksAI)
Language: Go
License: Open source + optional managed dashboard
Differentiator: Fine-grained cost and rate limiting per API key. Per-user, per-app, per-environment spending limits. PII detection and masking.
Models: OpenAI, Azure OpenAI, Anthropic, vLLM, open-source LLMs

4. Players: Infrastructure Extensions

These are companies that aren't primarily LLM routers but have added routing as part of a broader infrastructure play.

| Player | Core Product | Router Angle | Target |
| --- | --- | --- | --- |
| Cloudflare AI Gateway | CDN / edge network (20% of internet traffic) | 350+ models, caching, rate limiting, analytics at the edge. Global auto-scaling included. | Teams already on Cloudflare |
| Vercel AI Gateway | Frontend hosting (Next.js) | Sub-20ms routing, 100+ models, tight Next.js integration | JavaScript/TypeScript frontend teams |
| Kong AI Gateway | Enterprise API gateway | Extends Kong's platform to AI traffic. Most sophisticated semantic routing of any gateway product. | Enterprises with existing Kong investment |
| Anyscale / Ray Serve | Distributed compute platform | Prefix-aware routing, achieves 60% TTFT reduction. Custom routing for Ray-managed infra. | Teams running their own inference infrastructure |
| AWS Bedrock | Cloud AI service | Model routing within the AWS ecosystem, semantic routing support | AWS-native enterprises |

The infrastructure player threat is real. Cloudflare and Vercel have distribution that pure-play routers can't match -- they're already in the request path for millions of applications. Routing becomes a checkbox feature, not a standalone purchase. This is the commoditization risk the pure-plays face.


5. Technical Approaches: How Routing Actually Works

Not all routing is the same. The technical approach determines latency, accuracy, and maintenance cost.

| Strategy | How It Works | Latency Added | Accuracy | Who Uses It |
| --- | --- | --- | --- | --- |
| Rule-based complexity scoring | Heuristics on query length, token count, keyword presence. No model inference. | <1ms | Moderate | LiteLLM complexity router, Bifrost |
| ML classifier (BERT) | Fine-tuned BERT predicts which model will perform better | 20-50ms | High | RouteLLM, Not Diamond |
| Semantic embedding | Embed the prompt, similarity-match against reference clusters, route to the cluster's best model | 50-100ms | High for in-distribution | vLLM Semantic Router, Kong, AWS |
| LLM-as-router | A small LLM reads the query and decides which larger LLM to call | 200-500ms | Very high | Martian, some experimental setups |
| Matrix factorization | Collaborative-filtering style: predict which model the query is "closest to" based on historical performance vectors | 10-30ms | High | RouteLLM |
| Latency-based | Real-time probe of provider response times, route to fastest | <20ms | N/A (not about quality) | Almost all gateways |
| Load balancing | Distribute across instances by weight, round-robin, or least-busy | <10ms | N/A (not about quality) | All gateways |
| Fallback chains | Primary fails, try secondary, then tertiary | Depends on failure speed | N/A (reliability play) | All gateways |
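Semantic-embedding routing can be sketched end to end. In this toy sketch, a trigram-hashing function stands in for a real embedding model, and the reference clusters and model assignments are invented for illustration; a production router would call an actual embedding model and maintain many clusters per task type.

```python
import hashlib
import math

# Toy "embedding": hash character trigrams into a small fixed vector.
# A real semantic router would call an embedding model here.
DIM = 64

def embed(text: str) -> list[float]:
    vec = [0.0] * DIM
    t = text.lower()
    for i in range(len(t) - 2):
        h = int(hashlib.md5(t[i:i + 3].encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))  # vectors are pre-normalized

# Reference clusters: an example prompt per task type, each mapped to the
# model assumed best for that cluster (assignments are illustrative).
CLUSTERS = {
    "code-model": embed("write a python function to parse json"),
    "chat-model": embed("what is the capital of france"),
}

def route(prompt: str) -> str:
    q = embed(prompt)
    return max(CLUSTERS, key=lambda m: cosine(q, CLUSTERS[m]))

print(route("write a python function to parse json files"))  # code-model
print(route("what is the capital of spain"))                 # chat-model
```

The 50-100ms figure in the table is dominated by the embedding-model call; the similarity search itself is cheap. "High for in-distribution" accuracy follows directly from the design: prompts that resemble no reference cluster get routed arbitrarily.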

Caching as a Routing Bypass

The fastest "routing" decision is no LLM call at all. Three caching layers exist:

  • Exact-match caching: Hash the prompt, return cached response if identical. Instant. Works for repeated identical queries.
  • Prompt prefix caching: Anthropic and OpenAI now support computing prefix state once and reusing it. Up to 90% cost reduction on long-context queries with repeated system prompts.
  • Semantic caching: Vector-embed the prompt, similarity-search against cached responses, return cache hit if semantically close enough. 96.9% latency reduction on hits. Bifrost implements this as a two-layer system (exact hash first, then vector similarity).
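The two-layer idea (exact hash first, then semantic similarity) can be sketched in a few lines. This is not Bifrost's implementation: the bag-of-words cosine similarity here stands in for real vector embeddings, and the 0.8 threshold is an assumed value.

```python
import hashlib
import math

class TwoLayerCache:
    """Layer 1: exact hash lookup. Layer 2: semantic similarity match.
    Toy sketch: bag-of-words cosine stands in for real embeddings."""

    def __init__(self, threshold: float = 0.8):   # threshold is assumed
        self.exact = {}       # sha256(prompt) -> response
        self.semantic = []    # (word-count vector, response)
        self.threshold = threshold

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode()).hexdigest()

    @staticmethod
    def _vec(prompt: str) -> dict:
        words = prompt.lower().split()
        return {w: words.count(w) for w in set(words)}

    @staticmethod
    def _cosine(a: dict, b: dict) -> float:
        dot = sum(a[w] * b.get(w, 0) for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def get(self, prompt: str):
        hit = self.exact.get(self._key(prompt))    # layer 1: exact, near-free
        if hit is not None:
            return hit
        q = self._vec(prompt)
        for vec, resp in self.semantic:            # layer 2: similarity scan
            if self._cosine(q, vec) >= self.threshold:
                return resp
        return None                                # miss: fall through to the LLM

    def put(self, prompt: str, response: str):
        self.exact[self._key(prompt)] = response
        self.semantic.append((self._vec(prompt), response))
```

The ordering matters: the exact layer is a hash lookup and costs nothing, so it always runs first; the semantic layer only pays its similarity cost on exact misses.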

Research Findings on Router Fragility

ACL 2026 research found that current routers have structural problems:

  • Routing collapse: As budget increases, routers default to expensive models even when cheaper ones suffice. The router "plays it safe" instead of optimizing.
  • Training/decision mismatch: Routers trained to predict model performance don't optimize the same objective as routing decisions (discrete ranking). The loss function is wrong.
  • Safety gap: BERT-based routers route potential jailbreaks to weaker models because the weak model scores lower on quality -- but lower quality doesn't mean safer. The opposite is often true.

These are known problems with no commercial solution yet.


6. Pricing Models Compared

| Player | Pricing Model | Entry Cost | At Scale |
| --- | --- | --- | --- |
| OpenRouter | 5% commission on inference spend | Free | Scales with usage. At $10K/month model spend = $500/month. |
| Requesty | 5% markup on model costs | Free | Same math as OpenRouter. Volume discounts at enterprise. |
| Martian | Per-request (developer) + custom (enterprise) | $20 per 5,000 requests after free tier | Custom enterprise pricing. VPC deployment option. |
| Not Diamond | Enterprise (contact sales) | Contact | Per-inference cost billing for model runs |
| Portkey | Freemium + seat-based tiers | $49/month (Starter) | $499/month (Pro, 20-person team), $5K+/month (Enterprise) |
| Helicone | Freemium | Free (10K req/month) | Paid tiers undisclosed |
| LiteLLM | Free (self-hosted) | $0 | $0 (you pay for your own infra) |
| Bifrost | Free (self-hosted) | $0 | $0 |
| BricksLLM | Free (open source) + optional managed dashboard | $0 | Managed dashboard pricing undisclosed |
| RouteLLM | Free (open source research framework) | $0 | $0 |
| Cloudflare AI Gateway | Included with Cloudflare Workers | $0 (on free plan) | Scales with Cloudflare account tier |

The 5% commission model (OpenRouter, Requesty) is the cleanest. You know exactly what you're paying. Percentage-of-spend aligns the router's incentives with the customer's: if model costs drop, the fee drops. The risk for the provider is that as models get cheaper industry-wide, the commission shrinks.

The self-hosted open source options (LiteLLM, Bifrost) have $0 sticker price but real costs: engineering time to maintain, infrastructure to run, on-call responsibility. For teams with a platform engineer, it's a good deal. For a two-person startup, it probably isn't.
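That tradeoff can be put in rough numbers. Every input below except the 5% commission is an assumed figure for illustration (loaded engineer cost, maintenance share, hosting cost); swap in your own values.

```python
# Rough break-even between a 5%-commission managed router and
# self-hosting an open source gateway. All inputs except the 5%
# commission are illustrative assumptions.
COMMISSION = 0.05
ENGINEER_COST_PER_MONTH = 15_000   # fully loaded salary, assumed
MAINTENANCE_SHARE = 0.15           # fraction of one engineer's time, assumed
INFRA_PER_MONTH = 300              # gateway hosting, assumed

def managed_cost(model_spend: float) -> float:
    """Monthly fee on a 5%-of-spend managed router."""
    return model_spend * COMMISSION

def self_hosted_cost(model_spend: float) -> float:
    """Monthly cost of running your own gateway (independent of spend)."""
    return ENGINEER_COST_PER_MONTH * MAINTENANCE_SHARE + INFRA_PER_MONTH

# Break-even: 0.05 * spend = 2,550  =>  spend = $51,000/month
break_even = self_hosted_cost(0) / COMMISSION
print(f"${break_even:,.0f}/month model spend")  # → $51,000/month model spend
```

Under these assumptions, self-hosting only beats the 5% fee above roughly $51K/month in model spend, which is why the managed option usually wins for small teams.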


7. Who Buys and How They Decide

Buyer Segments

| Segment | Profile | What They Buy | Decision Driver | Budget |
| --- | --- | --- | --- | --- |
| Developer / indie hacker | Solo or tiny team, moving fast, no ops | OpenRouter, Helicone (free tiers) | Zero friction, free tier, works in 5 minutes | $0-$100/month |
| Cost-conscious startup | 5-30 people, LLM spend becoming a line item | Requesty, Martian, Portkey Starter | ROI: "Show me the cost reduction" | $100-$1K/month |
| Platform / infra team | 50+ person org with dedicated DevOps or platform engineering | LiteLLM, Bifrost, BricksLLM (self-hosted) | Control, on-prem, custom governance, no vendor dependency | Engineering time, not SaaS fees |
| Enterprise with governance needs | Fortune 500, regulated industries (finance, healthcare, legal) | Portkey Enterprise, Not Diamond, Kong AI Gateway | Compliance, audit trails, RBAC, data residency, SSO | $5K-$100K+/month |
| Cloudflare / Vercel native team | Frontend-heavy, already on these platforms | Cloudflare AI Gateway, Vercel AI Gateway | Already in the stack, zero new vendor | Bundled with existing spend |

How Decisions Actually Get Made

The LLM router purchase is almost always developer-initiated, not top-down. An engineer discovers the cost savings opportunity, tests a free tier over a weekend, shows the boss a 60% reduction in the LLM bill, and gets budget approved. The sales motion is product-led. The best routers optimize for this: one-line integration (just change the base URL), immediate ROI visibility in a dashboard, and no credit card to start.
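"One-line integration" means the application code stays provider-agnostic: the same OpenAI-style chat-completions payload simply goes to a different host. A minimal sketch, where `router.example.com` is a placeholder hostname rather than any specific product's endpoint:

```python
import json

# Swapping a router in (or out) is a one-line change: only BASE_URL moves.
# "router.example.com" is a placeholder, not a real endpoint.
BASE_URL = "https://router.example.com/v1"   # was: "https://api.openai.com/v1"

def build_chat_request(model: str, user_message: str) -> tuple[str, bytes]:
    """Return (url, body) for an OpenAI-compatible chat completion call."""
    url = f"{BASE_URL}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }).encode()
    return url, body

url, body = build_chat_request("gpt-4o-mini", "hello")
```

Everything downstream -- model choice, failover, caching -- becomes the router's job, which is why the switching cost is low enough for a weekend trial.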

For enterprise deals, the evaluation criteria shift toward compliance and governance. "Does your data stay in the EU?" is more important than "how much does it cost?" for a German bank building on LLMs. Portkey and Requesty are positioning here; most others aren't.


8. Funding and Market Size

| Company | Round | Amount | Date | Valuation |
| --- | --- | --- | --- | --- |
| OpenRouter | Seed + Series A | $40M total | June 2025 | $500M |
| Martian | Undisclosed | -- | -- | -- |
| Not Diamond | Seed | Undisclosed | ~2023 | -- |
| Helicone | YC W23 | -- | 2023 | -- |
| BricksAI | YC | -- | ~2024 | -- |

Outside of OpenRouter, the funding picture is thin. Most pure-play routers are either bootstrapped, seed-stage, or YC-backed at early valuations. The VC money in the broader AI infra space ($750M to Groq, $500M+ to Together AI) is going to inference infrastructure, not routing middleware. Routing is seen as a layer on top of inference, not the primary bet.

The market size estimates are optimistic: $6.52 billion by 2030 at 21% CAGR. Whether that's the standalone router market or the broader AI gateway market is unclear. OpenRouter's $100M ARR is the only hard data point. The rest is projection.


9. Market Gaps: What Doesn't Exist Yet

| Gap | Current State | Opportunity | Difficulty |
| --- | --- | --- | --- |
| Capability-aware routing | Routers optimize for cost/latency/quality generics. No router knows that model X is specifically good at multi-step reasoning or multilingual tasks. | Build a capability profile for every model (benchmarks + production signals), then route based on task type, not just query complexity. | High (requires ongoing model evaluation across many dimensions) |
| Safety-aware routing | BERT routers route potential jailbreaks to weaker models (because weak = cheap), which elevates risk. Nobody has built routing that treats safety as a first-class routing signal. | Router that classifies intent first (safe vs. borderline vs. risky), then routes safe to cheap and risky to either more capable or a safety-specialized model. | High (requires intent classification without over-filtering legitimate queries) |
| Stateful multi-turn routing | Almost all routers make per-request decisions. Conversation context is ignored. | Route entire conversations to the same model for consistency, or adapt routing mid-conversation based on where the conversation goes. vLLM SR is starting here. | Medium (stateful session tracking at scale is solved infrastructure; the routing logic is novel) |
| MCP-aware routing | As of March 2026, no major router supports Model Context Protocol for tool-rich agentic workflows. Portkey and TrueFoundry are working on it. | Router that understands tool context (which tools are available, which tools the query needs) and routes to the model best suited for those specific tools. | Medium (MCP is new; the routing logic on top is non-trivial but buildable) |
| Compliance-driven routing | Data residency is addressed (Requesty has EU/US/APAC regions). Model-level compliance (HIPAA-eligible models, SOC2-certified providers, EU AI Act compliant models) is not. | Router that enforces routing constraints based on regulatory requirements: HIPAA queries only go to HIPAA-eligible model endpoints, EU user data never leaves EU models. | Medium (requires a maintained compliance attribute database for models, then rule enforcement) |
| Self-improving / adaptive routers | All current routers are static. A decision made in month 1 uses the same routing logic as month 12, regardless of what the production data shows. | Router that continuously retrains on production performance data. If model X keeps producing lower-quality outputs than predicted for a certain query type, update the routing weights. | Very High (requires a feedback loop, human-in-the-loop or LLM-as-judge, and a retraining pipeline) |
| Multimodal routing | All major routers handle text. Routing across image, video, audio, and multimodal models is not addressed. | Route based on modality requirements of the query. Image question? Route to vision model. Audio transcription? Route to Whisper-class model. Mixed? Route to GPT-4V or Gemini Ultra. | Medium (modality detection is straightforward; model capability databases for multimodal are less mature) |
| Standardized evaluation / leaderboard | RouterArena project emerging but not yet the authoritative benchmark. Every company uses its own evaluation methodology to claim cost savings. | The "HuggingFace Open LLM Leaderboard" for routers. Standardized tasks, standard model pairs, reproducible benchmarks. Whoever builds this becomes a reference point the whole market cites. | Low-Medium (research project, not a product) |
| Custom fine-tuned model routing | Routers are good at routing between public frontier models. Routing that includes custom fine-tunes or LoRA adapters is not well supported. | Router that benchmarks custom models, integrates them into the routing pool, and selects between public models and custom fine-tunes based on task fit. | Medium-High (requires a bring-your-own-model evaluation pipeline) |
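To make the stateful multi-turn gap concrete: the core idea is that routing decisions stick to a conversation, not a request. The sketch below is a toy illustration of that gap, not any shipping product; the escalation heuristic and model names are invented.

```python
# Toy sketch of stateful multi-turn routing: pin each conversation to one
# model for consistency, and escalate the whole session once any turn
# looks complex. Heuristic and model names are illustrative.
class SessionRouter:
    def __init__(self):
        self.sessions = {}   # conversation_id -> currently pinned model

    def route(self, conversation_id: str, prompt: str) -> str:
        model = self.sessions.get(conversation_id, "cheap-model")
        if len(prompt) > 200 or "step by step" in prompt.lower():
            model = "frontier-model"           # escalate for the rest of the session
        self.sessions[conversation_id] = model  # sticky: later turns stay here
        return model

r = SessionRouter()
print(r.route("c1", "hi"))                                 # cheap-model
print(r.route("c1", "walk me through this step by step"))  # frontier-model
print(r.route("c1", "thanks"))                             # frontier-model (stays escalated)
```

A per-request router would send "thanks" back to the cheap model and break conversational consistency; the session pin is the whole point.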

10. Verdict: Where the Market Is Going

A few things are becoming clear.

OpenRouter has won the developer market. $100M ARR with 1M developers and the largest model catalog is a durable position. The 5% commission model is simple and trusted. Competing on pure model aggregation against OpenRouter is not a good idea.

Enterprise governance is still open. Portkey is going after it but is still relatively small. The enterprise AI governance market (compliance, chargeback, RBAC, audit trails) is worth a lot more than the developer-tier market, and it's less crowded. The play is to build the SOC2, HIPAA, and EU AI Act compliance story that Portkey doesn't have fully yet.

Infrastructure players will commoditize the basics. Cloudflare, Vercel, AWS, Azure, Kong -- they're all adding routing as a checkbox feature. Basic cost/latency routing with fallback is going to be free and built-in within 2-3 years. The pure-plays need to go up-market (governance, compliance, adaptive learning) or specialize vertically.

Self-hosted is not going away. Legal, healthcare, finance, and government will always have segments that can't send data to a managed service. Bifrost (Go, 11 microsecond overhead, no external dependencies) is the right technical approach for this segment. The opportunity there is to add enterprise governance on top of the raw performance.

The next battleground is agentic routing. Routing a single user query is solved. Routing across a multi-step AI agent workflow -- where context accumulates, tools get invoked, and the right model for step 3 depends on what happened in steps 1 and 2 -- is not solved. vLLM Semantic Router is the only player seriously attacking this. As AI agents move from demos to production, stateful multi-turn routing becomes the thing everyone needs.

The data flywheel hasn't kicked in yet. Every router that operates as a managed service accumulates a dataset of queries, model selections, and outcome quality. None of them appear to be using this data to train proprietary routing models. The first router to close this loop -- using production data to continuously improve routing decisions -- has a compounding moat that rule-based competitors can't close.

Short summary: buy OpenRouter for developer use, self-host Bifrost for performance-critical or privacy-sensitive workloads, watch the agentic routing space carefully, and ignore the "we save you X% on LLM costs" marketing from every player -- they all say the same thing.