1. What LLM Routers Are and Why They Exist
An LLM router sits between an application and a pool of language models. When a request comes in, the router decides which model to send it to -- based on cost, latency, quality requirements, provider availability, or some combination. The application sends one request; the router figures out the rest.
The core problem is a pricing gap that's hard to ignore. GPT-4o costs $2.50 per million input tokens; GPT-4o-mini costs $0.15 -- roughly 17x cheaper. For a large share of real-world requests (simple Q&A, classification, summarization of short documents), the mini model is good enough. Without routing, companies pay frontier prices for commodity queries. With routing, vendors claim you can cut LLM spend by 40-85% without users noticing.
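The basic economics can be captured in a few lines. A minimal, hypothetical dispatch rule (the prices are from above; the routing heuristic itself is invented for illustration and is far cruder than what any real router ships):

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    input_cost_per_mtok: float  # USD per million input tokens

CHEAP = Model("gpt-4o-mini", 0.15)
FRONTIER = Model("gpt-4o", 2.50)

def route(prompt: str, needs_high_quality: bool = False) -> Model:
    """Send quality-critical or very long requests to the frontier
    model; everything else goes to the cheap one."""
    if needs_high_quality or len(prompt) > 4000:
        return FRONTIER
    return CHEAP

# Commodity query -> commodity price.
assert route("Classify this ticket as bug or feature.").name == "gpt-4o-mini"
```

The application still sends one request; the router decides which price it pays.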
But cost is just the entry pitch. The actual product is broader:
| Problem | What the Router Does | Claimed Improvement |
|---|---|---|
| LLM spend too high | Routes easy queries to cheap models | 40-85% cost reduction |
| Provider outages | Auto-failover to secondary provider in under 20ms | Near-zero downtime |
| Latency too high | Routes to fastest responding provider in real time | ~60% reduction in time-to-first-token |
| Repeated identical queries | Semantic caching returns cached response without LLM call | 96.9% latency reduction on cache hit |
| Long context re-computation | Prompt caching reuses computed prefix state | Up to 90% cost reduction on long contexts |
| No visibility into LLM spend | Per-user, per-team, per-key cost tracking | Full cost attribution |
| Many providers, many APIs | Single OpenAI-compatible endpoint across 100-623 models | One integration, no provider lock-in |
The market is young but moving fast. OpenRouter went from $10M annual inference run-rate in October 2024 to $100M+ ARR in May 2025 -- a 10x increase in seven months. The category exists, has paying customers, and has attracted serious VC money. The question is who wins.
2. Players: Managed SaaS
OpenRouter
| Funding | $40M (Series A at $500M valuation, June 2025). Investors: a16z, Sequoia, Menlo Ventures. |
| Revenue | $100M+ ARR (May 2025). Grew 10x in 7 months. |
| Users | 1M+ developers |
| Models | 623+ |
| Pricing | 5% commission on inference spend. Pay-as-you-go. Free tier. |
| Positioning | "One-stop shop marketplace for all AI models" |
OpenRouter is the market leader by every metric that matters: funding, revenue, user count, model coverage. The 5% commission model is simple and trust-building -- you know exactly what you're paying on top of model costs. The moat is the ecosystem: a 623-model catalog means any developer can find what they need without going anywhere else.
The risk for OpenRouter is commoditization. If every cloud provider ships a native router (Cloudflare, Vercel, and AWS already have), the aggregation story weakens. The counter-play is that OpenRouter has the most model coverage and the biggest developer community, both hard to replicate quickly.
Portkey
| Funding | Not disclosed |
| Models | 250+ |
| Pricing | Free / $49 / $499 / $5K+/month |
| Positioning | "Enterprise-grade AI Gateway and Production Control Plane" |
| Differentiator | Strongest observability, SSO/SCIM/RBAC, hierarchical budget controls |
Portkey is going after enterprise governance where OpenRouter goes after developer velocity. SSO, SCIM, RBAC, audit trails, virtual key management, canary deployments, circuit breakers -- this is the set of features a platform team at a 500-person company needs. The $5K+/month tier is a real enterprise play.
Portkey's core gateway is moving to open source with v2.0 in 2026. Classic open-core move: commoditize the gateway, monetize the governance layer.
Martian
| Models | 200+ |
| Pricing | Developer: $20/5,000 requests. Enterprise: custom. |
| Positioning | ML-based routing to the optimal model based on query complexity. Claims 20-96% cost reduction. |
| Differentiator | Learns which model is actually best for each type of query, not just cheapest |
Not Diamond
| Pricing | Contact sales (enterprise model) |
| Positioning | Ultra-low-latency routing (<100ms overhead). Pre-trained out-of-box router plus custom router training on your own data. |
| Distribution | Available on AWS Marketplace |
| Differentiator | Stack-agnostic, integrates with existing evaluation metrics, custom router training |
Requesty
| Models | 400+ |
| Pricing | 5% markup on model costs. Enterprise volume discounts. |
| Positioning | Unified gateway with strong compliance story: data residency (Frankfurt, Virginia, Singapore), PII detection and redaction. |
| Differentiator | Per-agent routing strategies, per-user spending caps, sub-20ms failover |
Helicone
| YC Batch | W23 |
| License | MIT (open source) |
| Providers | 100+ |
| Pricing | Free (10K requests/month). Paid tiers undisclosed. |
| Positioning | Observability-first, routing second. "See what's happening, then optimize it." |
Helicone is interesting because it came in as an observability tool (like Langfuse) but added routing. The observability angle is a wedge: get teams to install it for visibility, then upsell routing and cost optimization. One-line integration (just change the base URL) keeps the friction close to zero.
3. Players: Open Source / Self-Hosted
LiteLLM
| Stars | 33,000+ |
| Language | Python |
| License | Open source (self-hosted) |
| Providers | 100+ |
| Routing strategies | simple-shuffle, least-busy, usage-based, latency-based, complexity router (sub-millisecond, zero external calls) |
LiteLLM is the most adopted open source option by a wide margin. The complexity router is the interesting technical contribution: rule-based scoring of query complexity that makes routing decisions in under a millisecond without any external API call. That's the right tradeoff for high-throughput, latency-sensitive environments.
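In the spirit of the sub-millisecond, rule-based approach described above, a toy complexity scorer might look like the following. The heuristics, weights, and threshold are invented for illustration -- this is not LiteLLM's actual implementation:

```python
import re

def complexity_score(prompt: str) -> float:
    """Cheap heuristics only -- no model inference, no external calls,
    so the decision stays well under a millisecond."""
    score = 0.0
    score += min(len(prompt) / 2000, 1.0)  # longer prompts score higher
    if re.search(r"\b(prove|derive|step[- ]by[- ]step|reason)\b", prompt, re.I):
        score += 0.6                       # reasoning-style keywords
    if prompt.count("\n") > 10:
        score += 0.3                       # structured, multi-part input
    return score

def pick_model(prompt: str, threshold: float = 0.5) -> str:
    return "frontier-model" if complexity_score(prompt) > threshold else "cheap-model"
```

The point of the design is that everything here is string inspection: the routing decision costs microseconds, at the price of cruder accuracy than an ML classifier.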
The weakness: LiteLLM is Python, which creates performance constraints under serious production load. Bifrost was built specifically to address this.
Bifrost (by Maxim AI)
| Language | Go |
| License | Open source (self-hosted) |
| Performance | 11 microseconds overhead at 5,000 RPS. Claims 50x faster than LiteLLM. |
| Models | 1,000+ (15+ providers including OpenAI, Anthropic, AWS Bedrock, Google Vertex) |
| Key features | Two-layer semantic caching (exact hash + vector similarity), adaptive load balancing, RBAC, zero external dependencies, cluster mode |
Bifrost is the performance play. Go vs. Python at the proxy layer is not a subtle difference -- 11 microseconds vs. multiple milliseconds matters when you're handling millions of requests per day. No external dependencies is also important for enterprise ops teams who don't want surprise failure modes. If you need self-hosted and you care about throughput, Bifrost is the pick.
RouteLLM (LMSYS / UC Berkeley)
| License | Open source (all code and datasets public on HuggingFace) |
| Origin | Academic research framework from LMSYS (the Chatbot Arena team) |
| Model pair | Optimized for GPT-4 Turbo and Mixtral 8x7B routing |
| Performance | 95% GPT-4 quality with only 26% GPT-4 calls (48% cheaper than random baseline). On MT Bench: 85% cost reduction. |
| Available routers | sw_ranking, BERT classifier, causal LLM classifier, matrix factorization |
RouteLLM is a research framework, not a production product. Its value is benchmarking: it established the evaluation methodology that the whole field now uses. The limitation is that it's optimized for a single model pair and requires adaptation for any other scenario. It's the academic baseline, not the production deployment.
vLLM Semantic Router (v0.1 "Iris", January 2026)
| License | Open source |
| Contributors | 50+ engineers from Red Hat, IBM Research, AMD, Hugging Face |
| Positioning | System-level intelligent router for Mixture-of-Models. Handles stateful conversation management, tool filtering for agentic workflows. |
| Features | OpenAI Responses API support, Signal-Decision Plugin Chain, HaluGate (hallucination detection), modular LoRA, Helm charts |
The vLLM router is significant because it's coming from the inference layer up, not the API gateway layer down. This is mixture-of-models thinking: many small specialized models working together, routed intelligently, outperforming a single large model. As specialized models proliferate (reasoning models, math models, code models, vision models), this architecture becomes more interesting.
BricksLLM
| YC Batch | Yes (BricksAI) |
| Language | Go |
| License | Open source + optional managed dashboard |
| Differentiator | Fine-grained cost and rate limiting per API key. Per-user, per-app, per-environment spending limits. PII detection and masking. |
| Models | OpenAI, Azure OpenAI, Anthropic, vLLM, open-source LLMs |
4. Players: Infrastructure Extensions
These are companies that aren't primarily LLM routers but have added routing as part of a broader infrastructure play.
| Player | Core Product | Router Angle | Target |
|---|---|---|---|
| Cloudflare AI Gateway | CDN / edge network (20% of internet traffic) | 350+ models, caching, rate limiting, analytics at the edge. Global auto-scaling included. | Teams already on Cloudflare |
| Vercel AI Gateway | Frontend hosting (Next.js) | Sub-20ms routing, 100+ models, tight Next.js integration | JavaScript/TypeScript frontend teams |
| Kong AI Gateway | Enterprise API gateway | Extends Kong's platform to AI traffic. Most sophisticated semantic routing of any gateway product. | Enterprises with existing Kong investment |
| Anyscale / Ray Serve | Distributed compute platform | Prefix-aware routing, achieves 60% TTFT reduction. Custom routing for Ray-managed infra. | Teams running their own inference infrastructure |
| AWS Bedrock | Cloud AI service | Model routing within the AWS ecosystem, semantic routing support | AWS-native enterprises |
The infrastructure player threat is real. Cloudflare and Vercel have distribution that pure-play routers can't match -- they're already in the request path for millions of applications. Routing becomes a checkbox feature, not a standalone purchase. This is the commoditization risk the pure-plays face.
5. Technical Approaches: How Routing Actually Works
Not all routing is the same. The technical approach determines latency, accuracy, and maintenance cost.
| Strategy | How It Works | Latency Added | Accuracy | Who Uses It |
|---|---|---|---|---|
| Rule-based complexity scoring | Heuristics on query length, token count, keyword presence. No model inference. | <1ms | Moderate | LiteLLM complexity router, Bifrost |
| ML classifier (BERT) | Fine-tuned BERT predicts which model will perform better | 20-50ms | High | RouteLLM, Not Diamond |
| Semantic embedding | Embed the prompt, similarity-match against reference clusters, route to cluster's best model | 50-100ms | High for in-distribution | vLLM Semantic Router, Kong, AWS |
| LLM-as-router | A small LLM reads the query and decides which larger LLM to call | 200-500ms | Very high | Martian, some experimental setups |
| Matrix factorization | Collaborative-filtering style: predict which model the query is "closest to" based on historical performance vectors | 10-30ms | High | RouteLLM |
| Latency-based | Real-time probe of provider response times, route to fastest | <20ms | N/A (not about quality) | Almost all gateways |
| Load balancing | Distribute across instances by weight, round-robin, or least-busy | <10ms | N/A (not about quality) | All gateways |
| Fallback chains | Primary fails, try secondary, then tertiary | Depends on failure speed | N/A (reliability play) | All gateways |
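Of the strategies in the table, the fallback chain is the simplest to illustrate: try providers in order, moving on when one fails. A minimal sketch with invented provider stubs (real routers match on specific failure classes like timeouts, 429s, and 5xx responses, and track failure speed):

```python
from typing import Callable

def with_fallback(providers: list[Callable[[str], str]], prompt: str) -> str:
    """Call each provider in priority order; return the first success."""
    last_error: Exception | None = None
    for call in providers:
        try:
            return call(prompt)
        except Exception as err:
            last_error = err  # remember why, keep going down the chain
    raise RuntimeError("all providers failed") from last_error

# Usage with stubbed providers: the primary fails, the secondary answers.
def flaky(prompt: str) -> str:
    raise TimeoutError("primary provider down")

def healthy(prompt: str) -> str:
    return f"answer to: {prompt}"

print(with_fallback([flaky, healthy], "hello"))  # answer to: hello
```

The latency cost noted in the table ("depends on failure speed") falls out directly: the chain is only as fast as the slowest failure it has to wait through before reaching a healthy provider.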
Caching as a Routing Bypass
The fastest "routing" decision is no LLM call at all. Three caching layers exist:
- Exact-match caching: Hash the prompt, return cached response if identical. Instant. Works for repeated identical queries.
- Prompt prefix caching: Anthropic and OpenAI now support computing prefix state once and reusing it. Up to 90% cost reduction on long-context queries with repeated system prompts.
- Semantic caching: Vector-embed the prompt, similarity-search against cached responses, return cache hit if semantically close enough. 96.9% latency reduction on hits. Bifrost implements this as a two-layer system (exact hash first, then vector similarity).
Research Findings on Router Fragility
ACL 2026 research found that current routers have structural problems:
- Routing collapse: As budget increases, routers default to expensive models even when cheaper ones suffice. The router "plays it safe" instead of optimizing.
- Training/decision mismatch: Routers trained to predict model performance don't optimize the same objective as routing decisions (discrete ranking). The loss function is wrong.
- Safety gap: BERT-based routers route potential jailbreaks to weaker models because the weak model scores lower on quality -- but lower quality doesn't mean safer. The opposite is often true.
These are known problems with no commercial solution yet.
6. Pricing Models Compared
| Player | Pricing Model | Entry Cost | At Scale |
|---|---|---|---|
| OpenRouter | 5% commission on inference spend | Free | Scales with usage. At $10K/month model spend = $500/month. |
| Requesty | 5% markup on model costs | Free | Same math as OpenRouter. Volume discounts at enterprise. |
| Martian | Per-request (developer) + custom (enterprise) | $20 per 5,000 requests after free tier | Custom enterprise pricing. VPC deployment option. |
| Not Diamond | Enterprise (contact sales) | Contact | Per-inference cost billing for model runs |
| Portkey | Freemium + seat-based tiers | $49/month (Starter) | $499/month (Pro, 20-person team), $5K+/month (Enterprise) |
| Helicone | Freemium | Free (10K req/month) | Paid tiers undisclosed |
| LiteLLM | Free (self-hosted) | $0 | $0 (you pay for your own infra) |
| Bifrost | Free (self-hosted) | $0 | $0 |
| BricksLLM | Free (open source) + optional managed dashboard | $0 | Managed dashboard pricing undisclosed |
| RouteLLM | Free (open source research framework) | $0 | $0 |
| Cloudflare AI Gateway | Included with Cloudflare Workers | $0 (on free plan) | Scales with Cloudflare account tier |
The 5% commission model (OpenRouter, Requesty) is the cleanest. You know exactly what you're paying. Percentage-of-spend aligns the router's incentives with the customer's: if model costs drop, the fee drops. The risk for the provider is that as models get cheaper industry-wide, the commission shrinks.
The self-hosted open source options (LiteLLM, Bifrost) have $0 sticker price but real costs: engineering time to maintain, infrastructure to run, on-call responsibility. For teams with a platform engineer, it's a good deal. For a two-person startup, it probably isn't.
7. Who Buys and How They Decide
Buyer Segments
| Segment | Profile | What They Buy | Decision Driver | Budget |
|---|---|---|---|---|
| Developer / indie hacker | Solo or tiny team, moving fast, no ops | OpenRouter, Helicone (free tiers) | Zero friction, free tier, works in 5 minutes | $0-$100/month |
| Cost-conscious startup | 5-30 people, LLM spend becoming a line item | Requesty, Martian, Portkey Starter | ROI: "Show me the cost reduction" | $100-$1K/month |
| Platform / infra team | 50+ person org with dedicated DevOps or platform engineering | LiteLLM, Bifrost, BricksLLM (self-hosted) | Control, on-prem, custom governance, no vendor dependency | Engineering time, not SaaS fees |
| Enterprise with governance needs | Fortune 500, regulated industries (finance, healthcare, legal) | Portkey Enterprise, Not Diamond, Kong AI Gateway | Compliance, audit trails, RBAC, data residency, SSO | $5K-$100K+/month |
| Cloudflare / Vercel native team | Frontend-heavy, already on these platforms | Cloudflare AI Gateway, Vercel AI Gateway | Already in the stack, zero new vendor | Bundled with existing spend |
How Decisions Actually Get Made
The LLM router purchase is almost always developer-initiated, not top-down. An engineer discovers the cost savings opportunity, tests a free tier over a weekend, shows the boss a 60% reduction in the LLM bill, and gets budget approved. The sales motion is product-led. The best routers optimize for this: one-line integration (just change the base URL), immediate ROI visibility in a dashboard, and no credit card to start.
For enterprise deals, the evaluation criteria shift toward compliance and governance. "Does your data stay in the EU?" is more important than "how much does it cost?" for a German bank building on LLMs. Portkey and Requesty are positioning here; most others aren't.
8. Funding and Market Size
| Company | Round | Amount | Date | Valuation |
|---|---|---|---|---|
| OpenRouter | Seed + Series A | $40M total | June 2025 | $500M |
| Martian | Undisclosed | -- | -- | -- |
| Not Diamond | Seed | Undisclosed | ~2023 | -- |
| Helicone | YC W23 | -- | 2023 | -- |
| BricksAI | YC | -- | ~2024 | -- |
Outside of OpenRouter, the funding picture is thin. Most pure-play routers are either bootstrapped, seed-stage, or YC-backed at early valuations. The VC money in the broader AI infra space ($750M to Groq, $500M+ to Together AI) is going to inference infrastructure, not routing middleware. Routing is seen as a layer on top of inference, not the primary bet.
The market size estimates are optimistic: $6.52 billion by 2030 at 21% CAGR. Whether that's the standalone router market or the broader AI gateway market is unclear. OpenRouter's $100M ARR is the only hard data point. The rest is projection.
9. Market Gaps: What Doesn't Exist Yet
| Gap | Current State | Opportunity | Difficulty |
|---|---|---|---|
| Capability-aware routing | Routers optimize for cost/latency/quality generics. No router knows that model X is specifically good at multi-step reasoning or multilingual tasks. | Build a capability profile for every model (benchmarks + production signals), then route based on task type, not just query complexity. | High (requires ongoing model evaluation across many dimensions) |
| Safety-aware routing | BERT routers route potential jailbreaks to weaker models (because weak = cheap), which elevates risk. Nobody has built routing that treats safety as a first-class routing signal. | Router that classifies intent first (safe vs. borderline vs. risky), then routes safe to cheap and risky to either more capable or a safety-specialized model. | High (requires intent classification without over-filtering legitimate queries) |
| Stateful multi-turn routing | Almost all routers make per-request decisions. Conversation context is ignored. | Route entire conversations to the same model for consistency, or adapt routing mid-conversation based on where the conversation goes. vLLM SR is starting here. | Medium (stateful session tracking at scale is solved infrastructure; the routing logic is novel) |
| MCP-aware routing | As of March 2026, no major router supports Model Context Protocol for tool-rich agentic workflows. Portkey and TrueFoundry are working on it. | Router that understands tool context (which tools are available, which tools the query needs) and routes to the model best suited for those specific tools. | Medium (MCP is new; the routing logic on top is non-trivial but buildable) |
| Compliance-driven routing | Data residency is addressed (Requesty has EU/US/APAC regions). Model-level compliance (HIPAA-eligible models, SOC2-certified providers, EU AI Act compliant models) is not. | Router that enforces routing constraints based on regulatory requirements: HIPAA queries only go to HIPAA-eligible model endpoints, EU user data never leaves EU models. | Medium (requires a maintained compliance attribute database for models, then rule enforcement) |
| Self-improving / adaptive routers | All current routers are static. A decision made in month 1 uses the same routing logic as month 12, regardless of what the production data shows. | Router that continuously retrains on production performance data. If model X keeps producing lower-quality outputs than predicted for a certain query type, update the routing weights. | Very High (requires a feedback loop, human-in-the-loop or LLM-as-judge, and a retraining pipeline) |
| Multimodal routing | All major routers handle text. Routing across image, video, audio, and multimodal models is not addressed. | Route based on modality requirements of the query. Image question? Route to vision model. Audio transcription? Route to Whisper-class model. Mixed? Route to GPT-4V or Gemini Ultra. | Medium (modality detection is straightforward; model capability databases for multimodal are less mature) |
| Standardized evaluation / leaderboard | RouterArena project emerging but not yet the authoritative benchmark. Every company uses its own evaluation methodology to claim cost savings. | The "HuggingFace Open LLM Leaderboard" for routers. Standardized tasks, standard model pairs, reproducible benchmarks. Whoever builds this becomes a reference point the whole market cites. | Low-Medium (research project, not a product) |
| Custom fine-tuned model routing | Routers are good at routing between public frontier models. Routing that includes custom fine-tunes or LoRA adapters is not well supported. | Router that benchmarks custom models, integrates them into the routing pool, and selects between public models and custom fine-tunes based on task fit. | Medium-High (requires a bring-your-own-model evaluation pipeline) |
10. Verdict: Where the Market Is Going
A few things are becoming clear.
OpenRouter has won the developer market. $100M ARR with 1M developers and the largest model catalog is a durable position. The 5% commission model is simple and trusted. Competing on pure model aggregation against OpenRouter is not a good idea.
Enterprise governance is still open. Portkey is going after it but is still relatively small. The enterprise AI governance market (compliance, chargeback, RBAC, audit trails) is worth a lot more than the developer-tier market, and it's less crowded. The play is to build the SOC2, HIPAA, and EU AI Act compliance story that Portkey doesn't have fully yet.
Infrastructure players will commoditize the basics. Cloudflare, Vercel, AWS, Azure, Kong -- they're all adding routing as a checkbox feature. Basic cost/latency routing with fallback is going to be free and built-in within 2-3 years. The pure-plays need to go up-market (governance, compliance, adaptive learning) or specialize vertically.
Self-hosted is not going away. Legal, healthcare, finance, and government will always have segments that can't send data to a managed service. Bifrost (Go, 11 microsecond overhead, no external dependencies) is the right technical approach for this segment. The opportunity there is to add enterprise governance on top of the raw performance.
The next battleground is agentic routing. Routing a single user query is solved. Routing across a multi-step AI agent workflow -- where context accumulates, tools get invoked, and the right model for step 3 depends on what happened in steps 1 and 2 -- is not solved. vLLM Semantic Router is the only player seriously attacking this. As AI agents move from demos to production, stateful multi-turn routing becomes the thing everyone needs.
The data flywheel hasn't kicked in yet. Every router that operates as a managed service accumulates a dataset of queries, model selections, and outcome quality. None of them appear to be using this data to train proprietary routing models. The first router to close this loop -- using production data to continuously improve routing decisions -- builds a compounding moat that rule-based competitors can't cross.
Short summary: buy OpenRouter for developer use, self-host Bifrost for performance-critical or privacy-sensitive workloads, watch the agentic routing space carefully, and ignore the "we save you X% on LLM costs" marketing from every player -- they all say the same thing.