

Observability Market Dynamics: The $28B Consolidation Wave, OpenTelemetry, the Cost Revolt & What Comes Next

The observability market is worth $28.5 billion (2025), projected to reach $164–172 billion by 2035 at 19–20% CAGR. In the past three years, over $40 billion in M&A has reshaped the landscape: Cisco bought Splunk for $28B, New Relic went private for $6.5B, Palo Alto Networks acquired Chronosphere for $3.35B, and Snowflake bought Observe for ~$1B. Meanwhile, OpenTelemetry is decoupling instrumentation from vendors, eBPF is enabling zero-code observability, and companies reportedly spending $65M–$170M on Datadog are revolting against the bill.

This analysis covers the full market dynamics: every major player with revenue and funding, the consolidation thesis, OpenTelemetry’s market-reshaping impact, the observability tax problem and migration economics, the eBPF revolution, AI/AIOps integration, the Grafana open source playbook, and where the real opportunities are.



2. Market Overview & Key Numbers

Observability Market at a Glance
Broader market (2025) | $28.5 billion
Projected size (2035) | $164–172 billion
CAGR | 19–20%
Core observability platform market | $2.9–4.8B (narrower scope)
M&A in past 3 years | $40B+ in deal value
Datadog revenue (FY2025) | $3.43B (+28% YoY)
Grafana Labs ARR | $400M+ (60%+ growth)
OpenTelemetry adoption | ~50% of CNCF end-user companies

Three forces are colliding: (1) consolidation — platform companies in security, networking, and data are acquiring observability startups, (2) commoditization — OpenTelemetry is making backends interchangeable, and (3) cost pressure — companies at scale are revolting against per-GB pricing that produces million-dollar annual bills. The result is a market that is simultaneously massive and under siege.
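As a quick sanity check on the headline numbers, the sketch below computes the compound annual growth rate implied by going from $28.5B in 2025 to $164–172B in 2035. The function name is illustrative; the inputs are the figures from the table above.

```python
def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1


START_2025 = 28.5  # $B, broader market size in 2025 (from the table)
YEARS = 10         # 2025 -> 2035

low = implied_cagr(START_2025, 164.0, YEARS)
high = implied_cagr(START_2025, 172.0, YEARS)
print(f"implied CAGR: {low:.1%} to {high:.1%}")
```

Both ends of the projected range land between 19% and 20%, so the size and growth figures are internally consistent.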


3. Revenue Landscape: Who’s Winning

Observability Companies by Revenue (2025)
Company | Revenue / ARR | Growth | Market Cap / Valuation | Status
Splunk (Cisco) | ~$4B (pre-acquisition) | - | Acquired for $28B | Integrated into Cisco
Datadog | $3.43B | +28% YoY | ~$43.9B market cap | Public (DDOG)
Dynatrace | $1.70B | +18.75% YoY | ~$10–11B market cap | Public (DT)
Elastic | $1.48B | +17% YoY | Public (ESTC) | IDC MarketScape Leader 2025
New Relic | ~$1B ARR (est.) | - | Acquired for $6.5B | Private (Francisco Partners + TPG)
Grafana Labs | $400M+ ARR | +60% YoY | $6B (raising at $9B) | Private. 7,000+ customers
Bugcrowd | $328.2M | +40% YoY | $1B | Private
Chronosphere | $160M+ ARR | Triple-digit YoY | Acquired for $3.35B | Palo Alto Networks (Nov 2025)
Pentera | ~$100M ARR | - | $1B+ | Private
Honeycomb | Undisclosed (2x in 2023) | 160%+ NRR | Private ($150M raised) | 600+ customers
Logz.io | $48M | - | Private ($123M raised) | 800 customers, 200 employees
Coralogix | Undisclosed | - | $1B+ ($350M raised) | Acquired Aporia for AI observability
Axiom | Undisclosed (2x in 2025) | - | Private ($59M raised) | 40,000+ organizations
Better Stack | $3.4M | - | Private ($28.9M raised) | Profitable since 2024
Last9 | ~$2.7M | - | Private ($13M raised) | 21 employees

The Power Law

Datadog alone captures more revenue ($3.43B) than all open source and indie alternatives combined. With 31,400 customers and 4,310 generating over $100K ARR (+19% YoY), it dominates the enterprise. 31% of customers use 6+ products; 16% use 8+ products — deep platform lock-in. Yet its pricing is the single biggest driver of the open source revolt.


4. The $40B+ Consolidation Wave

Major Observability M&A (2023–2026)
Acquirer | Target | Value | Year | Thesis
Cisco | Splunk | $28B | 2024 | Networking + observability + security. Largest deal in Cisco history
Francisco Partners + TPG | New Relic | $6.5B | 2023 | Take-private. Public market undervalued consumption model
IBM | Apptio | $4.6B | 2023 | FinOps + observability cost management. $450B IT spend data
Palo Alto Networks | Chronosphere | $3.35B | 2025 | Security + observability convergence for AI era
Snowflake | Observe | ~$1B | 2026 | Data platform + observability convergence
ClickHouse | HyperDX | Undisclosed | 2025 | Open source full-stack observability on ClickHouse
Elastic | Jina AI + others | Undisclosed | 2025 | 13 total acquisitions. AI + search + observability

The Convergence Thesis

Three types of companies are acquiring observability startups:

  1. Security companies (Palo Alto Networks → Chronosphere). Security and observability share the same data: logs, network flows, endpoint telemetry. Converging them eliminates duplicate data pipelines and enables unified threat detection + performance monitoring.
  2. Networking/infrastructure companies (Cisco → Splunk). Network telemetry is observability data. Cisco now has Splunk + AppDynamics + ThousandEyes for full-stack visibility from network to application.
  3. Data platform companies (Snowflake → Observe, ClickHouse → HyperDX). Observability is a data problem. If you already store and query petabytes, adding observability is a product expansion, not a new capability.

The implication for startups: standalone observability companies are becoming acquisition targets rather than IPO candidates. Chronosphere ($3.35B), Observe (~$1B), and HyperDX were all acquired before reaching maturity as independent companies. The window for building a standalone observability business is narrowing.


5. OpenTelemetry: The Market Reshaper

OpenTelemetry Key Metrics
CNCF status | Graduated. Second-highest velocity CNCF project (after Kubernetes)
Contributors | 10,000+ individuals from 1,200+ companies
Monthly active contributors | 1,200+ developers (+18% YoY), 200+ companies (+22% YoY)
Code commits | +45% YoY increase
Google search volume | +100% increase (2024)
Adoption | ~50% of CNCF end-user companies; 85% investing in OTel
Largest contributors | Splunk (#1), Google, Red Hat, Microsoft, Amazon, Cisco, Uber

Why OTel Changes Everything

OpenTelemetry decouples instrumentation (how you collect telemetry) from backends (where you send it). This is the most important structural change in the observability market since the term was coined:

Vendor portability
Instrument once with OTel, send to any backend. Switch from Datadog to Grafana Cloud without re-instrumenting your applications. This shatters the traditional lock-in model that vendors relied on for retention.
Multi-vendor strategies
Route critical traces to Honeycomb for deep analysis, bulk logs to Loki for cost efficiency, and metrics to VictoriaMetrics for performance. The OTel Collector’s pipeline architecture (receivers → processors → exporters) turns this into a configuration change rather than a re-instrumentation project.
Commoditization of collection
If every vendor supports the same instrumentation standard, the differentiation shifts from “how easy is it to get data in?” to “how good is your query engine, AI, and UX?”
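The receivers → processors → exporters flow described above can be modeled in a few lines. This is a toy Python model, not the OTel Collector's actual API or configuration format: the `Record`, `Pipeline`, and `run` names are hypothetical, and the backend names are just labels matching the multi-vendor example in the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Record:
    signal: str  # "traces", "logs", or "metrics"
    body: dict


# A processor returns the (possibly modified) record, or None to drop it.
Processor = Callable[[Record], Optional[Record]]


@dataclass
class Pipeline:
    signal: str       # which signal type this pipeline accepts
    processors: list  # Processor callables, applied in order
    exporters: list   # hypothetical backend names


def drop_debug_logs(rec: Record) -> Optional[Record]:
    """Example processor: discard DEBUG-level log records."""
    if rec.body.get("level") == "DEBUG":
        return None
    return rec


def run(pipelines: list, records: list) -> dict:
    """Push each record through every pipeline matching its signal type."""
    delivered: dict = {}
    for rec in records:
        for pipe in pipelines:
            if pipe.signal != rec.signal:
                continue
            out: Optional[Record] = rec
            for proc in pipe.processors:
                out = proc(out)
                if out is None:
                    break
            if out is not None:
                for backend in pipe.exporters:
                    delivered.setdefault(backend, []).append(out)
    return delivered


pipelines = [
    Pipeline("traces", [], ["honeycomb"]),
    Pipeline("logs", [drop_debug_logs], ["loki"]),
    Pipeline("metrics", [], ["victoriametrics"]),
]
records = [
    Record("traces", {"name": "GET /cart"}),
    Record("logs", {"level": "DEBUG", "msg": "cache miss"}),
    Record("logs", {"level": "ERROR", "msg": "upstream timeout"}),
    Record("metrics", {"name": "http_requests_total", "value": 1}),
]
delivered = run(pipelines, records)
print({backend: len(recs) for backend, recs in delivered.items()})
```

Switching a backend in this model means editing one entry in an `exporters` list, which is the portability argument in miniature: the application code that produced the records never changes.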

SDK Status

Signal | Status
Tracing API | Stable across all major languages (Java, Python, Go, JS, .NET, Ruby, PHP, Rust, C++, Swift, Erlang, Kotlin)
Metrics API | Stable across all major languages
Logs specification | Stable; individual language implementations vary
Spring Boot starter | Stable

The Implication

OTel is the kingmaker. With near-universal vendor support and 50% adoption, it makes observability backends interchangeable. This favors open source and cost-effective alternatives, because the switching cost drops from “re-instrument everything” to “change the exporter config.” Vendors that relied on proprietary agents for lock-in (Datadog’s dd-agent, New Relic’s agent) are losing their moat.


6. The Observability Tax Revolt

The Scale of the Problem

Known High-Spend Datadog Customers
OpenAI | $170 million/year on Datadog (reported)
Coinbase | $65 million on Datadog (reported)

Observability costs often grow faster than revenue or headcount. The dynamic: more microservices → more hosts → more logs/metrics/traces → bills that rise on every pricing dimension at once. SaaS platforms charge per-GB ingested, per-host monitored, or per-custom-metric, creating a perverse incentive to reduce observability at exactly the moment you need more visibility.

Why Bills Explode

  • Container/microservice architectures multiply “host” counts vs. monolithic apps
  • Custom metrics and high-cardinality dimensions are major cost drivers
  • More visibility = more cost, which incentivizes teams to sample aggressively and drop data
  • Multi-dimensional pricing (per-host + per-GB + per-metric + per-span) makes bills unpredictable
  • Vendor pricing tiers change, often retroactively raising costs for existing usage patterns
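To make the high-cardinality point concrete, the sketch below counts the worst-case number of distinct time series a single metric can produce. The label names and cardinalities are made up for illustration, and the product is an upper bound, since not every label combination occurs in practice.

```python
from math import prod


def max_series(label_cardinalities: dict) -> int:
    """Upper bound on distinct time series for one metric name."""
    return prod(label_cardinalities.values())


# Hypothetical label cardinalities for one HTTP latency metric.
base = {"service": 50, "endpoint": 20, "status_code": 5}
with_pod = {**base, "pod": 40}  # adding a per-pod label

print(max_series(base))      # 5000
print(max_series(with_pod))  # 200000
```

One extra label multiplies the series count by its cardinality, which is why per-custom-metric pricing reacts so sharply to containerized deployments where pod or container IDs end up as labels.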

Migration Economics

Conventional migration
~6 months, up to $200K in consulting fees. Re-instrumenting applications, rebuilding dashboards, migrating alerts
Modern migration (with OTel)
Days to weeks. Change the OTel Collector exporter config. Dashboards and alerts still need migration, but the hardest part (instrumentation) is portable
Typical savings from Datadog alternatives
60–80% cost reduction with managed alternatives; 90%+ savings with self-hosted ClickHouse-based solutions at multi-TB/day scale
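A rough payback calculation shows how these numbers interact. The $200K migration cost and the 60–80% savings range come from the figures above; the $1M annual bill is a hypothetical input, and the function name is illustrative.

```python
def payback_months(migration_cost: float, annual_bill: float, savings_rate: float) -> float:
    """Months until cumulative savings cover the one-time migration cost."""
    monthly_savings = annual_bill * savings_rate / 12
    return migration_cost / monthly_savings


ANNUAL_BILL = 1_000_000  # hypothetical current spend, in dollars
MIGRATION_COST = 200_000  # upper end of the conventional migration estimate

for rate in (0.6, 0.8):
    months = payback_months(MIGRATION_COST, ANNUAL_BILL, rate)
    print(f"{rate:.0%} savings -> payback in {months:.1f} months")
```

Under these assumptions even the expensive, conventional migration pays for itself within a few months, and the payback period shrinks in proportion to the size of the bill.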

7. Open Source Observability Stack

Open Source Observability Projects
Project | GitHub Stars | CNCF Status | Function | Backing
Grafana | ~66K | - | Visualization & dashboards | Grafana Labs (AGPLv3)
Prometheus | ~56K | Graduated (2018) | Metrics collection & alerting | CNCF / Independent
ClickHouse | ~41K | - | Columnar analytics database | ClickHouse Inc.
Jaeger | ~30.3K | Graduated (2019) | Distributed tracing | Originally Uber. v2 uses OTel core
Fluentd | ~30.7K | Graduated (2019) | Log collection | Treasure Data
SigNoz | ~24.2K | - | Full-stack observability (OTel-native) | $7M raised (YC-backed)
Grafana Loki | ~24K | - | Log aggregation | Grafana Labs (AGPLv3)
Vector | ~21.1K | - | Telemetry pipeline (collection/routing) | Datadog. 100K+ downloads/day
OpenObserve | ~15K | - | Full-stack observability (Rust) | 3,600+ deployments. Claims 140x lower storage
VictoriaMetrics | ~14K | - | Metrics (Prometheus-compatible) | Self-funded. 300%+ growth in 2024. 1B+ Docker downloads
HyperDX | ~9.2K | - | Full-stack observability | Acquired by ClickHouse (Mar 2025)
Grafana Mimir | ~4.4K | - | Long-term metrics storage | Grafana Labs (AGPLv3)
Grafana Tempo | ~4K | - | Distributed tracing | Grafana Labs (AGPLv3)

VictoriaMetrics: The Self-Funded Dark Horse

VictoriaMetrics has achieved 300%+ growth in 2024, 1 billion+ Docker downloads, and Spotify as a customer — all with zero external funding. It’s Prometheus-compatible (drop-in replacement) with dramatically better performance and storage efficiency. This is the most impressive bootstrapping story in the observability space, proving that a focused, high-performance open source tool can compete against well-funded startups.


8. Case Study: The Grafana Playbook

Grafana Labs Key Metrics
ARR | $400M+ (Sep 2025), up from $250M (Aug 2024), 60%+ growth
Customers | 7,000+ (up from 5,000 in Aug 2024), including 70% of Fortune 50
Total funding | $805M+
Valuation | $6B (2024); reportedly raising at $9B (Feb 2026)
Employees | ~1,500 (+18% YoY)
Key investors | Ontario Teachers’, Sapphire, Tiger Global, Lightspeed, GIC, Coatue, Sequoia, CapitalG
Recognition | Forbes Cloud 100 (#13). Leader in 2025 Gartner Magic Quadrant
Notable AI customers | Anthropic, Lovable, OutSystems

The LGTM Stack

Component | Purpose | Stars | License
Loki | Log aggregation | ~24K | AGPLv3
Grafana | Visualization | ~66K | AGPLv3
Tempo | Distributed tracing | ~4K | AGPLv3
Mimir | Long-term metrics | ~4.4K | AGPLv3
Alloy | Telemetry collector | - | Apache 2.0
Beyla | eBPF auto-instrumentation | - | Apache 2.0 (donated to OTel)
Pyroscope | Continuous profiling | - | AGPLv3

The AGPLv3 Strategy

In 2021, Grafana Labs moved Grafana, Loki, and Tempo from Apache 2.0 to AGPLv3 to prevent cloud providers (AWS, Azure, GCP) from offering managed services without contributing back. Key nuances:

  • AGPLv3 is OSI-approved — still “true” open source (unlike BSL or SSPL)
  • Plugins, agents, and some libraries remain Apache 2.0
  • Grafana Enterprise and Cloud are commercially licensed
  • The license deters cloud providers from competing while keeping the community intact

Why It Works

  1. Grafana is the de facto visualization layer. Even Datadog and New Relic users often pipe data into Grafana for dashboards. This makes Grafana the “control plane” of observability
  2. Each component is best-of-breed. Loki is the cost leader for logs (indexes metadata only, not full text). Mimir scales Prometheus to petabytes. Tempo is the simplest tracing backend
  3. Self-hosted is free, Cloud is paid. Companies self-host to save money, then migrate to Grafana Cloud as they grow and operational overhead becomes a burden. Natural upsell path
  4. The AI wave benefits them. Anthropic, Lovable, and OutSystems are Grafana Cloud customers. AI companies generate massive telemetry (GPU metrics, inference traces, token-level logging) and need cost-effective observability

9. The ClickHouse Convergence

ClickHouse (41K GitHub stars) is emerging as the default storage engine for cost-conscious observability platforms. Its columnar architecture delivers millisecond query times on terabytes of log data at a fraction of Elasticsearch’s storage cost.

Who’s Building on ClickHouse

  • SigNoz (24.2K stars, $7M funding) — the “open source Datadog alternative,” OTel-native, built entirely on ClickHouse
  • HyperDX / ClickStack — acquired by ClickHouse in March 2025. Released as ClickStack (May 2025): single ClickHouse database for logs, metrics, traces, and session replays. SQL joins across all signal types
  • Uptrace — OpenTelemetry + ClickHouse-based observability
  • VictoriaLogs — VictoriaMetrics’ log solution (though not ClickHouse-based, competing in the same cost-efficient space)

Why ClickHouse Wins for Observability

Compression
Columnar storage compresses log data 10–20x better than row-oriented databases. TB/day becomes manageable
Query speed
Millisecond aggregation queries on billions of rows. The analytical workload of observability (group-by, percentiles, top-N) is ClickHouse’s sweet spot
SQL
Standard SQL (with extensions) instead of proprietary query languages. Lower learning curve. ClickStack enables SQL joins between logs and traces
Cost
90%+ savings vs. Elasticsearch at multi-TB/day scale. This is the single biggest driver of adoption
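The compression argument can be demonstrated with a small experiment: serialize the same synthetic log records row by row and column by column, then compress both. This is only a sketch. The data is synthetic, `zlib` stands in for the LZ4/ZSTD codecs ClickHouse actually uses, and the size of the gap depends heavily on the data, so it should not be read as evidence for the 10–20x figure.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
SERVICES = ["api", "checkout", "search"]

# Synthetic log records: (timestamp, service, level, latency_ms).
rows = []
for i in range(10_000):
    rows.append((
        1_700_000_000 + i,                   # monotonically increasing timestamp
        SERVICES[(i // 500) % 3],            # long runs of the same service
        "ERROR" if i % 97 == 0 else "INFO",  # mostly INFO
        rng.randint(1, 500),                 # noisy latency value
    ))

# Row layout: one record per line, fields interleaved.
row_bytes = "\n".join(",".join(map(str, r)) for r in rows).encode()
# Column layout: one field per line, all values of that field together.
col_bytes = "\n".join(",".join(map(str, c)) for c in zip(*rows)).encode()

row_z = len(zlib.compress(row_bytes, 6))
col_z = len(zlib.compress(col_bytes, 6))
print(f"raw: {len(row_bytes)} bytes in both layouts")
print(f"compressed rows:    {row_z} bytes")
print(f"compressed columns: {col_z} bytes")
```

Both layouts contain exactly the same bytes in a different order. Grouping values of the same field puts repetitive data (service names, log levels) next to each other, which is what lets a general-purpose compressor, and even more so a column-specific codec, shrink it further.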

The signal: ClickHouse Inc.’s acquisition of HyperDX and release of ClickStack shows the database company is going all-in on observability as a first-class use case. This creates a gravitational pull for any new observability startup — why build your own storage when ClickHouse is purpose-built for your workload?


10. eBPF: Zero-Instrumentation Observability

eBPF (extended Berkeley Packet Filter) lets sandboxed programs run inside the Linux kernel without modifying kernel source code or loading kernel modules. For observability, this means capturing requests, metrics, and traces without instrumenting applications: no SDK, no code changes, no restarts.

Why eBPF Matters

  • Zero code changes: capture HTTP requests, database queries, DNS lookups at the kernel level
  • Low overhead: kernel-level collection is more efficient than user-space agents
  • Language-agnostic: works regardless of programming language or runtime
  • Network visibility: deep packet inspection and network flow analysis for free

Key Players

Company | Funding | Model | Key Facts
Groundcover | $60M ($35M Series B, Apr 2025) | BYOC (data stays in your cloud) | 500%+ ARR growth. Fortune 100 customers. Automated Datadog migration tool
Odigos | $13M (Sep 2024) | OSS + Commercial | Zero-code distributed tracing. OTel eBPF auto-instrumentation for Go. <5 min install
Pixie | Acquired by New Relic, donated to CNCF | CNCF Sandbox | Runs inside K8s clusters. No data leaves the cluster
Coroot | Community | OSS (Apache 2.0) | eBPF-based APM + AI root cause analysis. Metrics, logs, traces, profiling, SLOs
Cilium / Hubble | - | CNCF Graduated | Network-level K8s observability. Service maps, flow visibility, security

OpenTelemetry eBPF Instrumentation (OBI)

In 2025, the first alpha release of OBI was announced — a collaborative project based on Grafana Beyla, donated by Grafana Labs with contributions from Splunk, Coralogix, Odigos, and others. This represents the standardization of eBPF-based observability under the OpenTelemetry umbrella. When OBI matures, it will further commoditize data collection and make vendor differentiation entirely about the backend.


11. AI & AIOps in Observability

Every major vendor now offers AI-powered observability features. The differentiation lies in whether AI is truly embedded in the core product or bolted on as an assistant layer.

AI in Observability: Vendor Comparison
Vendor | AI Product | Approach
Dynatrace | Davis AI | Causal AI (deterministic root cause, not correlation) + Predictive AI + Generative AI (Davis CoPilot). Most technically differentiated
Datadog | Bits AI | Autonomous agents: AI SRE (incident response), Dev Agent (debugging), Security Analyst. Watchdog AI for anomaly detection. Natural language querying
New Relic | AIOps | NLP-powered querying (natural language to NRQL). Automated incident correlation. Alert noise reduction
Grafana Labs | Grafana Assistant | AI-powered dashboard creation, query generation, alerting. Adopted by thousands of Cloud customers
Coralogix | Olly | AI agent extending observability across enterprise. Integrated with Aporia acquisition for AI model observability
Dash0 | Agent0 | AI-native SRE copilot. Built by Instana founder. 270+ customers in first 9 months. $44.5M raised

Dynatrace’s Causal AI: The Technical Differentiator

Most observability AI uses correlational analysis: “these events happened at the same time, so they might be related.” Dynatrace’s Davis AI uses causal analysis: deterministic root cause identification based on dependency graphs and topology. Because it reasons over known dependencies rather than co-occurrence, this approach is positioned as producing fewer false positives and more actionable results, and as holding up better on the noisy, high-cardinality data that modern microservice architectures produce.
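The difference between co-occurrence and topology-based reasoning can be illustrated with a deliberately simple rule: among unhealthy services, blame only those whose own dependencies are all healthy. This is a toy illustration, not Dynatrace's algorithm; the service names, graph, and `root_causes` function are hypothetical.

```python
def root_causes(depends_on: dict, unhealthy: set) -> set:
    """Unhealthy services that have no unhealthy dependency of their own."""
    return {
        svc for svc in unhealthy
        if not (depends_on.get(svc, set()) & unhealthy)
    }


# Hypothetical service dependency graph: service -> services it calls.
depends_on = {
    "frontend": {"checkout", "search"},
    "checkout": {"payments", "db"},
    "search": {"db"},
    "payments": set(),
    "db": set(),
}

# Three services are alerting at the same time.
unhealthy = {"frontend", "checkout", "db"}
print(root_causes(depends_on, unhealthy))  # {'db'}
```

A purely correlational view sees three simultaneous alerts. Walking the dependency graph narrows them to one candidate, because `frontend` and `checkout` each sit upstream of another failing service. A real system also has to handle cycles, partial failures, and incomplete topology, which this sketch ignores.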

Dash0: AI-Native from Day One

Founded in 2023 by Mirko Novakovic (who previously founded Instana, acquired by IBM), Dash0 is the most interesting new entrant. With $44.5M raised and 270+ customers in nine months, it’s building observability with AI as the core interaction model from the start, not retrofitting AI onto a traditional dashboard-and-alert product.


12. Log Management: From ELK to the Modern Stack

Why ELK Is Being Replaced

The ELK stack (Elasticsearch, Logstash, Kibana) dominated log management for a decade but struggles at modern scale:

  • Resource consumption: full-text indexing of all log content requires massive compute and storage
  • Complexity: operating production ELK clusters at scale requires dedicated teams
  • Cost: Elasticsearch storage costs are 10–20x higher than columnar alternatives

The Modern Log Architecture

Log Collection/Shipping Tools Compared
Tool | Language | Stars | Key Strength
Fluent Bit | C | - | Lightest weight (50–100MB RAM). 15B+ downloads. CNCF Graduated
Vector | Rust | ~21.1K | High performance. 100K+ daily downloads. Datadog-backed
Fluentd | Ruby | ~30.7K | Most mature (14+ years). Richest plugin ecosystem. CNCF Graduated
Logstash | Java | - | Richest plugin ecosystem but heaviest resource footprint

The Modern Pattern

  1. Collection: Fluent Bit or Vector (lightweight, high-performance)
  2. Pipeline: OpenTelemetry Collector (vendor-agnostic routing and transformation)
  3. Storage: Loki (cost-effective, indexes metadata only) or ClickHouse (blazing query speed) or both
  4. Visualization: Grafana

This replaces the monolithic ELK stack with composable, best-of-breed components that can be mixed and matched. The key innovation: Loki indexes only labels (metadata), not full log text, delivering 10x storage cost reduction versus Elasticsearch. For use cases that need fast full-text search over massive log volumes, ClickHouse offers sub-second analytics on terabytes.
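The trade-off between indexing labels and indexing full text can be seen in a toy model. The sketch below is not how Loki or Elasticsearch are implemented; it only contrasts the two index shapes on three hypothetical log lines.

```python
from collections import defaultdict

# Hypothetical log lines, each with a small label set.
logs = [
    ({"app": "api", "env": "prod"}, "GET /cart 200 in 12ms"),
    ({"app": "api", "env": "prod"}, "GET /cart 500 upstream timeout"),
    ({"app": "worker", "env": "prod"}, "job 42 finished"),
]

# Full-text style: one index entry per distinct token across all lines.
full_text_index = defaultdict(set)
for i, (_, line) in enumerate(logs):
    for token in line.split():
        full_text_index[token].add(i)

# Label style: one index entry per distinct label set (a "stream").
# Line contents are stored but not indexed.
label_index = defaultdict(list)
for i, (labels, _) in enumerate(logs):
    label_index[tuple(sorted(labels.items()))].append(i)


def query(labels: dict, needle: str) -> list:
    """Select streams by label, then brute-force scan their lines."""
    wanted = set(labels.items())
    hits = []
    for stream, ids in label_index.items():
        if wanted <= set(stream):
            hits.extend(logs[i][1] for i in ids if needle in logs[i][1])
    return hits


print(len(full_text_index), "full-text index entries")  # 11
print(len(label_index), "label index entries")          # 2
print(query({"app": "api"}, "timeout"))
```

The label index stays small because it grows with the number of streams, not with the vocabulary of the log text. The cost is paid at query time, where matching lines inside a selected stream requires a scan.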


13. Pricing Models & Why They Matter

Observability Pricing Models Compared
Model | Example | Pricing | Pros | Cons
Per-host | Datadog | $15–31/host/mo | Predictable for static infra | Punishes containers/microservices
Per-GB ingested | Datadog, New Relic, Elastic | $0.10/GB (Datadog logs) | Pay for what you use | Bills scale linearly. Incentivizes dropping data
Per-user | New Relic | Per-user + per-GB | Simpler budgeting | Limits observability access across org
Per-event / per-span | Various (tracing) | Varies | Granular | Encourages aggressive sampling
Flat-rate / BYOC | Groundcover | Predictable regardless of volume | Budget certainty | May overpay at low utilization
Self-hosted (free) | Grafana LGTM, SigNoz, VictoriaMetrics | $0 (+ infrastructure cost) | Maximum cost control | Operational overhead. Need in-house expertise

The pricing trap: Datadog’s multi-dimensional pricing (per-host + per-GB + per-custom-metric + per-span) means organizations often can’t predict their bill. Enterprise customers negotiate 10–30% discounts at higher volumes, but the fundamental per-GB model means costs scale linearly with growth. At multi-TB/day, annual bills reach millions — which is why self-hosted alternatives and ClickHouse-based solutions gain traction at exactly the customer segment Datadog earns the most from.
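A minimal model of multi-dimensional pricing shows why the bill tracks growth so directly. The per-host and per-GB rates are the low-end figures from the table above; the custom-metric rate and the usage numbers are hypothetical, and real price lists have more line items (for example span or retention charges) that this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    hosts: int
    log_gb: float        # GB of logs ingested per month
    custom_metrics: int  # distinct custom metric series


RATES = {
    "host": 15.0,           # $/host/month (low end of the table)
    "log_gb": 0.10,         # $/GB ingested (from the table)
    "custom_metric": 0.05,  # $/metric/month, hypothetical
}


def monthly_bill(u: Usage) -> float:
    """Sum of independent per-dimension charges."""
    return (
        u.hosts * RATES["host"]
        + u.log_gb * RATES["log_gb"]
        + u.custom_metrics * RATES["custom_metric"]
    )


today = Usage(hosts=200, log_gb=90_000, custom_metrics=100_000)
doubled = Usage(hosts=400, log_gb=180_000, custom_metrics=200_000)
print(f"today:   ${monthly_bill(today):,.0f}/month")
print(f"doubled: ${monthly_bill(doubled):,.0f}/month")
```

Each dimension is linear on its own, but they all grow together as a system scales out, and a change in architecture (more containers, more labels) can move one dimension far faster than traffic does. That is what makes the total hard to forecast.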


14. Business Models in Observability

How Observability Companies Monetize
Model | Example | How It Works | Outcome
Open Core + Cloud | Grafana Labs | OSS core (AGPLv3). Revenue from Grafana Cloud (managed LGTM) and Enterprise features | $400M+ ARR. $6–9B valuation. 7,000+ customers
Pure SaaS | Datadog | 20+ products, consumption-based pricing. No meaningful open source | $3.43B revenue. $43.9B market cap. Deep lock-in
OSS + Enterprise | SigNoz | Self-hosted free. SigNoz Cloud for managed experience. Enterprise features (SSO, RBAC) | $7M raised. 500+ paid customers. 24.2K stars
Self-funded OSS | VictoriaMetrics | Free community edition. Enterprise version with per-node licensing | 300%+ growth. 1B+ Docker pulls. $0 external funding
BYOC (Bring Your Own Cloud) | Groundcover | Data stays in customer’s infrastructure. Flat-rate pricing regardless of volume | $60M raised. 500%+ ARR growth
Acquisition target | Chronosphere, HyperDX, Observe | Build differentiated technology, get acquired by platform company | $3.35B (Chronosphere), ~$1B (Observe), undisclosed (HyperDX)

The Winning Pattern for New Entrants

  1. Build on OTel-native foundations (don’t create proprietary agents)
  2. Use ClickHouse for storage (90%+ cost reduction vs. Elasticsearch)
  3. Open source the core product for community and trust
  4. Monetize via managed cloud service + enterprise features (SSO, RBAC, compliance)
  5. Differentiate on AI-powered root cause analysis, not just dashboards

15. Opportunities & Gaps

1. The “Datadog for AI Companies”

AI companies generate fundamentally different telemetry: GPU utilization, inference latency distributions, token-level logging, prompt/completion traces, model versioning, hallucination detection, and cost-per-inference metrics. Traditional observability tools don’t model these concepts natively. Coralogix acquired Aporia (AI observability) for this reason. There’s room for a purpose-built platform.

2. Observability FinOps

IBM bought Apptio for $4.6B to manage IT spend. But there’s no tool that specifically manages observability spend — showing which teams, services, and log sources are driving costs, recommending sampling strategies, and enforcing budgets. The $65M–$170M Datadog bills create a massive pain point that no one is directly solving.

3. Cross-Signal Correlation

The traditional “three pillars” (logs, metrics, traces) are still stored and queried as silos in most platforms. ClickStack’s SQL-join approach across signal types is a step forward, but the real opportunity is automatic correlation: when an alert fires on a metric, automatically surface the relevant logs and traces without manual querying.
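What automatic correlation means in practice can be sketched as a single function: given an alert, pull the logs for that service in the alert window, then follow their trace IDs to the related spans. All type and field names here are hypothetical, and the in-memory lists stand in for whatever stores a real platform would query.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    service: str
    start: float  # epoch seconds
    end: float


@dataclass
class LogLine:
    service: str
    ts: float
    trace_id: Optional[str]
    message: str


@dataclass
class Span:
    service: str
    trace_id: str
    duration_ms: float


def correlate(alert: Alert, logs: list, spans: list, pad: float = 30.0):
    """Return (logs, spans) relevant to the alert, joined via trace IDs."""
    lo, hi = alert.start - pad, alert.end + pad
    hit_logs = [
        l for l in logs
        if l.service == alert.service and lo <= l.ts <= hi
    ]
    trace_ids = {l.trace_id for l in hit_logs if l.trace_id}
    hit_spans = [s for s in spans if s.trace_id in trace_ids]
    return hit_logs, hit_spans


alert = Alert("checkout", start=1000.0, end=1060.0)
logs = [
    LogLine("checkout", 1010.0, "t1", "payment gateway timeout"),
    LogLine("checkout", 500.0, "t0", "startup complete"),
    LogLine("search", 1015.0, "t2", "slow query"),
]
spans = [
    Span("checkout", "t1", 5200.0),
    Span("payments", "t1", 5000.0),
    Span("search", "t2", 900.0),
]
hit_logs, hit_spans = correlate(alert, logs, spans)
print([l.message for l in hit_logs])
print([(s.service, s.duration_ms) for s in hit_spans])
```

The trace ID is what carries the investigation across service boundaries: the alert fired on `checkout`, but the joined spans also surface the slow `payments` call in the same trace.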

4. SME Observability

Datadog is enterprise-priced. Self-hosted Grafana requires ops expertise. There’s a gap for opinionated, simple, affordable observability for teams of 5–50 engineers who don’t want to choose between 15 configuration options but need more than uptime monitoring. Better Stack ($3.4M revenue, profitable) is scratching this surface but the market is vastly underserved.

5. Continuous Profiling as the Fourth Pillar

Logs, metrics, and traces tell you what happened. Continuous profiling tells you why it happened at the code level (which function consumed CPU, which allocation caused GC pressure). Grafana acquired Pyroscope. Datadog added continuous profiling. But standalone tooling is immature and the category is wide open.

6. Edge and IoT Observability

Cloud-centric observability tools don’t work well for edge computing, IoT fleets, or embedded systems. Bandwidth constraints prevent shipping all telemetry to a central backend. Local aggregation and intelligent sampling at the edge is a largely unsolved problem.
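One common shape for local aggregation is to collapse each window of raw readings into a small summary and forward only the summary plus any outliers. The sketch below assumes a hypothetical temperature sensor and threshold; the function and field names are illustrative.

```python
from statistics import mean


def summarize_window(readings: list, threshold: float) -> dict:
    """Collapse one window of raw readings into a summary plus outliers."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
        # Raw values are kept only when they exceed the threshold.
        "outliers": [r for r in readings if r > threshold],
    }


# One window of hypothetical sensor readings (degrees Celsius).
window = [20.0, 21.0, 19.0, 95.0, 20.0]
summary = summarize_window(window, threshold=80.0)
print(summary)
```

The payload sent upstream has a fixed size regardless of how many readings fall in the window, while the anomalous value still arrives in full. Choosing the window length and threshold per device class is the part that remains hard.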

7. Bootstrapper Opportunities

Specialized status pages + incident management
Better Stack is profitable here. The opportunity is combining status pages with deep observability integration (auto-detect incidents from metrics, auto-update status pages, auto-notify stakeholders)
Synthetic monitoring
Checkly ($32M raised, 1,000+ customers) proves demand. Code-first synthetic monitoring (Playwright-based) that integrates with CI/CD has strong developer appeal
Observability for specific stacks
Laravel, Rails, WordPress, Shopify — vertical observability that understands framework-specific metrics and failure modes. Generic tools miss the context that framework-aware tools can provide
Migration tooling
Helping companies migrate off Datadog. Dashboard conversion, alert migration, OTel re-configuration. Groundcover built an automated migration tool. There’s a consulting/SaaS opportunity here

The Bottom Line

The observability market is massive ($28.5B, growing 19–20% CAGR) and in the middle of a structural transformation. OpenTelemetry is breaking vendor lock-in. ClickHouse is collapsing storage costs. eBPF is eliminating the need for instrumentation. AI is making dashboards obsolete in favor of automated root cause analysis. And $40B+ in M&A is consolidating standalone companies into platform plays. The opportunity for new entrants is narrow but deep: AI-native observability, cost management tooling, vertical specialization, and the SME segment are all underserved. The companies that win will be the ones that treat OTel as a given, ClickHouse as the storage layer, and AI as the interaction model — not dashboards.

