
Observability Market Dynamics: The $28B Consolidation Wave, OpenTelemetry, the Cost Revolt & What Comes Next

The observability market is worth $28.5 billion (2025), projected to reach $164–172 billion by 2035 at 19–20% CAGR. In the past three years, over $40 billion in M&A has reshaped the landscape: Cisco bought Splunk for $28B, New Relic went private for $6.5B, Palo Alto Networks acquired Chronosphere for $3.35B, and Snowflake bought Observe for ~$1B. Meanwhile, OpenTelemetry is decoupling instrumentation from vendors, eBPF is enabling zero-code observability, and companies spending $65M–$170M annually on Datadog are revolting.

This analysis covers the full market dynamics: every major player with revenue and funding, the consolidation thesis, OpenTelemetry’s market-reshaping impact, the observability tax problem and migration economics, the eBPF revolution, AI/AIOps integration, the Grafana open source playbook, and where the real opportunities are.



Market Overview & Key Numbers

Observability Market at a Glance
Broader market (2025): $28.5 billion
Projected size (2035): $164–172 billion
CAGR: 19–20%
Core observability platform market: $2.9–4.8B (narrower scope)
M&A in past 3 years: $40B+ in deal value
Datadog revenue (FY2025): $3.43B (+28% YoY)
Grafana Labs ARR: $400M+ (60%+ growth)
OpenTelemetry adoption: ~50% of CNCF end-user companies

Three forces are colliding: (1) consolidation — platform companies in security, networking, and data are acquiring observability startups, (2) commoditization — OpenTelemetry is making backends interchangeable, and (3) cost pressure — companies at scale are revolting against per-GB pricing that produces million-dollar annual bills. The result is a market that is simultaneously massive and under siege.


Revenue Landscape: Who’s Winning

Observability Companies by Revenue (2025)
Company  Revenue / ARR  Growth  Market Cap / Valuation  Status
Splunk (Cisco)  ~$4B (pre-acquisition)  –  Acquired for $28B  Integrated into Cisco
Datadog  $3.43B  +28% YoY  ~$43.9B market cap  Public (DDOG)
Dynatrace  $1.70B  +18.75% YoY  ~$10–11B market cap  Public (DT)
Elastic  $1.48B  +17% YoY  –  Public (ESTC). IDC MarketScape Leader 2025
New Relic  ~$1B ARR (est.)  –  Acquired for $6.5B  Private (Francisco Partners + TPG)
Grafana Labs  $400M+ ARR  +60% YoY  $6B (raising at $9B)  Private. 7,000+ customers
Chronosphere  $160M+ ARR  Triple-digit YoY  Acquired for $3.35B  Palo Alto Networks (Nov 2025)
Honeycomb  Undisclosed (2x in 2023)  160%+ NRR  Private ($150M raised)  600+ customers
Logz.io  $48M  –  Private ($123M raised)  800 customers, 200 employees
Coralogix  Undisclosed  –  $1B+ ($350M raised)  Acquired Aporia for AI observability
Axiom  Undisclosed (2x in 2025)  –  Private ($59M raised)  40,000+ organizations
Better Stack  $3.4M  –  Private ($28.9M raised)  Profitable since 2024
Last9  ~$2.7M  –  Private ($13M raised)  21 employees

The Power Law

Datadog alone captures more revenue ($3.43B) than all open source and indie alternatives combined. With 31,400 customers and 4,310 generating over $100K ARR (+19% YoY), it dominates the enterprise. 31% of customers use 6+ products; 16% use 8+ products — deep platform lock-in. Yet its pricing is the single biggest driver of the open source revolt.


The $40B+ Consolidation Wave

Major Observability M&A (2023–2026)
Acquirer Target Value Year Thesis
Cisco Splunk $28B 2024 Networking + observability + security. Largest deal in Cisco history
Francisco Partners + TPG New Relic $6.5B 2023 Take-private. Public market undervalued consumption model
IBM Apptio $4.6B 2023 FinOps + observability cost management. $450B IT spend data
Palo Alto Networks Chronosphere $3.35B 2025 Security + observability convergence for AI era
Snowflake Observe ~$1B 2026 Data platform + observability convergence
ClickHouse HyperDX Undisclosed 2025 Open source full-stack observability on ClickHouse
Elastic Jina AI + others Undisclosed 2025 13 total acquisitions. AI + search + observability

The Convergence Thesis

Three types of companies are acquiring observability startups:

  1. Security companies (Palo Alto Networks → Chronosphere). Security and observability share the same data: logs, network flows, endpoint telemetry. Converging them eliminates duplicate data pipelines and enables unified threat detection + performance monitoring.
  2. Networking/infrastructure companies (Cisco → Splunk). Network telemetry is observability data. Cisco now has Splunk + AppDynamics + ThousandEyes for full-stack visibility from network to application.
  3. Data platform companies (Snowflake → Observe, ClickHouse → HyperDX). Observability is a data problem. If you already store and query petabytes, adding observability is a product expansion, not a new capability.

The implication for startups: standalone observability companies are becoming acquisition targets rather than IPO candidates. Chronosphere ($3.35B), Observe (~$1B), and HyperDX were all acquired before reaching maturity as independent companies. The window for building a standalone observability business is narrowing.


OpenTelemetry: The Market Reshaper

OpenTelemetry Key Metrics
CNCF status: Graduated. Second-highest velocity CNCF project (after Kubernetes)
Contributors: 10,000+ individuals from 1,200+ companies
Monthly active contributors: 1,200+ developers (+18% YoY), 200+ companies (+22% YoY)
Code commits: +45% YoY increase
Google search volume: +100% increase (2024)
Adoption: ~50% of CNCF end-user companies; 85% investing in OTel
Largest contributors: Splunk (#1), Google, Red Hat, Microsoft, Amazon, Cisco, Uber

Why OTel Changes Everything

OpenTelemetry decouples instrumentation (how you collect telemetry) from backends (where you send it). This is the most important structural change in the observability market since the term was coined:

Vendor portability
Instrument once with OTel, send to any backend. Switch from Datadog to Grafana Cloud without re-instrumenting your applications. This shatters the traditional lock-in model that vendors relied on for retention.
Multi-vendor strategies
Route critical traces to Honeycomb for deep analysis, bulk logs to Loki for cost efficiency, and metrics to VictoriaMetrics for performance. The OTel Collector’s pipeline architecture (receivers → processors → exporters) makes this trivial.
Commoditization of collection
If every vendor supports the same instrumentation standard, the differentiation shifts from “how easy is it to get data in?” to “how good is your query engine, AI, and UX?”
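The multi-vendor routing described above is just Collector configuration. A minimal sketch, assuming OTLP ingestion and placeholder endpoints; exact exporter component names vary by Collector version and distribution:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  otlphttp/honeycomb:          # critical traces for deep analysis
    endpoint: https://api.honeycomb.io
    headers:
      x-honeycomb-team: ${env:HONEYCOMB_API_KEY}
  otlphttp/loki:               # bulk logs, cost-efficient retention
    endpoint: https://loki.example.internal/otlp
  prometheusremotewrite:       # metrics to VictoriaMetrics
    endpoint: https://victoriametrics.example.internal/api/v1/write

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/honeycomb]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/loki]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]
```

Each signal type gets its own receiver → processor → exporter pipeline, so rerouting a signal to a different backend is a one-line change in the relevant exporter list.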

SDK Status

Tracing API: Stable across all major languages (Java, Python, Go, JS, .NET, Ruby, PHP, Rust, C++, Swift, Erlang, Kotlin)
Metrics API: Stable across all major languages
Logs specification: Stable; individual language implementations vary
Spring Boot starter: Stable

The Implication

OTel is the kingmaker. With near-universal vendor support and 50% adoption, it makes observability backends interchangeable. This favors open source and cost-effective alternatives, because the switching cost drops from “re-instrument everything” to “change the exporter config.” Vendors that relied on proprietary agents for lock-in (Datadog’s dd-agent, New Relic’s agent) are losing their moat.


The Observability Tax Revolt

The Scale of the Problem

Known High-Spend Datadog Customers
OpenAI: $170 million/year on Datadog (reported)
Coinbase: $65 million on Datadog (reported)

Observability costs often grow faster than revenue or headcount. The dynamic: more microservices → more hosts → more logs/metrics/traces → exponentially higher bills. SaaS platforms charge per-GB ingested, per-host monitored, or per-custom-metric — creating a perverse incentive to reduce observability at exactly the moment you need more visibility.

Why Bills Explode

Migration Economics

Conventional migration
~6 months, up to $200K in consulting fees. Re-instrumenting applications, rebuilding dashboards, migrating alerts
Modern migration (with OTel)
Days to weeks. Change the OTel Collector exporter config. Dashboards and alerts still need migration, but the hardest part (instrumentation) is portable
Typical savings from Datadog alternatives
60–80% cost reduction with managed alternatives; 90%+ savings with self-hosted ClickHouse-based solutions at multi-TB/day scale
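The "change the exporter config" step amounts to a destination swap in the Collector. A hedged sketch (the `datadog` exporter ships in the Collector contrib distribution; the Grafana Cloud endpoint below is a placeholder, not a real gateway URL):

```yaml
# Before: telemetry flows to the incumbent SaaS backend
exporters:
  datadog:
    api:
      key: ${env:DD_API_KEY}

# After: same OTel instrumentation, new destination.
# Only the exporter (and the pipeline's exporter list) changes.
exporters:
  otlphttp:
    endpoint: https://otlp-gateway.example.grafana.net/otlp
    headers:
      Authorization: "Basic ${env:GRAFANA_CLOUD_TOKEN}"
```

Dashboards and alert rules still need hand-migration; only the instrumentation side becomes a config change.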

Open Source Observability Stack

Open Source Observability Projects
Project  GitHub Stars  CNCF Status  Function  Backing
Grafana  ~66K  –  Visualization & dashboards  Grafana Labs (AGPLv3)
Prometheus  ~56K  Graduated (2018)  Metrics collection & alerting  CNCF / Independent
ClickHouse  ~41K  –  Columnar analytics database  ClickHouse Inc.
Jaeger  ~30.3K  Graduated (2019)  Distributed tracing  Originally Uber. v2 uses OTel core
Fluentd  ~30.7K  Graduated (2019)  Log collection  Treasure Data
SigNoz  ~24.2K  –  Full-stack observability (OTel-native)  $7M raised (YC-backed)
Grafana Loki  ~24K  –  Log aggregation  Grafana Labs (AGPLv3)
Vector  ~21.1K  –  Telemetry pipeline (collection/routing)  Datadog. 100K+ downloads/day
OpenObserve  ~15K  –  Full-stack observability (Rust)  3,600+ deployments. Claims 140x lower storage
VictoriaMetrics  ~14K  –  Metrics (Prometheus-compatible)  Self-funded. 300%+ growth in 2024. 1B+ Docker downloads
HyperDX  ~9.2K  –  Full-stack observability  Acquired by ClickHouse (Mar 2025)
Grafana Mimir  ~4.4K  –  Long-term metrics storage  Grafana Labs (AGPLv3)
Grafana Tempo  ~4K  –  Distributed tracing  Grafana Labs (AGPLv3)

VictoriaMetrics: The Self-Funded Dark Horse

VictoriaMetrics has achieved 300%+ growth in 2024, 1 billion+ Docker downloads, and Spotify as a customer — all with zero external funding. It’s Prometheus-compatible (drop-in replacement) with dramatically better performance and storage efficiency. This is the most impressive bootstrapping story in the observability space, proving that a focused, high-performance open source tool can compete against well-funded startups.


Case Study: The Grafana Playbook

Grafana Labs Key Metrics
ARR: $400M+ (Sep 2025), up from $250M (Aug 2024) — 60%+ growth
Customers: 7,000+ (up from 5,000 in Aug 2024), including 70% of Fortune 50
Total funding: $805M+
Valuation: $6B (2024); reportedly raising at $9B (Feb 2026)
Employees: ~1,500 (+18% YoY)
Key investors: Ontario Teachers’, Sapphire, Tiger Global, Lightspeed, GIC, Coatue, Sequoia, CapitalG
Recognition: Forbes Cloud 100 (#13). Leader in 2025 Gartner Magic Quadrant
Notable AI customers: Anthropic, Lovable, OutSystems

The LGTM Stack

Component  Purpose  Stars  License
Loki  Log aggregation  ~24K  AGPLv3
Grafana  Visualization  ~66K  AGPLv3
Tempo  Distributed tracing  ~4K  AGPLv3
Mimir  Long-term metrics  ~4.4K  AGPLv3
Alloy  Telemetry collector  –  Apache 2.0
Beyla  eBPF auto-instrumentation  –  Apache 2.0 (donated to OTel)
Pyroscope  Continuous profiling  –  AGPLv3

The AGPLv3 Strategy

In 2021, Grafana Labs moved Grafana, Loki, and Tempo from Apache 2.0 to AGPLv3 to prevent cloud providers (AWS, Azure, GCP) from offering managed services without contributing back.

Why It Works

  1. Grafana is the de facto visualization layer. Even Datadog and New Relic users often pipe data into Grafana for dashboards. This makes Grafana the “control plane” of observability
  2. Each component is best-of-breed. Loki is the cost leader for logs (indexes metadata only, not full text). Mimir scales Prometheus to petabytes. Tempo is the simplest tracing backend
  3. Self-hosted is free, Cloud is paid. Companies self-host to save money, then migrate to Grafana Cloud as they grow and operational overhead becomes a burden. Natural upsell path
  4. The AI wave benefits them. Anthropic, Lovable, and OutSystems are Grafana Cloud customers. AI companies generate massive telemetry (GPU metrics, inference traces, token-level logging) and need cost-effective observability

The ClickHouse Convergence

ClickHouse (41K GitHub stars) is emerging as the default storage engine for cost-conscious observability platforms. Its columnar architecture delivers millisecond query times on terabytes of log data at a fraction of Elasticsearch’s storage cost.

Who’s Building on ClickHouse

Why ClickHouse Wins for Observability

Compression
Columnar storage compresses log data 10–20x better than row-oriented databases. TB/day becomes manageable
Query speed
Millisecond aggregation queries on billions of rows. The analytical workload of observability (group-by, percentiles, top-N) is ClickHouse’s sweet spot
SQL
Standard SQL (with extensions) instead of proprietary query languages. Lower learning curve. ClickStack enables SQL joins between logs and traces
Cost
90%+ savings vs. Elasticsearch at multi-TB/day scale. This is the single biggest driver of adoption
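The "group-by, percentiles, top-N" workload looks like this in practice. An illustrative ClickHouse query against a hypothetical `otel_logs` table; the table and column names are assumptions, not a fixed schema:

```sql
-- Top 10 services by error count in the last hour, with latency
-- percentiles computed in the same scan over raw columnar data.
SELECT
    ServiceName,
    count() AS errors,
    quantile(0.95)(DurationMs) AS p95_ms,
    quantile(0.99)(DurationMs) AS p99_ms
FROM otel_logs
WHERE Timestamp >= now() - INTERVAL 1 HOUR
  AND SeverityText = 'ERROR'
GROUP BY ServiceName
ORDER BY errors DESC
LIMIT 10;
```

One query covers what would require a search index plus a separate metrics rollup in an Elasticsearch-based stack.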

The signal: ClickHouse Inc.’s acquisition of HyperDX and release of ClickStack shows the database company is going all-in on observability as a first-class use case. This creates a gravitational pull for any new observability startup — why build your own storage when ClickHouse is purpose-built for your workload?


eBPF: Zero-Instrumentation Observability

eBPF (extended Berkeley Packet Filter) allows programs to run in the Linux kernel without modifying kernel source code. For observability, this means capturing requests, metrics, and traces without instrumenting applications — no SDK, no code changes, no restarts.

Why eBPF Matters

Key Players

Company Funding Model Key Facts
Groundcover $60M ($35M Series B, Apr 2025) BYOC (data stays in your cloud) 500%+ ARR growth. Fortune 100 customers. Automated Datadog migration tool
Odigos $13M (Sep 2024) OSS + Commercial Zero-code distributed tracing. OTel eBPF auto-instrumentation for Go. <5 min install
Pixie Acquired by New Relic, donated to CNCF CNCF Sandbox Runs inside K8s clusters. No data leaves the cluster
Coroot Community OSS (Apache 2.0) eBPF-based APM + AI root cause analysis. Metrics, logs, traces, profiling, SLOs
Cilium / Hubble CNCF Graduated Network-level K8s observability. Service maps, flow visibility, security

OpenTelemetry eBPF Instrumentation (OBI)

In 2025, the first alpha release of OBI was announced — a collaborative project based on Grafana Beyla, donated by Grafana Labs with contributions from Splunk, Coralogix, Odigos, and others. This represents the standardization of eBPF-based observability under the OpenTelemetry umbrella. When OBI matures, it will further commoditize data collection and make vendor differentiation entirely about the backend.


AI & AIOps in Observability

Every major vendor now offers AI-powered observability features. The differentiation lies in whether AI is truly embedded in the core product or bolted on as an assistant layer.

AI in Observability: Vendor Comparison
Vendor AI Product Approach
Dynatrace Davis AI Causal AI (deterministic root cause, not correlation) + Predictive AI + Generative AI (Davis CoPilot). Most technically differentiated
Datadog Bits AI Autonomous agents: AI SRE (incident response), Dev Agent (debugging), Security Analyst. Watchdog AI for anomaly detection. Natural language querying
New Relic AIOps NLP-powered querying (natural language to NRQL). Automated incident correlation. Alert noise reduction
Grafana Labs Grafana Assistant AI-powered dashboard creation, query generation, alerting. Adopted by thousands of Cloud customers
Coralogix Olly AI agent extending observability across enterprise. Integrated with Aporia acquisition for AI model observability
Dash0 Agent0 AI-native SRE copilot. Built by Instana founder. 270+ customers in first 9 months. $44.5M raised

Dynatrace’s Causal AI: The Technical Differentiator

Most observability AI uses correlational analysis: “these events happened at the same time, so they might be related.” Dynatrace’s Davis AI uses causal analysis: deterministic root cause identification based on dependency graphs and topology. This produces fewer false positives and more actionable results, and it holds up better against noisy, high-cardinality data, which is exactly what modern microservice architectures produce.

Dash0: AI-Native from Day One

Founded in 2023 by Mirko Novakovic (who previously founded Instana, acquired by IBM), Dash0 is the most interesting new entrant. With $44.5M raised and 270+ customers in nine months, it’s building observability with AI as the core interaction model from the start, not retrofitting AI onto a traditional dashboard-and-alert product.


Log Management: From ELK to the Modern Stack

Why ELK Is Being Replaced

The ELK stack (Elasticsearch, Logstash, Kibana) dominated log management for a decade but struggles at modern scale.

The Modern Log Architecture

Log Collection/Shipping Tools Compared
Tool  Language  Stars  Key Strength
Fluent Bit  C  –  Lightest weight (50–100MB RAM). 15B+ downloads. CNCF Graduated
Vector  Rust  ~21.1K  High performance. 100K+ daily downloads. Datadog-backed
Fluentd  Ruby  ~30.7K  Most mature (14+ years). Richest plugin ecosystem. CNCF Graduated
Logstash  Java  –  Mature plugin ecosystem but heaviest resource footprint (JVM-based)

The Modern Pattern

  1. Collection: Fluent Bit or Vector (lightweight, high-performance)
  2. Pipeline: OpenTelemetry Collector (vendor-agnostic routing and transformation)
  3. Storage: Loki (cost-effective, indexes metadata only) or ClickHouse (blazing query speed) or both
  4. Visualization: Grafana

This replaces the monolithic ELK stack with composable, best-of-breed components that can be mixed and matched. The key innovation: Loki indexes only labels (metadata), not full log text, delivering 10x storage cost reduction versus Elasticsearch. For use cases that need fast full-text search over massive log volumes, ClickHouse offers sub-second analytics on terabytes.
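Steps 1 and 3 can be wired together with a short Vector config. A sketch under stated assumptions: endpoints, labels, and table names are placeholders, and sink options vary by Vector version:

```toml
# Tail container logs once, fan out to two storage backends:
# Loki for cheap retention, ClickHouse for fast ad-hoc queries.
[sources.app_logs]
type = "file"
include = ["/var/log/containers/*.log"]

[sinks.loki]
type = "loki"
inputs = ["app_logs"]
endpoint = "https://loki.example.internal"
labels = { job = "containers" }
encoding.codec = "json"

[sinks.clickhouse]
type = "clickhouse"
inputs = ["app_logs"]
endpoint = "https://clickhouse.example.internal:8443"
database = "logs"
table = "app_logs"
```

Because each sink is independent, either backend can be added, removed, or swapped without touching collection.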


Pricing Models & Why They Matter

Observability Pricing Models Compared
Model Example Pricing Pros Cons
Per-host Datadog $15–31/host/mo Predictable for static infra Punishes containers/microservices
Per-GB ingested Datadog, New Relic, Elastic $0.10/GB (Datadog logs) Pay for what you use Bills scale linearly. Incentivizes dropping data
Per-user New Relic Per-user + per-GB Simpler budgeting Limits observability access across org
Per-event / per-span Various (tracing) Varies Granular Encourages aggressive sampling
Flat-rate / BYOC Groundcover Predictable regardless of volume Budget certainty May overpay at low utilization
Self-hosted (free) Grafana LGTM, SigNoz, VictoriaMetrics $0 (+ infrastructure cost) Maximum cost control Operational overhead. Need in-house expertise

The pricing trap: Datadog’s multi-dimensional pricing (per-host + per-GB + per-custom-metric + per-span) means organizations often can’t predict their bill. Enterprise customers negotiate 10–30% discounts at higher volumes, but the fundamental per-GB model means costs scale linearly with growth. At multi-TB/day, annual bills reach millions — which is why self-hosted alternatives and ClickHouse-based solutions gain traction at exactly the customer segment Datadog earns the most from.
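A back-of-envelope sketch of why multi-dimensional bills are hard to predict. The rate constants and the `monthly_bill` breakdown below are illustrative assumptions, not actual Datadog list prices:

```python
def monthly_bill(hosts, log_gb_per_day, custom_metrics,
                 host_rate=23.0, gb_rate=0.10, per_100_metrics=5.0):
    """Estimate a monthly bill under a hypothetical multi-dimensional
    pricing model: per-host + per-GB ingested + per-custom-metric."""
    host_cost = hosts * host_rate                         # per-host dimension
    ingest_cost = log_gb_per_day * 30 * gb_rate           # per-GB dimension
    metric_cost = custom_metrics / 100 * per_100_metrics  # custom-metric dimension
    return host_cost + ingest_cost + metric_cost

# 2,000 hosts, 5 TB/day of logs, 50,000 custom metrics:
bill = monthly_bill(2000, 5000, 50_000)
print(f"${bill:,.0f}/month, ${bill * 12:,.0f}/year")
# prints "$63,500/month, $762,000/year"
```

Every dimension scales with growth, and each one is forecast separately, which is why finance teams struggle to model the combined bill.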


Business Models in Observability

How Observability Companies Monetize
Model Example How It Works Outcome
Open Core + Cloud Grafana Labs OSS core (AGPLv3). Revenue from Grafana Cloud (managed LGTM) and Enterprise features $400M+ ARR. $6–9B valuation. 7,000+ customers
Pure SaaS Datadog 20+ products, consumption-based pricing. No meaningful open source $3.43B revenue. $43.9B market cap. Deep lock-in
OSS + Enterprise SigNoz Self-hosted free. SigNoz Cloud for managed experience. Enterprise features (SSO, RBAC) $7M raised. 500+ paid customers. 24.2K stars
Self-funded OSS VictoriaMetrics Free community edition. Enterprise version with per-node licensing 300%+ growth. 1B+ Docker pulls. $0 external funding
BYOC (Bring Your Own Cloud) Groundcover Data stays in customer’s infrastructure. Flat-rate pricing regardless of volume $60M raised. 500%+ ARR growth
Acquisition target Chronosphere, HyperDX, Observe Build differentiated technology, get acquired by platform company $3.35B (Chronosphere), ~$1B (Observe), undisclosed (HyperDX)

The Winning Pattern for New Entrants

  1. Build on OTel-native foundations (don’t create proprietary agents)
  2. Use ClickHouse for storage (90%+ cost reduction vs. Elasticsearch)
  3. Open source the core product for community and trust
  4. Monetize via managed cloud service + enterprise features (SSO, RBAC, compliance)
  5. Differentiate on AI-powered root cause analysis, not just dashboards

Opportunities & Gaps

1. The “Datadog for AI Companies”

AI companies generate fundamentally different telemetry: GPU utilization, inference latency distributions, token-level logging, prompt/completion traces, model versioning, hallucination detection, and cost-per-inference metrics. Traditional observability tools don’t model these concepts natively. Coralogix acquired Aporia (AI observability) for this reason. There’s room for a purpose-built platform.

2. Observability FinOps

IBM bought Apptio for $4.6B to manage IT spend. But there’s no tool that specifically manages observability spend — showing which teams, services, and log sources are driving costs, recommending sampling strategies, and enforcing budgets. The $65M–$170M Datadog bills create a massive pain point that no one is directly solving.

3. Cross-Signal Correlation

The traditional “three pillars” (logs, metrics, traces) are still stored and queried as silos in most platforms. ClickStack’s SQL-join approach across signal types is a step forward, but the real opportunity is automatic correlation: when an alert fires on a metric, automatically surface the relevant logs and traces without manual querying.

4. SME Observability

Datadog is enterprise-priced. Self-hosted Grafana requires ops expertise. There’s a gap for opinionated, simple, affordable observability for teams of 5–50 engineers who don’t want to choose between 15 configuration options but need more than uptime monitoring. Better Stack ($3.4M revenue, profitable) is scratching this surface but the market is vastly underserved.

5. Continuous Profiling as the Fourth Pillar

Logs, metrics, and traces tell you what happened. Continuous profiling tells you why it happened at the code level (which function consumed CPU, which allocation caused GC pressure). Grafana acquired Pyroscope. Datadog added continuous profiling. But standalone tooling is immature and the category is wide open.

6. Edge and IoT Observability

Cloud-centric observability tools don’t work well for edge computing, IoT fleets, or embedded systems. Bandwidth constraints prevent shipping all telemetry to a central backend. Local aggregation and intelligent sampling at the edge is a largely unsolved problem.

7. Bootstrapper Opportunities

Specialized status pages + incident management
Better Stack is profitable here. The opportunity is combining status pages with deep observability integration (auto-detect incidents from metrics, auto-update status pages, auto-notify stakeholders)
Synthetic monitoring
Checkly ($32M raised, 1,000+ customers) proves demand. Code-first synthetic monitoring (Playwright-based) that integrates with CI/CD has strong developer appeal
Observability for specific stacks
Laravel, Rails, WordPress, Shopify — vertical observability that understands framework-specific metrics and failure modes. Generic tools miss the context that framework-aware tools can provide
Migration tooling
Helping companies migrate off Datadog. Dashboard conversion, alert migration, OTel re-configuration. Groundcover built an automated migration tool. There’s a consulting/SaaS opportunity here

The Bottom Line

The observability market is massive ($28.5B, growing 19–20% CAGR) and in the middle of a structural transformation. OpenTelemetry is breaking vendor lock-in. ClickHouse is collapsing storage costs. eBPF is eliminating the need for instrumentation. AI is making dashboards obsolete in favor of automated root cause analysis. And $40B+ in M&A is consolidating standalone companies into platform plays. The opportunity for new entrants is narrow but deep: AI-native observability, cost management tooling, vertical specialization, and the SME segment are all underserved. The companies that win will be the ones that treat OTel as a given, ClickHouse as the storage layer, and AI as the interaction model — not dashboards.

