Observability & Monitoring Platforms Analysis

Deep-dive analysis of ~15 observability and monitoring platforms — from dominant incumbents (Datadog, Splunk, Dynatrace, New Relic) to open-source challengers (SigNoz, Grafana LGTM stack, Prometheus, Jaeger) to focused newcomers (Honeycomb, Axiom, Better Stack, HyperDX/ClickStack). Each tool is analyzed on architecture, pricing, data model, and positioning.

The core question: Observability bills are the second-biggest infrastructure cost after cloud compute for many companies. Datadog’s complex pricing and vendor lock-in create massive frustration — can an opinionated, simpler, cheaper tool win meaningful share?

1. The Observability Market

Market snapshot
Global observability market (2025)	~$4.1 billion
Projected (2030)	$9–10 billion (18–20% CAGR)
Application Performance Monitoring (2024)	$8.4 billion
Log management (2024)	$3.2 billion
Key pain point	Datadog bills growing 30–40% faster than infrastructure — “observability tax”

The market is split across three pillars of observability:

Metrics: Time-series numerical data (CPU, memory, request latency, error rates). Prometheus popularized this with its pull-based model and PromQL query language. Datadog, Grafana Mimir, and InfluxDB are key players.
Logs: Unstructured/semi-structured text records from applications and infrastructure. The original observability signal. Elasticsearch/ELK, Grafana Loki, Datadog Logs, and Splunk dominate. Volume-based pricing makes this the most expensive pillar.
Traces: Distributed request flows across microservices. OpenTelemetry has become the standard instrumentation framework. Jaeger, Zipkin, Grafana Tempo, Datadog APM, and Honeycomb focus here.

The OpenTelemetry shift: OTel (CNCF project, 2nd most active after Kubernetes) is standardizing instrumentation across all three pillars. This is the single biggest structural change in the market — it decouples data collection from the backend, making vendor switching dramatically easier. Any new entrant should be OTel-native from day one.

Market dynamics: Large enterprises run Datadog/Dynatrace/Splunk and spend $500K–$5M+/year. Mid-market companies ($50K–$500K/year) are the most price-sensitive and most likely to switch. Startups and small teams use free tiers or open-source stacks. The “observability tax” backlash is creating real demand for cheaper alternatives.

2. Datadog

Company overview
Founded	2010
Revenue (2025)	~$3.2 billion ARR
Market cap	~$44 billion
Employees	~5,500
Customers	29,200+ (3,390 with ARR >$100K)
Products	20+ products across observability, security, and developer experience
Integrations	700+

Pricing Model

Datadog’s pricing is infamously complex. Each product has its own pricing dimension:

Infrastructure Monitoring: $15–$23/host/month
APM: $31–$40/host/month
Log Management: $0.10/GB ingested + $1.70–$2.55/million log events for indexing + $0.05/GB/month for retention
RUM: $1.50/1,000 sessions
Synthetic Monitoring: $5–$12/1,000 test runs
SIEM/Security: $0.20/GB analyzed
Continuous Profiler: $12/host/month
Database Monitoring: $70/host/month

A mid-size company (100 hosts, moderate logs and APM) easily hits $15K–$30K/month. Common complaints: surprise bills from log spikes, hard-to-predict costs, and “feature creep pricing” where enabling one feature pulls you into needing three more paid products.
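To make the cost dynamics concrete, here is a minimal sketch of a bill estimator using the low end of the list prices quoted above. The workload numbers are hypothetical, and the model deliberately ignores RUM, synthetics, custom metrics, retention, and negotiated discounts, so treat it as an illustration of how the dimensions stack up rather than a real quote.

```python
from dataclasses import dataclass

# List prices quoted above (low end of each range), in USD.
INFRA_PER_HOST = 15.00        # Infrastructure Monitoring, per host per month
APM_PER_HOST = 31.00          # APM, per host per month
LOG_INGEST_PER_GB = 0.10      # Log Management ingestion, per GB
LOG_INDEX_PER_MILLION = 1.70  # Log indexing, per million events


@dataclass
class Workload:
    hosts: int
    apm_hosts: int
    log_gb: float                # GB of logs ingested per month
    indexed_log_events_m: float  # millions of log events indexed per month


def estimate_monthly_bill(w: Workload) -> dict[str, float]:
    """Return a per-product breakdown plus a total, in USD per month."""
    lines = {
        "infrastructure": w.hosts * INFRA_PER_HOST,
        "apm": w.apm_hosts * APM_PER_HOST,
        "log_ingest": w.log_gb * LOG_INGEST_PER_GB,
        "log_indexing": w.indexed_log_events_m * LOG_INDEX_PER_MILLION,
    }
    lines["total"] = sum(lines.values())
    return lines


# Hypothetical workload: 100 hosts, all with APM, 3 TB of logs, 2 billion indexed events.
baseline = Workload(hosts=100, apm_hosts=100, log_gb=3000, indexed_log_events_m=2000)
# Same fleet in a month where log volume triples (for example, a noisy incident).
spike = Workload(hosts=100, apm_hosts=100, log_gb=9000, indexed_log_events_m=6000)

for name, w in (("baseline", baseline), ("log spike", spike)):
    bill = estimate_monthly_bill(w)
    log_cost = bill["log_ingest"] + bill["log_indexing"]
    print(f"{name}: ${bill['total']:,.2f} total, of which logs ${log_cost:,.2f}")
```

With these assumed inputs the host-based lines stay flat at $4,600 while the log lines go from $3,700 to $11,100, which is the "surprise bill" pattern in miniature: the total nearly doubles (about $8,300 to about $15,700) without a single host being added.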

Strengths

Best-in-class unified platform — metrics, logs, traces, RUM, synthetics, profiling, security all correlated in one UI
700+ integrations with near-instant setup
Powerful dashboarding and alerting
Strong brand and enterprise trust
AI/ML features (Watchdog anomaly detection, LLM Observability)

Weaknesses (Attack Surfaces)

Pricing opacity: Multiple pricing dimensions make costs hard to predict
Vendor lock-in: Proprietary agents and formats; migrating away is painful
Overkill for most: Most teams use 20% of features but pay for the platform tax
No self-hosted option: All data goes to Datadog’s cloud
Log costs spiral: The #1 complaint — log ingestion costs grow unpredictably

3. Grafana Labs

Company overview
Founded	2014
Revenue	$400M+ ARR
Valuation	$6B → $9B (latest round)
Funding	$694M total
Employees	~1,200
License	AGPL v3 (core projects)
Core stack	LGTM — Loki (logs), Grafana (visualization), Tempo (traces), Mimir (metrics)

The LGTM Stack

Grafana: The visualization layer. Industry standard for dashboards. Supports 100+ data sources. Free and open source (AGPL). Used by millions.
Loki: Log aggregation system inspired by Prometheus. Key innovation: it indexes only metadata (labels), not the full text of each log line, which can make it up to 10x cheaper than Elasticsearch for many workloads. Trade-off: grep-style queries over unindexed content are slower than full-text indexed search.
Tempo: Distributed tracing backend. Object-storage-only architecture (S3/GCS) — very cheap at scale. Supports Jaeger, Zipkin, and OTel protocols natively.
Mimir: Long-term metrics storage. Horizontally scalable, Prometheus-compatible backend. Forked from Cortex and positioned as an alternative to Thanos.
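Loki's label-only indexing, described in the list above, can be illustrated with a toy in-memory store. This is a sketch of the idea only, not Loki's implementation: the class name and methods are hypothetical, and real Loki stores compressed chunks in object storage rather than Python lists.

```python
from collections import defaultdict


class LabelIndexedLogStore:
    """Toy store: label sets are indexed, log lines are kept as opaque text."""

    def __init__(self) -> None:
        # Map from a frozen label set (one "stream") to its raw log lines.
        self._streams: dict[frozenset, list[str]] = defaultdict(list)

    def push(self, labels: dict[str, str], line: str) -> None:
        self._streams[frozenset(labels.items())].append(line)

    def query(self, selector: dict[str, str], contains: str) -> list[str]:
        wanted = set(selector.items())
        hits: list[str] = []
        for stream_labels, lines in self._streams.items():
            # Step 1: cheap lookup on labels only; no log text is touched.
            if not wanted <= stream_labels:
                continue
            # Step 2: brute-force scan of the selected streams (the grep-style part).
            hits.extend(line for line in lines if contains in line)
        return hits


store = LabelIndexedLogStore()
store.push({"app": "checkout", "env": "prod"}, "payment failed: card declined")
store.push({"app": "checkout", "env": "prod"}, "payment ok")
store.push({"app": "search", "env": "prod"}, "query failed: timeout")

matches = store.query({"app": "checkout"}, contains="failed")
print(matches)
```

The index stays small because it grows with the number of distinct label sets, not with log volume. The cost is paid at query time: a text search has to scan every line in the selected streams, which is why narrow label selectors matter so much in this design.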

Pricing (Grafana Cloud)

Free tier: 10K metrics, 50GB logs, 50GB traces/month
Pro: $29/month base + usage-based (metrics: $8/1K series/month, logs: $0.50/GB, traces: $0.50/GB)
Advanced: Custom pricing with SLA, SSO, RBAC
Self-hosted: Free (AGPL) — run the entire stack yourself

Strengths

Open source with massive community — no vendor lock-in
Self-hosted option means you control your data and costs
Each component is best-in-class for its function
Grafana dashboards are the industry standard
Cost-effective at scale — especially Loki and Tempo

Weaknesses

Complexity: Running LGTM self-hosted requires significant operational expertise
Fragmented experience: Four separate systems vs. Datadog’s unified platform
Correlation challenges: Connecting logs → traces → metrics requires careful setup
Grafana Cloud costs can grow: At very high scale, Grafana Cloud approaches Datadog pricing

4. SigNoz

Company overview
Founded	2021
YC batch	W21
Funding	$6.5M (Seed)
GitHub stars	20K+
License	MIT (core) + proprietary enterprise features
Backend	ClickHouse (columnar database)
Key claim	“OpenTelemetry-native, 7–10x cheaper than Datadog”

Architecture

SigNoz is a single unified platform (like Datadog) but built on open standards (like Grafana). It stores all three signals — metrics, logs, and traces — in a single ClickHouse database. This is its key architectural differentiator: one query engine for everything, with native correlation between signals.
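To show what "one query engine with native correlation" buys you, here is a small sketch that joins logs to slow traces in a single query. It uses SQLite from the Python standard library purely as a stand-in for ClickHouse, and the table and column names are hypothetical; they are not SigNoz's actual schema.

```python
import sqlite3

# SQLite stands in for ClickHouse here; the schema is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE spans (trace_id TEXT, service TEXT, name TEXT, duration_ms REAL);
    CREATE TABLE logs  (trace_id TEXT, service TEXT, severity TEXT, body TEXT);
    """
)
conn.executemany(
    "INSERT INTO spans VALUES (?, ?, ?, ?)",
    [
        ("t1", "checkout", "POST /pay", 2300.0),
        ("t2", "checkout", "POST /pay", 85.0),
    ],
)
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?)",
    [
        ("t1", "payments", "ERROR", "upstream timeout after 2000ms"),
        ("t2", "payments", "INFO", "charge accepted"),
    ],
)


def logs_for_slow_traces(db: sqlite3.Connection, threshold_ms: float) -> list[tuple]:
    """Return log lines that share a trace_id with any span slower than the threshold."""
    query = """
        SELECT s.trace_id, s.name, s.duration_ms, l.severity, l.body
        FROM spans AS s
        JOIN logs AS l ON l.trace_id = s.trace_id
        WHERE s.duration_ms > ?
        ORDER BY s.duration_ms DESC
    """
    return db.execute(query, (threshold_ms,)).fetchall()


rows = logs_for_slow_traces(conn, 1000)
for row in rows:
    print(row)
```

When the signals live in separate systems, the same question requires copying a trace ID from one tool into another. With a shared store it is an ordinary join on `trace_id`.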

Pricing (SigNoz Cloud)

Logs: $0.30/GB ingested
Traces: $0.30/GB ingested
Metrics: $0.10 per 100K samples
No per-host fees, no per-seat fees
Self-hosted: Free (MIT) — run on your own infrastructure

Comparison with Datadog

Dimension	SigNoz	Datadog
Logs pricing	$0.30/GB (all-in)	$0.10/GB ingest + $1.70/M events index + retention fees
Host fees	None	$15–$23/host/month (infra) + $31–$40/host/month (APM)
Seat fees	None	Varies by product
Self-hosted	Yes (MIT)	No
OTel support	Native (built on OTel)	Accepts OTel but prefers proprietary agent
Integrations	~50	700+
Query language	ClickHouse SQL	Proprietary

Strengths

Unified platform with single backend (ClickHouse) — simpler than Grafana’s 4-system stack
OTel-native from day one — no proprietary agents
Simple, transparent pricing — no hidden dimensions
MIT licensed core — no AGPL concerns
ClickHouse gives excellent query performance on high-cardinality data

Weaknesses

Small team, limited integrations compared to Datadog
ClickHouse operations at scale can be complex
Less mature alerting and dashboard ecosystem than Grafana
Limited enterprise features (SSO, RBAC, audit logs still catching up)

5. New Relic

Company overview
Founded	2008
Revenue	~$960M ARR (before going private)
Acquisition	Taken private by Francisco Partners & TPG for $6.5B (Nov 2023)
Employees	~2,800 (post-layoffs)
Backend	NRDB (custom telemetry database)
Key move	Pivoted to “free tier + usage-based pricing” in 2020

Pricing

Free tier: 1 full-access user + 100GB data/month
Standard: $0.30/GB beyond free + $0/seat (limited features)
Pro: $0.30/GB + $49/full user/month
Enterprise: $0.50/GB + custom per-user pricing

New Relic simplified its pricing dramatically in 2020 (from per-host to per-GB + per-user), which was a competitive response to Datadog’s growing dominance. However, per-user fees still add up for larger teams, and the “full platform user” vs. “basic user” distinction creates confusion.

Position

Pioneer of APM, now struggling for relevance. The pivot to usage-based pricing helped but didn’t reverse the trend vs. Datadog. Going private suggests a restructuring phase. Still has a large installed base, especially in Java/.NET shops. NRQL (query language) is powerful but proprietary.

6. Dynatrace

Company overview
Founded	2005 (spun out of Compuware)
Revenue	~$1.7B ARR
Market cap	~$15 billion
Key feature	Grail — unified data lakehouse for all observability data
AI engine	Davis AI (causal AI, not just correlation)
Target	Large enterprises (avg deal >$300K ARR)

Position

Dynatrace is the “enterprise Datadog” — more automated, more opinionated, and more expensive. Its Davis AI engine does automatic root cause analysis using causal AI (topology-aware), which is genuinely differentiated from Datadog’s ML-based anomaly detection. OneAgent auto-instrumentation means near-zero configuration.

Trade-offs: extremely enterprise-focused (long sales cycles, complex contracts), less developer-friendly than Datadog, and opaquely priced: consumption under DPS (Dynatrace Platform Subscription) and the older DDUs (Davis Data Units) is hard to estimate. Not a realistic target for bootstrappers, but important to understand as the ceiling of the market.

7. Elastic

Company overview
Founded	2012
Revenue	~$1.3B ARR
Market cap	~$9 billion
Core product	Elasticsearch + Kibana (ELK stack)
License	SSPL + Elastic License v2, with AGPL v3 added as an option in 2024 (SSPL and ELv2 are not OSI-approved)
Observability play	Elastic Observability (logs, metrics, APM, synthetics)

Position

Elastic built the world’s most popular log search engine and has been expanding into full observability. Elasticsearch is unbeaten for full-text log search performance. However, it’s operationally complex to run, expensive at scale (requires lots of RAM and disk), and the license change from Apache 2.0 to SSPL alienated parts of the community (spawning OpenSearch).

Elastic Cloud pricing starts at $0.046/GB for search-optimized storage. Their observability suite is comprehensive but feels bolted-on compared to purpose-built platforms. Best for organizations already invested in Elasticsearch.

8. Splunk (Cisco)

Company overview
Founded	2003
Acquisition	Acquired by Cisco for $28B (March 2024)
Revenue (pre-acquisition)	~$3.8B ARR
Core strength	Log analytics, SIEM, IT operations
Query language	SPL (Splunk Processing Language)
Pricing	Per-GB ingested — historically the most expensive option

Position

Splunk is the legacy king of log analytics. SPL is extremely powerful. But Splunk is expensive ($150–$200+/GB/day at scale), operationally heavy, and increasingly seen as a SIEM/security tool rather than a modern observability platform. The Cisco acquisition signals a shift toward bundling with networking infrastructure rather than competing head-to-head with Datadog.

Splunk Observability Cloud (formerly SignalFx, acquired 2019) provides metrics and APM but hasn’t gained significant traction against Datadog. Most interesting as a cautionary tale: being expensive and complex creates an opening for simpler alternatives.

9. Prometheus & Jaeger (Open Source Foundations)

Prometheus

Created	2012 at SoundCloud
Status	CNCF graduated project
What it does	Pull-based metrics collection and storage
Query language	PromQL (industry standard)
Limitation	Single-node by design — needs Thanos, Cortex, or Mimir for horizontal scaling and long-term storage
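The pull-based model in the table above means the monitored process only has to expose its current counter values as text; Prometheus fetches that page on a schedule and derives rates itself. The sketch below renders a page in the Prometheus text exposition format and computes a simplified per-second rate from two scrapes. The function names are hypothetical, label values are not escaped, and the rate calculation is a simplification of what PromQL's `rate()` does (no extrapolation, only basic counter-reset handling).

```python
def render_exposition(name, help_text, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def per_second_rate(first, second):
    """Simplified rate between two (timestamp_seconds, counter_value) scrapes."""
    (t0, v0), (t1, v1) = first, second
    if t1 <= t0:
        raise ValueError("scrapes must be in increasing time order")
    # A counter that went down is treated as a restart from zero.
    increase = v1 - v0 if v1 >= v0 else v1
    return increase / (t1 - t0)


page = render_exposition(
    "http_requests_total",
    "Total HTTP requests.",
    "counter",
    [
        ({"method": "GET", "code": "200"}, 1027),
        ({"method": "POST", "code": "500"}, 3),
    ],
)
print(page)

# Two scrapes of the same series, 15 seconds apart.
print(per_second_rate((0, 1000), (15, 1027)))
```

Because the application only publishes cumulative counters, it needs no knowledge of scrape intervals or of the backend. That separation is what lets Thanos, Cortex, or Mimir slot in behind Prometheus without touching application code.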

Jaeger

Created	2015 at Uber
Status	CNCF graduated project
What it does	Distributed tracing backend
Storage	Cassandra, Elasticsearch, or in-memory
Note	Jaeger's own client libraries are deprecated in favor of OpenTelemetry SDKs, and Grafana Tempo is replacing it as the tracing backend for many use cases

Prometheus and Jaeger are the foundational open-source projects in their respective domains. Every commercial observability platform is either built on them, compatible with them, or competing against them. They’re essential to understand but not direct competitors in the commercial sense — they’re the building blocks other products are assembled from.

10. Honeycomb

Company overview
Founded	2016
Funding	$97M total
Founders	Charity Majors, Christine Yen (ex-Facebook infrastructure)
Key concept	“Observability” as high-cardinality event analysis (vs. traditional monitoring)
Backend	Custom columnar store optimized for high-cardinality queries
Pricing	$0.20/GB events (20M events/month free)

Position

Honeycomb popularized the modern definition of “observability” (as distinct from monitoring). Their approach: send wide, structured events with many dimensions, then slice and dice interactively to find unknown unknowns. BubbleUp (automatic anomaly analysis) and Query Builder are genuinely innovative.
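The "wide events, then slice and dice" workflow can be sketched in a few lines. The function below compares how often each attribute value appears among slow events versus the rest, in the spirit of BubbleUp. It is an illustrative assumption about the general idea, not Honeycomb's algorithm, and the event fields are hypothetical.

```python
from collections import Counter


def attribute_skew(events, is_outlier, attributes):
    """Rank attribute values by how over-represented they are among outlier events."""
    outliers = [e for e in events if is_outlier(e)]
    baseline = [e for e in events if not is_outlier(e)]
    results = []
    for attr in attributes:
        out_counts = Counter(e.get(attr) for e in outliers)
        base_counts = Counter(e.get(attr) for e in baseline)
        for value, n in out_counts.items():
            out_share = n / len(outliers)
            base_share = base_counts.get(value, 0) / len(baseline) if baseline else 0.0
            results.append((attr, value, out_share - base_share))
    # Largest positive difference first.
    return sorted(results, key=lambda r: r[2], reverse=True)


# Each event is one wide, structured record with many dimensions.
events = [
    {"endpoint": "/checkout", "customer_id": "c-17", "build": "v42", "duration_ms": 1900},
    {"endpoint": "/checkout", "customer_id": "c-17", "build": "v42", "duration_ms": 2100},
    {"endpoint": "/checkout", "customer_id": "c-03", "build": "v41", "duration_ms": 120},
    {"endpoint": "/search", "customer_id": "c-17", "build": "v41", "duration_ms": 90},
    {"endpoint": "/search", "customer_id": "c-08", "build": "v41", "duration_ms": 80},
    {"endpoint": "/cart", "customer_id": "c-21", "build": "v41", "duration_ms": 110},
]

ranking = attribute_skew(
    events,
    is_outlier=lambda e: e["duration_ms"] > 1000,
    attributes=["endpoint", "customer_id", "build"],
)
for attr, value, skew in ranking:
    print(f"{attr}={value}: {skew:+.2f}")
```

In this toy data set the top-ranked value is `build=v42`, which appears in every slow event and in none of the fast ones. Nobody had to predict in advance that build version would matter, which is the "unknown unknowns" argument; fields like `customer_id` are exactly the high-cardinality dimensions that pre-aggregated metrics tend to drop.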

Limitations: primarily traces/events focused (logs and metrics are secondary), smaller ecosystem, and the “observability philosophy” requires teams to change how they think about debugging. Strong in developer-led, cloud-native companies. Unlikely to displace Datadog in enterprises that want a full platform.

11. Axiom

Company overview
Founded	2020
Funding	$27M (Series A)
Key claim	“Log everything, query everything, pay less”
Backend	Custom storage engine on object storage (S3)
Pricing	$0.35/GB ingested (free tier: 500GB/month)
Integration	Official log drain for Vercel

Position

Axiom’s bet: storage is cheap, so ingest everything and query it on-demand. No indexing upfront — which means no decisions about what to keep and what to drop. Their Vercel partnership gives them a strong foothold in the Next.js/serverless ecosystem.

Primarily a log analytics platform that’s expanding into traces and metrics. APL (Axiom Processing Language) is based on KQL (Kusto Query Language from Azure). Good for teams that want to log everything without worrying about costs, but less mature as a full observability platform.

12. Better Stack

Company overview
Founded	2021
Products	Better Uptime (uptime monitoring), Better Stack Logs (log management), Better Stack Telemetry
Pricing (logs)	From $0.25/GB with 3-day retention (free tier: 1GB/month)
Approach	Combines uptime monitoring, incident management, and log management in one platform

Position

Better Stack started as an uptime monitoring tool (competing with Pingdom, UptimeRobot) and expanded into log management. Their strength is the unified incident workflow: uptime check fails → alert fires → on-call gets paged → logs are right there for debugging. Clean UI, developer-friendly.

Limitation: still primarily uptime + logs. Not a full observability platform (no APM, limited traces, basic metrics). Good for small-to-mid teams that want simple, affordable monitoring without the complexity of Datadog.

13. HyperDX / ClickStack

Company overview
Founded	2023
Renamed to	ClickStack (early 2025)
GitHub stars	7K+
License	MIT
Backend	ClickHouse
Key claim	“Open-source Datadog alternative”

Position

Very similar to SigNoz in approach: unified platform, ClickHouse backend, OTel-native, MIT licensed. The move from the HyperDX name to ClickStack leans into the ClickHouse connection: ClickHouse acquired HyperDX in 2025 and ships it as the UI layer of ClickStack. Differentiators include session replay integration and a focus on developer experience.

Earlier stage than SigNoz with a smaller community. Interesting as a validation of the “unified ClickHouse observability” thesis. Both SigNoz and ClickStack demonstrate that ClickHouse + OTel is the emerging open-source stack for full observability.

14. Uptrace

Company overview
Type	Open-source observability platform
Backend	ClickHouse
License	BSL (Business Source License)
Focus	OTel-native tracing and metrics
Cloud pricing	Starting at $1/month per 100K spans

Position

Another ClickHouse-backed, OTel-native platform. Smaller community than SigNoz. BSL license is less permissive than MIT. Primarily interesting as further validation of the ClickHouse + OTel stack, but less likely to win against SigNoz, which has more momentum, better funding (YC), and MIT licensing.

15. Checkly

Company overview
Founded	2018
Focus	Synthetic monitoring & monitoring-as-code
Funding	$12.6M
Key feature	Playwright-based browser checks defined in code
Pricing	Free tier (50 API checks), Starter at $30/month

Position

Checkly focuses on a specific slice of observability: synthetic monitoring (are your APIs and web apps working from the user’s perspective?). “Monitoring as code” approach fits into CI/CD workflows. Competes more with Pingdom and Datadog Synthetics than with full observability platforms. Important as an example of how focused tools can carve out a niche.

16. Sentry

Company overview
Founded	2015 (project started 2008)
Revenue	$100M+ ARR
Funding	$217M total
Focus	Error tracking & performance monitoring
License	BSL (was BSD 3-Clause)
Pricing	Free tier (5K errors/month), Team at $26/month

Position

Sentry occupies the “application-level” monitoring niche: error tracking, crash reporting, and performance monitoring (transaction tracing). Not a full infrastructure observability platform, but deeply integrated into developer workflows. Session replay, profiling, and release health tracking make it complementary to (not a replacement for) Datadog/Grafana.

Most teams use Sentry alongside their observability platform, not instead of it. Developer experience is excellent — SDKs for every language with automatic error grouping and stack trace deobfuscation.

17. Competitive Comparison Table

Platform	Type	Backend	OTel Native	Self-Hosted	License	Revenue/Funding
Datadog	Full platform	Proprietary	Accepts OTel	No	Proprietary	$3.2B ARR
Grafana Labs	Full stack (LGTM)	Multiple (Loki, Mimir, Tempo)	Yes	Yes	AGPL	$400M+ ARR
SigNoz	Full platform	ClickHouse	Yes (built on OTel)	Yes	MIT	$6.5M raised
New Relic	Full platform	NRDB	Accepts OTel	No	Proprietary	~$960M ARR
Dynatrace	Full platform	Grail	Accepts OTel	No	Proprietary	$1.7B ARR
Elastic	Search + Observability	Elasticsearch	Accepts OTel	Partial	SSPL/ELv2	$1.3B ARR
Splunk	Log analytics + SIEM	Proprietary	Accepts OTel	Yes (on-prem)	Proprietary	$28B acquisition
Honeycomb	Event analytics	Custom columnar	Yes	No	Proprietary	$97M raised
Axiom	Log analytics	Object storage (S3)	Accepts OTel	No	Proprietary	$27M raised
Better Stack	Uptime + Logs	Custom	Accepts OTel	No	Proprietary	Venture-backed
ClickStack	Full platform	ClickHouse	Yes	Yes	MIT	Early stage
Uptrace	Tracing + Metrics	ClickHouse	Yes	Yes	BSL	Bootstrapped
Checkly	Synthetic monitoring	Custom	Accepts OTel	No	Proprietary	$12.6M raised
Sentry	Error tracking	Custom (Snuba/ClickHouse)	Partial	Yes	BSL	$100M+ ARR

18. Pricing Comparison

Estimated monthly cost for a typical mid-size setup: 50 hosts, 500GB logs/month, 200GB traces/month, 50K metrics series.

Platform	Host/Infra Fees	Log Cost	Trace Cost	Metric Cost	Estimated Total
Datadog	$1,150 (infra) + $2,000 (APM)	$900+ (ingest + index)	Included in APM	Custom metrics extra	$4,000–$6,000+
Grafana Cloud	None	$250 (500GB × $0.50)	$100 (200GB × $0.50)	$400 (50K series)	$750–$1,000
SigNoz Cloud	None	$150 (500GB × $0.30)	$60 (200GB × $0.30)	$50	$260–$400
New Relic	None	$210 (700GB logs + traces × $0.30) + per-user fees	Billed per GB together with logs	Billed per GB (not estimated here)	$500–$1,500
Axiom	None	$175 (500GB × $0.35)	Limited	Limited	$200–$500
Self-hosted (Grafana/SigNoz)	Infrastructure cost only	$0 software cost + cloud compute/storage			$200–$800 (cloud infra)

Key insight: On these estimates, SigNoz Cloud comes out roughly 10–15x cheaper than Datadog for a comparable workload, and Grafana Cloud roughly 4–6x cheaper. Self-hosted is cheapest but adds operational burden. The pricing gap is real and growing, and it is the primary attack vector for challengers.
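The arithmetic behind three of the table rows can be reproduced with a short script. It uses the unit prices quoted in the earlier sections; where the table gives a lump figure without a formula (Datadog's $900 log cost, SigNoz's $50 metric cost), the script takes that figure as given rather than deriving it. Function names are hypothetical.

```python
# Workload from the scenario above.
HOSTS = 50
LOG_GB = 500
TRACE_GB = 200
METRIC_SERIES = 50_000


def datadog_low_estimate() -> float:
    infra = HOSTS * 23   # $23/host, matching the table's $1,150
    apm = HOSTS * 40     # $40/host, matching the table's $2,000
    logs = 900           # table figure for ingest + indexing, taken as given
    return infra + apm + logs


def grafana_cloud_estimate() -> float:
    base = 29                            # Pro plan base fee
    logs = LOG_GB * 0.50
    traces = TRACE_GB * 0.50
    metrics = METRIC_SERIES / 1000 * 8   # $8 per 1K series
    return base + logs + traces + metrics


def signoz_cloud_estimate() -> float:
    logs = LOG_GB * 0.30
    traces = TRACE_GB * 0.30
    metrics = 50                         # table figure, taken as given
    return logs + traces + metrics


estimates = {
    "Datadog": datadog_low_estimate(),
    "Grafana Cloud": grafana_cloud_estimate(),
    "SigNoz Cloud": signoz_cloud_estimate(),
}

for name, total in estimates.items():
    ratio = estimates["Datadog"] / total
    print(f"{name}: ${total:,.0f}/month; Datadog costs {ratio:.1f}x as much")
```

On the low end of each range this gives $4,050 for Datadog, $779 for Grafana Cloud, and $260 for SigNoz Cloud, so ratios of about 5.2x and 15.6x. The multiples move a lot with workload shape: a log-heavy deployment widens the gap, while a host-light, metrics-heavy one narrows it.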

19. How to Compete as a Bootstrapper

The Landscape Reality

Building a general-purpose “Datadog competitor” as a bootstrapper is extremely difficult — SigNoz has raised $6.5M, Grafana Labs $694M, and incumbents have thousands of engineers. However, there are specific wedges that remain viable:

Strategy 1: Vertical Observability

Build observability for a specific stack or use case that general platforms serve poorly:

WordPress/PHP monitoring: Most observability tools are built for cloud-native. The massive WordPress ecosystem has no purpose-built observability tool.
IoT/Edge observability: Devices with intermittent connectivity, constrained resources. None of the major platforms handle this well.
Database-specific monitoring: Deep PostgreSQL or MySQL monitoring with query analysis, index recommendations, and performance tuning. pganalyze suggests this can work (reportedly $5M+ ARR, with a bootstrapped feel).
Serverless-native: Lambda/Cloudflare Workers monitoring. Lumigo and Epsagon (acquired) proved the market. Current tools still struggle with cold starts, cost correlation, and function-level debugging.

Strategy 2: Opinionated “Good Enough” Platform

The Better Stack approach: combine uptime monitoring + logs + alerting + incident management into one clean, affordable tool. Don’t try to compete with Datadog on features — compete on simplicity and price.

Target: teams of 5–50 developers who don’t need 700 integrations
Price: flat $49–$199/month (predictable, no usage surprises)
Differentiator: 5-minute setup, beautiful UI, zero configuration decisions
Tech stack: ClickHouse + OTel collector + simple web UI

Strategy 3: Cost Optimization Layer

Don’t replace Datadog — reduce the bill. Build a tool that:

Analyzes Datadog/Grafana/New Relic usage and identifies waste
Recommends log sampling strategies, metric cardinality reduction
Provides a “Datadog bill estimator” before you get the surprise invoice
Revenue model: charge 10–20% of savings identified

This is the observability equivalent of cloud cost management (Vantage, CloudHealth). Real pain point, clear ROI, and you’re selling to the exact budget holder who’s frustrated about observability costs.
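A minimal sketch of the core loop for such a tool: find the noisiest log sources, estimate what sampling them would save, and price the service as a share of those savings. Everything here is an assumption for illustration, including the `LogSource` fields, the rule that only DEBUG and INFO sources are sampled, the 10% keep ratio, and the blended $1.00/GB log price.

```python
from dataclasses import dataclass


@dataclass
class LogSource:
    name: str
    level: str            # dominant severity of the source
    gb_per_month: float


# Assumption: only low-severity sources are candidates for sampling.
NOISY_LEVELS = {"DEBUG", "INFO"}


def recommend_sampling(sources, price_per_gb, keep_ratio=0.1):
    """Return (source name, keep ratio, monthly savings in USD), largest savings first."""
    recommendations = []
    for s in sources:
        if s.level in NOISY_LEVELS:
            dropped_gb = s.gb_per_month * (1 - keep_ratio)
            recommendations.append((s.name, keep_ratio, dropped_gb * price_per_gb))
    return sorted(recommendations, key=lambda r: r[2], reverse=True)


def fee_range(total_savings, low=0.10, high=0.20):
    """Fee charged as 10-20% of identified savings."""
    return total_savings * low, total_savings * high


sources = [
    LogSource("checkout-api", "INFO", 1200),
    LogSource("ingress", "DEBUG", 800),
    LogSource("payments", "ERROR", 50),   # never sampled: errors are kept in full
]

recs = recommend_sampling(sources, price_per_gb=1.00)
total_savings = sum(saving for _, _, saving in recs)
low_fee, high_fee = fee_range(total_savings)

for name, keep, saving in recs:
    print(f"{name}: keep {keep:.0%} of lines, save ${saving:,.0f}/month")
print(f"total savings ${total_savings:,.0f}/month, fee ${low_fee:,.0f}-${high_fee:,.0f}/month")
```

Under these assumptions the customer saves $1,800 a month and pays $180 to $360 of it as a fee. The hard product problems sit outside this sketch: getting trustworthy per-source volume data out of the vendor's usage APIs, and agreeing on how "savings identified" is measured.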

Strategy 4: OTel Pipeline Management

OpenTelemetry Collector pipelines are becoming complex. Teams need to route, filter, sample, and transform telemetry data before it hits their backend (to control costs). Build a visual OTel pipeline builder:

Drag-and-drop OTel Collector configuration
Built-in sampling strategies (tail-based, head-based, adaptive)
Cost estimation per pipeline configuration
Multi-backend routing (send logs to cheap storage, traces to SigNoz, metrics to Prometheus)
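What a visual builder would ultimately emit is an OTel Collector configuration. The sketch below builds one as a Python dictionary with head-based sampling on traces and a different backend per signal, then checks that every pipeline only references components that are defined. The component types (`otlp`, `batch`, `probabilistic_sampler`, `otlphttp`, `prometheusremotewrite`) are real Collector components, but the endpoints are placeholders and the helper function names are hypothetical. The Collector itself reads YAML, so the dictionary would still need to be serialized accordingly.

```python
import json


def build_collector_config(trace_sample_pct: float) -> dict:
    """Build a Collector config: one OTLP receiver, three pipelines, three backends."""
    return {
        "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
        "processors": {
            "batch": {},
            # Head-based sampling: keep a fixed percentage of traces.
            "probabilistic_sampler": {"sampling_percentage": trace_sample_pct},
        },
        "exporters": {
            # Placeholder endpoints; replace with real backends.
            "otlphttp/logs": {"endpoint": "https://logs.example.internal:4318"},
            "otlp/traces": {"endpoint": "signoz.example.internal:4317"},
            "prometheusremotewrite": {
                "endpoint": "https://prometheus.example.internal/api/v1/write"
            },
        },
        "service": {
            "pipelines": {
                "logs": {
                    "receivers": ["otlp"],
                    "processors": ["batch"],
                    "exporters": ["otlphttp/logs"],
                },
                "traces": {
                    "receivers": ["otlp"],
                    "processors": ["probabilistic_sampler", "batch"],
                    "exporters": ["otlp/traces"],
                },
                "metrics": {
                    "receivers": ["otlp"],
                    "processors": ["batch"],
                    "exporters": ["prometheusremotewrite"],
                },
            }
        },
    }


def undefined_components(config: dict) -> list[str]:
    """List pipeline references that point at components missing from the config."""
    missing = []
    for pipeline_name, pipeline in config["service"]["pipelines"].items():
        for section in ("receivers", "processors", "exporters"):
            for component in pipeline.get(section, []):
                if component not in config.get(section, {}):
                    missing.append(f"{pipeline_name}: {section}/{component}")
    return missing


config = build_collector_config(trace_sample_pct=10)
problems = undefined_components(config)
print(json.dumps(config["service"]["pipelines"], indent=2))
print("undefined components:", problems)
```

The validation step hints at where a product adds value over hand-edited files: catching dangling references, estimating volume per pipeline after sampling, and previewing cost per backend before a config is rolled out. Tail-based and adaptive sampling would need additional processors and are not shown.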

This is infrastructure plumbing that every OTel-adopting team needs but nobody wants to build in-house. BindPlane (from observIQ), Cribl, and Mezmo are in this space but are enterprise-focused.

Strategy 5: Self-Hosted SaaS (Managed Open Source)

Offer managed SigNoz/Grafana stack deployed in the customer’s own cloud account:

Data never leaves their VPC (compliance teams love this)
They pay cloud costs directly (transparent, no markup confusion)
You charge a management fee ($500–$5,000/month)
Handle upgrades, scaling, backups, ClickHouse tuning

This is the Aiven/Elestio model applied to observability. Works especially well for healthcare, finance, and government sectors with strict data residency requirements.

The DHH/37signals Filter

Applying the “build for yourself” test: if you run infrastructure and are frustrated by your Datadog bill, build the simpler alternative you actually want. The best wedge for a solo founder or small team:

Pick Strategy 2 or 3 — either build the “good enough” all-in-one or the cost optimizer
Target a specific audience — Rails developers, Laravel developers, or Next.js/Vercel users
Price simply — flat monthly fee, no per-GB or per-host pricing
Ship fast — ClickHouse + OTel Collector + clean UI is a viable MVP in 2–3 months
Content marketing — “I cut my Datadog bill by 90%” posts drive incredible organic traffic

Bottom line: The observability market is massive and the incumbents are genuinely disliked for their pricing. OpenTelemetry and ClickHouse have commoditized the hard infrastructure. The opportunity is in packaging, pricing, and targeting, not in building better technology. A focused, opinionated tool with simple pricing can build a profitable $1–10M business by capturing a fraction of a percent of the spend that now goes to Datadog: 0.1% of its ~$3.2 billion ARR is about $3.2 million.