

Observability & Monitoring Platforms Analysis

Deep-dive analysis of ~15 observability and monitoring platforms — from dominant incumbents (Datadog, Splunk, Dynatrace, New Relic) to open-source challengers (SigNoz, Grafana LGTM stack, Prometheus, Jaeger) to focused newcomers (Honeycomb, Axiom, Better Stack, HyperDX/ClickStack). Each tool is analyzed on architecture, pricing, data model, and positioning.

The core question: Observability bills are the second-biggest infrastructure cost after cloud compute for many companies. Datadog’s complex pricing and vendor lock-in create massive frustration — can an opinionated, simpler, cheaper tool win meaningful share?



1. The Observability Market

Market snapshot
Global observability market (2025): ~$4.1 billion
Projected (2030): $9–10 billion (18–20% CAGR)
Application Performance Monitoring (2024): $8.4 billion
Log management (2024): $3.2 billion
Key pain point: Datadog bills growing 30–40% faster than infrastructure (the “observability tax”)

The market is split across three pillars of observability:

Metrics
Time-series numerical data (CPU, memory, request latency, error rates). Prometheus popularized this with its pull-based model and PromQL query language. Datadog, Grafana Mimir, and InfluxDB are key players.
Logs
Unstructured/semi-structured text records from applications and infrastructure. The original observability signal. Elasticsearch/ELK, Grafana Loki, Datadog Logs, and Splunk dominate. Volume-based pricing makes this the most expensive pillar.
Traces
Distributed request flows across microservices. OpenTelemetry has become the standard instrumentation framework. Jaeger, Zipkin, Grafana Tempo, Datadog APM, and Honeycomb focus here.

The OpenTelemetry shift: OTel (CNCF project, 2nd most active after Kubernetes) is standardizing instrumentation across all three pillars. This is the single biggest structural change in the market — it decouples data collection from the backend, making vendor switching dramatically easier. Any new entrant should be OTel-native from day one.

Market dynamics: Large enterprises run Datadog/Dynatrace/Splunk and spend $500K–$5M+/year. Mid-market companies ($50K–$500K/year) are the most price-sensitive and most likely to switch. Startups and small teams use free tiers or open-source stacks. The “observability tax” backlash is creating real demand for cheaper alternatives.


2. Datadog

Company overview
Founded: 2010
Revenue (2025): ~$3.2 billion ARR
Market cap: ~$44 billion
Employees: ~5,500
Customers: 29,200+ (3,390 with ARR >$100K)
Products: 20+ products across observability, security, and developer experience
Integrations: 700+

Pricing Model

Datadog’s pricing is infamously complex. Each product has its own pricing dimension:

  • Infrastructure Monitoring: $15–$23/host/month
  • APM: $31–$40/host/month
  • Log Management: $0.10/GB ingested + $1.70–$2.55/million log events for indexing + $0.05/GB/month for retention
  • RUM: $1.50/1,000 sessions
  • Synthetic Monitoring: $5–$12/1,000 test runs
  • SIEM/Security: $0.20/GB analyzed
  • Continuous Profiler: $12/host/month
  • Database Monitoring: $70/host/month

A mid-size company (100 hosts, moderate logs and APM) easily hits $15K–$30K/month. Common complaints: surprise bills from log spikes, hard-to-predict costs, and “feature creep pricing” where enabling one feature pulls you into needing three more paid products.
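To see why log spikes produce surprise bills, it helps to put the list prices above into a small estimator. This is a rough sketch, not Datadog's billing logic: it uses the low end of each quoted range, ignores retention and every other product, and the number of indexed events per GB is an assumption that depends on log line size and index filters.

```python
from dataclasses import dataclass

# List prices quoted above (low end of each range), in USD.
INFRA_PER_HOST = 15.00        # Infrastructure Monitoring, per host/month
APM_PER_HOST = 31.00          # APM, per host/month
LOG_INGEST_PER_GB = 0.10      # Log Management ingestion, per GB
LOG_INDEX_PER_MILLION = 1.70  # Log indexing, per million events


@dataclass
class Workload:
    hosts: int
    apm_hosts: int
    log_gb: float
    # Assumption: how many events get indexed is workload-specific.
    indexed_events_millions: float


def estimate_monthly_bill(w: Workload) -> dict[str, float]:
    """Return a per-product breakdown plus a total, in USD per month."""
    lines = {
        "infrastructure": w.hosts * INFRA_PER_HOST,
        "apm": w.apm_hosts * APM_PER_HOST,
        "log_ingest": w.log_gb * LOG_INGEST_PER_GB,
        "log_index": w.indexed_events_millions * LOG_INDEX_PER_MILLION,
    }
    lines["total"] = sum(lines.values())
    return lines


# Hypothetical workload: 100 hosts, 5,000 GB of logs, ~1 KB per event,
# so roughly 1 million events per GB, all of them indexed.
bill = estimate_monthly_bill(
    Workload(hosts=100, apm_hosts=100, log_gb=5_000, indexed_events_millions=5_000)
)
for product, usd in bill.items():
    print(f"{product:>15}: ${usd:,.0f}")
```

Under these assumptions the indexing line alone is larger than infrastructure and APM combined, and it scales with log volume rather than host count. That is the mechanism behind the “log costs spiral” complaint.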

Strengths

  • Best-in-class unified platform — metrics, logs, traces, RUM, synthetics, profiling, security all correlated in one UI
  • 700+ integrations with near-instant setup
  • Powerful dashboarding and alerting
  • Strong brand and enterprise trust
  • AI/ML features (Watchdog anomaly detection, LLM Observability)

Weaknesses (Attack Surfaces)

  • Pricing opacity: Multiple pricing dimensions make costs hard to predict
  • Vendor lock-in: Proprietary agents and formats; migrating away is painful
  • Overkill for most: Most teams use 20% of features but pay for the platform tax
  • No self-hosted option: All data goes to Datadog’s cloud
  • Log costs spiral: The #1 complaint — log ingestion costs grow unpredictably

3. Grafana Labs

Company overview
Founded: 2014
Revenue: $400M+ ARR
Valuation: $6B → $9B (latest round)
Funding: $694M total
Employees: ~1,200
License: AGPL v3 (core projects)
Core stack: LGTM (Loki for logs, Grafana for visualization, Tempo for traces, Mimir for metrics)

The LGTM Stack

Grafana
The visualization layer. Industry standard for dashboards. Supports 100+ data sources. Free and open source (AGPL). Used by millions.
Loki
Log aggregation system inspired by Prometheus. Key innovation: indexes only metadata (labels), not full text. This makes it 10x cheaper than Elasticsearch for many workloads. Trade-off: grep-style queries are slower than full-text indexed search.
Tempo
Distributed tracing backend. Object-storage-only architecture (S3/GCS) — very cheap at scale. Supports Jaeger, Zipkin, and OTel protocols natively.
Mimir
Long-term metrics storage. Horizontally scalable Prometheus-compatible backend. Drop-in replacement for Thanos/Cortex.
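The Loki trade-off described above (index the labels, scan the text) can be shown with a toy store. This is only an illustration of the idea, not Loki's implementation; the class and method names are hypothetical.

```python
from collections import defaultdict


class LabelIndexedLogStore:
    """Toy log store that indexes label sets only, never the log text."""

    def __init__(self) -> None:
        # One "stream" per unique label set; lines are stored unindexed.
        self._streams: dict[frozenset, list[str]] = defaultdict(list)

    def push(self, labels: dict[str, str], line: str) -> None:
        self._streams[frozenset(labels.items())].append(line)

    def query(self, selector: dict[str, str], contains: str) -> list[str]:
        wanted = set(selector.items())
        hits: list[str] = []
        for stream_labels, lines in self._streams.items():
            # Cheap step: pick streams whose labels match the selector.
            if wanted <= stream_labels:
                # Expensive step: grep-style scan of every line in the stream.
                hits.extend(line for line in lines if contains in line)
        return hits


store = LabelIndexedLogStore()
store.push({"app": "api", "env": "prod"}, "GET /health 200")
store.push({"app": "api", "env": "prod"}, "POST /orders 500 timeout")
store.push({"app": "worker", "env": "prod"}, "job 42 timeout")

print(store.query({"app": "api"}, "timeout"))
```

The index stays small because it grows with the number of label combinations, not with log volume. The cost moves to query time, where every matching stream has to be scanned.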

Pricing (Grafana Cloud)

  • Free tier: 10K metrics, 50GB logs, 50GB traces/month
  • Pro: $29/month base + usage-based (metrics: $8/1K series/month, logs: $0.50/GB, traces: $0.50/GB)
  • Advanced: Custom pricing with SLA, SSO, RBAC
  • Self-hosted: Free (AGPL) — run the entire stack yourself

Strengths

  • Open source with massive community — no vendor lock-in
  • Self-hosted option means you control your data and costs
  • Each component is best-in-class for its function
  • Grafana dashboards are the industry standard
  • Cost-effective at scale — especially Loki and Tempo

Weaknesses

  • Complexity: Running LGTM self-hosted requires significant operational expertise
  • Fragmented experience: Four separate systems vs. Datadog’s unified platform
  • Correlation challenges: Connecting logs → traces → metrics requires careful setup
  • Grafana Cloud costs can grow: At very high scale, Grafana Cloud approaches Datadog pricing

4. SigNoz

Company overview
Founded: 2021
YC batch: W21
Funding: $6.5M (Seed)
GitHub stars: 20K+
License: MIT (core) + proprietary enterprise features
Backend: ClickHouse (columnar database)
Key claim: “OpenTelemetry-native, 7–10x cheaper than Datadog”

Architecture

SigNoz is a single unified platform (like Datadog) but built on open standards (like Grafana). It stores all three signals — metrics, logs, and traces — in a single ClickHouse database. This is its key architectural differentiator: one query engine for everything, with native correlation between signals.
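What “one query engine for everything” buys you is easiest to see as a join. The sketch below uses Python's built-in sqlite3 purely as a stand-in for ClickHouse, and the table and column names are hypothetical, not SigNoz's actual schema.

```python
import sqlite3

# sqlite3 stands in for ClickHouse; schema and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE spans (trace_id TEXT, service TEXT, duration_ms REAL);
    CREATE TABLE logs  (trace_id TEXT, severity TEXT, body TEXT);
    """
)
conn.executemany(
    "INSERT INTO spans VALUES (?, ?, ?)",
    [("t1", "checkout", 1840.0), ("t2", "checkout", 95.0), ("t3", "search", 2100.0)],
)
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?)",
    [
        ("t1", "ERROR", "payment gateway timeout"),
        ("t2", "INFO", "order placed"),
        ("t3", "INFO", "cache miss"),
    ],
)


def errors_on_slow_traces(db: sqlite3.Connection, min_duration_ms: float) -> list[tuple]:
    """Correlate traces and logs in one query by joining on trace_id."""
    return db.execute(
        """
        SELECT s.trace_id, s.service, s.duration_ms, l.body
        FROM spans AS s
        JOIN logs AS l ON l.trace_id = s.trace_id
        WHERE s.duration_ms >= ? AND l.severity = 'ERROR'
        ORDER BY s.duration_ms DESC
        """,
        (min_duration_ms,),
    ).fetchall()


print(errors_on_slow_traces(conn, 1000))
```

In a multi-system stack the same question means querying the trace store, copying trace IDs, and querying the log store separately. With a shared backend it is one statement.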

Pricing (SigNoz Cloud)

  • Logs: $0.30/GB ingested
  • Traces: $0.30/GB ingested
  • Metrics: $0.10 per 100K samples
  • No per-host fees, no per-seat fees
  • Self-hosted: Free (MIT) — run on your own infrastructure

Comparison with Datadog

Dimension | SigNoz | Datadog
Logs pricing | $0.30/GB (all-in) | $0.10/GB ingest + $1.70/M events index + retention fees
Host fees | None | $15–$23/host/month (infra) + $31–$40/host/month (APM)
Seat fees | None | Varies by product
Self-hosted | Yes (MIT) | No
OTel support | Native (built on OTel) | Accepts OTel but prefers proprietary agent
Integrations | ~50 | 700+
Query language | ClickHouse SQL | Proprietary

Strengths

  • Unified platform with single backend (ClickHouse) — simpler than Grafana’s 4-system stack
  • OTel-native from day one — no proprietary agents
  • Simple, transparent pricing — no hidden dimensions
  • MIT licensed core — no AGPL concerns
  • ClickHouse gives excellent query performance on high-cardinality data

Weaknesses

  • Small team, limited integrations compared to Datadog
  • ClickHouse operations at scale can be complex
  • Less mature alerting and dashboard ecosystem than Grafana
  • Limited enterprise features (SSO, RBAC, audit logs still catching up)

5. New Relic

Company overview
Founded: 2008
Revenue: ~$960M ARR (before going private)
Acquisition: Taken private by Francisco Partners & TPG for $6.5B (Nov 2023)
Employees: ~2,800 (post-layoffs)
Backend: NRDB (custom telemetry database)
Key move: Pivoted to “free tier + usage-based pricing” in 2020

Pricing

  • Free tier: 1 full-access user + 100GB data/month
  • Standard: $0.30/GB beyond free + $0/seat (limited features)
  • Pro: $0.30/GB + $49/full user/month
  • Enterprise: $0.50/GB + custom per-user pricing

New Relic simplified its pricing dramatically in 2020 (from per-host to per-GB + per-user), which was a competitive response to Datadog’s growing dominance. However, per-user fees still add up for larger teams, and the “full platform user” vs. “basic user” distinction creates confusion.

Position

Pioneer of APM, now struggling for relevance. The pivot to usage-based pricing helped but didn’t reverse the trend vs. Datadog. Going private suggests a restructuring phase. Still has a large installed base, especially in Java/.NET shops. NRQL (query language) is powerful but proprietary.


6. Dynatrace

Company overview
Founded: 2005 (spun out of Compuware)
Revenue: ~$1.7B ARR
Market cap: ~$15 billion
Key feature: Grail, a unified data lakehouse for all observability data
AI engine: Davis AI (causal AI, not just correlation)
Target: Large enterprises (avg deal >$300K ARR)

Position

Dynatrace is the “enterprise Datadog” — more automated, more opinionated, and more expensive. Its Davis AI engine does automatic root cause analysis using causal AI (topology-aware), which is genuinely differentiated from Datadog’s ML-based anomaly detection. OneAgent auto-instrumentation means near-zero configuration.

Trade-offs: extremely enterprise-focused (long sales cycles, complex contracts), less developer-friendly than Datadog, and opaque pricing (consumption under the Dynatrace Platform Subscription, DPS, and the older Davis Data Units, DDUs, is hard to estimate). Not a realistic target for bootstrappers, but important to understand as the ceiling of the market.


7. Elastic

Company overview
Founded: 2012
Revenue: ~$1.3B ARR
Market cap: ~$9 billion
Core product: Elasticsearch + Kibana (ELK stack)
License: SSPL + Elastic License (not OSI-approved open source)
Observability play: Elastic Observability (logs, metrics, APM, synthetics)

Position

Elastic built the world’s most popular log search engine and has been expanding into full observability. Elasticsearch is unbeaten for full-text log search performance. However, it’s operationally complex to run, expensive at scale (requires lots of RAM and disk), and the license change from Apache 2.0 to SSPL alienated parts of the community (spawning OpenSearch).

Elastic Cloud pricing starts at $0.046/GB for search-optimized storage. Their observability suite is comprehensive but feels bolted-on compared to purpose-built platforms. Best for organizations already invested in Elasticsearch.


8. Splunk (Cisco)

Company overview
Founded: 2003
Acquisition: Acquired by Cisco for $28B (March 2024)
Revenue (pre-acquisition): ~$3.8B ARR
Core strength: Log analytics, SIEM, IT operations
Query language: SPL (Splunk Processing Language)
Pricing: Per-GB ingested; historically the most expensive option

Position

Splunk is the legacy king of log analytics. SPL is extremely powerful. But Splunk is expensive ($150–$200+/GB/day at scale), operationally heavy, and increasingly seen as a SIEM/security tool rather than a modern observability platform. The Cisco acquisition signals a shift toward bundling with networking infrastructure rather than competing head-to-head with Datadog.

Splunk Observability Cloud (formerly SignalFx, acquired 2019) provides metrics and APM but hasn’t gained significant traction against Datadog. Most interesting as a cautionary tale: being expensive and complex creates an opening for simpler alternatives.


9. Prometheus & Jaeger (Open Source Foundations)

Prometheus

Created: 2012 at SoundCloud
Status: CNCF graduated project
What it does: Pull-based metrics collection and storage
Query language: PromQL (industry standard)
Limitation: Single-node by design; needs Thanos, Cortex, or Mimir for horizontal scaling and long-term storage

Jaeger

Created: 2015 at Uber
Status: CNCF graduated project
What it does: Distributed tracing backend
Storage: Cassandra, Elasticsearch, or in-memory
Note: Being superseded by Grafana Tempo and OTel Collector for many use cases

Prometheus and Jaeger are the foundational open-source projects in their respective domains. Every commercial observability platform is either built on them, compatible with them, or competing against them. They’re essential to understand but not direct competitors in the commercial sense — they’re the building blocks other products are assembled from.


10. Honeycomb

Company overview
Founded: 2016
Funding: $97M total
Founders: Charity Majors, Christine Yen (ex-Facebook infrastructure)
Key concept: “Observability” as high-cardinality event analysis (vs. traditional monitoring)
Backend: Custom columnar store optimized for high-cardinality queries
Pricing: $0.20/GB events (20M events/month free)

Position

Honeycomb popularized the modern definition of “observability” (as distinct from monitoring). Their approach: send wide, structured events with many dimensions, then slice and dice interactively to find unknown unknowns. BubbleUp (automatic anomaly analysis) and Query Builder are genuinely innovative.

Limitations: primarily traces/events focused (logs and metrics are secondary), smaller ecosystem, and the “observability philosophy” requires teams to change how they think about debugging. Strong in developer-led, cloud-native companies. Unlikely to displace Datadog in enterprises that want a full platform.


11. Axiom

Company overview
Founded: 2020
Funding: $27M (Series A)
Key claim: “Log everything, query everything, pay less”
Backend: Custom storage engine on object storage (S3)
Pricing: $0.35/GB ingested (free tier: 500GB/month)
Integration: Official log drain for Vercel

Position

Axiom’s bet: storage is cheap, so ingest everything and query it on-demand. No indexing upfront — which means no decisions about what to keep and what to drop. Their Vercel partnership gives them a strong foothold in the Next.js/serverless ecosystem.

Primarily a log analytics platform that’s expanding into traces and metrics. APL (Axiom Processing Language) is based on KQL (Kusto Query Language from Azure). Good for teams that want to log everything without worrying about costs, but less mature as a full observability platform.


12. Better Stack

Company overview
Founded: 2021
Products: Better Uptime (uptime monitoring), Better Stack Logs (log management), Better Stack Telemetry
Pricing (logs): From $0.25/GB with 3-day retention (free tier: 1GB/month)
Approach: Combines uptime monitoring, incident management, and log management in one platform

Position

Better Stack started as an uptime monitoring tool (competing with Pingdom, UptimeRobot) and expanded into log management. Their strength is the unified incident workflow: uptime check fails → alert fires → on-call gets paged → logs are right there for debugging. Clean UI, developer-friendly.

Limitation: still primarily uptime + logs. Not a full observability platform (no APM, limited traces, basic metrics). Good for small-to-mid teams that want simple, affordable monitoring without the complexity of Datadog.


13. HyperDX / ClickStack

Company overview
Founded: 2023
Renamed to: ClickStack (early 2025)
GitHub stars: 7K+
License: MIT
Backend: ClickHouse
Key claim: “Open-source Datadog alternative”

Position

Very similar to SigNoz in approach: unified platform, ClickHouse backend, OTel-native, MIT licensed. The rebrand from HyperDX to ClickStack leans into the ClickHouse connection. Differentiators include session replay integration and a focus on developer experience.

Earlier stage than SigNoz with a smaller community. Interesting as a validation of the “unified ClickHouse observability” thesis. Both SigNoz and ClickStack demonstrate that ClickHouse + OTel is the emerging open-source stack for full observability.


14. Uptrace

Company overview
Type: Open-source observability platform
Backend: ClickHouse
License: BSL (Business Source License)
Focus: OTel-native tracing and metrics
Cloud pricing: Starting at $1/month per 100K spans

Position

Another ClickHouse-backed, OTel-native platform, with a smaller community than SigNoz. Its BSL license is less permissive than MIT. Primarily interesting as further validation of the ClickHouse + OTel stack, but less likely to win against SigNoz, which has more momentum, better funding (YC-backed), and MIT licensing.


15. Checkly

Company overview
Founded: 2018
Focus: Synthetic monitoring & monitoring-as-code
Funding: $12.6M
Key feature: Playwright-based browser checks defined in code
Pricing: Free tier (50 API checks), Starter at $30/month

Position

Checkly focuses on a specific slice of observability: synthetic monitoring (are your APIs and web apps working from the user’s perspective?). “Monitoring as code” approach fits into CI/CD workflows. Competes more with Pingdom and Datadog Synthetics than with full observability platforms. Important as an example of how focused tools can carve out a niche.


16. Sentry

Company overview
Founded: 2015 (project started 2008)
Revenue: $100M+ ARR
Funding: $217M total
Focus: Error tracking & performance monitoring
License: BSL (was Apache 2.0)
Pricing: Free tier (5K errors/month), Team at $26/month

Position

Sentry occupies the “application-level” monitoring niche: error tracking, crash reporting, and performance monitoring (transaction tracing). Not a full infrastructure observability platform, but deeply integrated into developer workflows. Session replay, profiling, and release health tracking make it complementary to (not a replacement for) Datadog/Grafana.

Most teams use Sentry alongside their observability platform, not instead of it. Developer experience is excellent — SDKs for every language with automatic error grouping and stack trace deobfuscation.


17. Competitive Comparison Table

Platform | Type | Backend | OTel Native | Self-Hosted | License | Revenue/Funding
Datadog | Full platform | Proprietary | Accepts OTel | No | Proprietary | $3.2B ARR
Grafana Labs | Full stack (LGTM) | Multiple (Loki, Mimir, Tempo) | Yes | Yes | AGPL | $400M+ ARR
SigNoz | Full platform | ClickHouse | Yes (built on OTel) | Yes | MIT | $6.5M raised
New Relic | Full platform | NRDB | Accepts OTel | No | Proprietary | ~$960M ARR
Dynatrace | Full platform | Grail | Accepts OTel | No | Proprietary | $1.7B ARR
Elastic | Search + Observability | Elasticsearch | Accepts OTel | Partial | SSPL/ELv2 | $1.3B ARR
Splunk | Log analytics + SIEM | Proprietary | Accepts OTel | Yes (on-prem) | Proprietary | $28B acquisition
Honeycomb | Event analytics | Custom columnar | Yes | No | Proprietary | $97M raised
Axiom | Log analytics | Object storage (S3) | Accepts OTel | No | Proprietary | $27M raised
Better Stack | Uptime + Logs | Custom | Accepts OTel | No | Proprietary | Venture-backed
ClickStack | Full platform | ClickHouse | Yes | Yes | MIT | Early stage
Uptrace | Tracing + Metrics | ClickHouse | Yes | Yes | BSL | Bootstrapped
Checkly | Synthetic monitoring | Custom | Accepts OTel | No | Proprietary | $12.6M raised
Sentry | Error tracking | Custom (Snuba/ClickHouse) | Partial | Yes | BSL | $100M+ ARR

18. Pricing Comparison

Estimated monthly cost for a typical mid-size setup: 50 hosts, 500GB logs/month, 200GB traces/month, 50K metrics series.

Platform | Host/Infra Fees | Log Cost | Trace Cost | Metric Cost | Estimated Total
Datadog | $1,150 (infra) + $2,000 (APM) | $900+ (ingest + index) | Included in APM | Custom metrics extra | $4,000–$6,000+
Grafana Cloud | None | $250 (500GB × $0.50) | $100 (200GB × $0.50) | $400 (50K series) | $750–$1,000
SigNoz Cloud | None | $150 (500GB × $0.30) | $60 (200GB × $0.30) | $50 | $260–$400
New Relic | None | $210 (700GB × $0.30) + per-user fees, covering logs, traces, and metrics | $500–$1,500
Axiom | None | $175 (500GB × $0.35) | Limited | Limited | $200–$500
Self-hosted (Grafana/SigNoz) | Infrastructure cost only | $0 software cost + cloud compute/storage, covering logs, traces, and metrics | $200–$800 (cloud infra)

Key insight: SigNoz is 10–15x cheaper than Datadog for equivalent functionality. Grafana Cloud is 4–6x cheaper. Self-hosted is cheapest but adds operational burden. The pricing gap is real and growing — this is the primary attack vector for challengers.
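The multiples above can be checked against the table with a few lines of arithmetic. This sketch only reuses the per-GB and per-series prices and the estimated totals already quoted; it ignores base fees and free-tier allowances, and takes SigNoz metrics as the table's $50 figure rather than deriving it from sample counts.

```python
# Scenario from the table: 500 GB logs, 200 GB traces, 50K metric series per month.
LOG_GB = 500
TRACE_GB = 200
METRIC_SERIES = 50_000

DATADOG_RANGE = (4_000, 6_000)  # estimated total from the table, USD/month


def grafana_cloud_usage() -> float:
    """Usage-based part only: $0.50/GB logs and traces, $8 per 1K series."""
    return LOG_GB * 0.50 + TRACE_GB * 0.50 + (METRIC_SERIES / 1_000) * 8


def signoz_cloud_usage(metrics_usd: float = 50.0) -> float:
    """$0.30/GB for logs and traces, plus the table's flat metrics figure."""
    return LOG_GB * 0.30 + TRACE_GB * 0.30 + metrics_usd


def savings_multiple(challenger_high: float) -> tuple[float, float]:
    """Datadog's low and high estimates divided by a challenger's high estimate."""
    low, high = DATADOG_RANGE
    return low / challenger_high, high / challenger_high


print("Grafana Cloud usage:", grafana_cloud_usage())
print("SigNoz Cloud usage: ", signoz_cloud_usage())
print("vs SigNoz ($400):   ", savings_multiple(400))
print("vs Grafana ($1,000):", savings_multiple(1_000))
```

The usage totals come out at the low end of each estimated range in the table. Dividing Datadog's range by each challenger's high estimate reproduces the 10–15x and 4–6x figures, so the stated multiples are the conservative reading of the table.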


19. How to Compete as a Bootstrapper

The Landscape Reality

Building a general-purpose “Datadog competitor” as a bootstrapper is extremely difficult — SigNoz has raised $6.5M, Grafana Labs $694M, and incumbents have thousands of engineers. However, there are specific wedges that remain viable:

Strategy 1: Vertical Observability

Build observability for a specific stack or use case that general platforms serve poorly:

  • WordPress/PHP monitoring: Most observability tools are built for cloud-native. The massive WordPress ecosystem has no purpose-built observability tool.
  • IoT/Edge observability: Devices with intermittent connectivity, constrained resources. None of the major platforms handle this well.
  • Database-specific monitoring: Deep PostgreSQL or MySQL monitoring with query analysis, index recommendations, and performance tuning. PgAnalyze shows this works ($5M+ ARR, bootstrapped feel).
  • Serverless-native: Lambda/Cloudflare Workers monitoring. Lumigo and Epsagon (acquired) proved the market. Current tools still struggle with cold starts, cost correlation, and function-level debugging.

Strategy 2: Opinionated “Good Enough” Platform

The Better Stack approach: combine uptime monitoring + logs + alerting + incident management into one clean, affordable tool. Don’t try to compete with Datadog on features — compete on simplicity and price.

  • Target: teams of 5–50 developers who don’t need 700 integrations
  • Price: flat $49–$199/month (predictable, no usage surprises)
  • Differentiator: 5-minute setup, beautiful UI, zero configuration decisions
  • Tech stack: ClickHouse + OTel collector + simple web UI
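Flat pricing makes the revenue math easy to sanity-check. The sketch below is simple arithmetic on the price range above; it ignores churn, discounts, and annual plans, and the $1M target is an illustrative assumption.

```python
import math


def customers_needed(target_arr: float, monthly_price: float) -> int:
    """Paying customers required to reach a yearly revenue target at a flat price."""
    return math.ceil(target_arr / (monthly_price * 12))


# Illustrative target of $1M ARR at both ends of the $49–$199/month range.
for price in (49, 199):
    print(f"${price}/month -> {customers_needed(1_000_000, price)} customers")
```

At $49/month that is about 1,700 paying teams for $1M ARR; at $199/month it is about 420. The top of the price range changes the size of the sales problem by a factor of four.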

Strategy 3: Cost Optimization Layer

Don’t replace Datadog — reduce the bill. Build a tool that:

  • Analyzes Datadog/Grafana/New Relic usage and identifies waste
  • Recommends log sampling strategies, metric cardinality reduction
  • Provides a “Datadog bill estimator” before you get the surprise invoice
  • Revenue model: charge 10–20% of savings identified

This is the observability equivalent of cloud cost management (Vantage, CloudHealth). Real pain point, clear ROI, and you’re selling to the exact budget holder who’s frustrated about observability costs.
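The core of such a tool is finding where the cardinality comes from. A minimal sketch, assuming metric series are available as label dictionaries with the metric name stored under `__name__` (the Prometheus convention); the function names and sample data are hypothetical.

```python
from collections import defaultdict


def series_per_metric(series: list[dict[str, str]]) -> dict[str, int]:
    """Count distinct label sets (time series) per metric name."""
    seen: dict[str, set] = defaultdict(set)
    for labels in series:
        seen[labels["__name__"]].add(frozenset(labels.items()))
    return {name: len(label_sets) for name, label_sets in seen.items()}


def label_cardinality(series: list[dict[str, str]], metric: str) -> dict[str, int]:
    """For one metric, count distinct values per label to find the offender."""
    values: dict[str, set] = defaultdict(set)
    for labels in series:
        if labels["__name__"] != metric:
            continue
        for key, value in labels.items():
            if key != "__name__":
                values[key].add(value)
    return {key: len(vals) for key, vals in values.items()}


# Hypothetical sample: a user_id label multiplies the series count.
sample = [
    {"__name__": "http_requests_total", "method": "GET", "user_id": str(i)}
    for i in range(1_000)
] + [{"__name__": "cpu_usage", "host": h} for h in ("a", "b", "c")]

print(series_per_metric(sample))
print(label_cardinality(sample, "http_requests_total"))
```

A report like this turns into a concrete recommendation: dropping or aggregating the one high-cardinality label removes most of the billable series for that metric.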

Strategy 4: OTel Pipeline Management

OpenTelemetry Collector pipelines are becoming complex. Teams need to route, filter, sample, and transform telemetry data before it hits their backend (to control costs). Build a visual OTel pipeline builder:

  • Drag-and-drop OTel Collector configuration
  • Built-in sampling strategies (tail-based, head-based, adaptive)
  • Cost estimation per pipeline configuration
  • Multi-backend routing (send logs to cheap storage, traces to SigNoz, metrics to Prometheus)

This is infrastructure plumbing that every OTel-adopting team needs but nobody wants to build in-house. BindPlane (from observIQ), Cribl, and Mezmo are in this space but are enterprise-focused.
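The filter, sample, and route steps of such a pipeline are small pieces of logic. The sketch below shows deterministic head-based sampling plus per-signal routing in plain Python; it is not OTel Collector code, and the backend names, record shape, and the rule of dropping DEBUG logs are illustrative assumptions.

```python
import hashlib
from typing import Optional

# Hypothetical destinations, one per signal type.
ROUTES = {
    "logs": "cheap-object-storage",
    "traces": "signoz",
    "metrics": "prometheus",
}


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Head-based sampling: hash the trace ID so every span of a trace
    gets the same keep/drop decision."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big")       # uniform in [0, 2**64)
    return bucket < int(sample_rate * 2**64)


def route(record: dict, sample_rate: float = 0.1) -> Optional[str]:
    """Return the destination for a record, or None if it is dropped."""
    signal = record["signal"]
    if signal == "traces" and not keep_trace(record["trace_id"], sample_rate):
        return None  # sampled out before it reaches a billed backend
    if signal == "logs" and record.get("severity") == "DEBUG":
        return None  # filter rule: drop debug logs
    return ROUTES[signal]


print(route({"signal": "logs", "severity": "ERROR"}))
print(route({"signal": "metrics"}))
print(route({"signal": "traces", "trace_id": "abc123"}, sample_rate=1.0))
```

Hashing the trace ID rather than drawing a random number keeps the decision consistent across services. Tail-based and adaptive sampling need buffering and state, which is where a managed product earns its fee.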

Strategy 5: Self-Hosted SaaS (Managed Open Source)

Offer managed SigNoz/Grafana stack deployed in the customer’s own cloud account:

  • Data never leaves their VPC (compliance teams love this)
  • They pay cloud costs directly (transparent, no markup confusion)
  • You charge a management fee ($500–$5,000/month)
  • Handle upgrades, scaling, backups, ClickHouse tuning

This is the Aiven/Elestio model applied to observability. Works especially well for healthcare, finance, and government sectors with strict data residency requirements.

The DHH/37signals Filter

Applying the “build for yourself” test: if you run infrastructure and are frustrated by your Datadog bill, build the simpler alternative you actually want. The best wedge for a solo founder or small team:

  1. Pick Strategy 2 or 3 — either build the “good enough” all-in-one or the cost optimizer
  2. Target a specific audience — Rails developers, Laravel developers, or Next.js/Vercel users
  3. Price simply — flat monthly fee, no per-GB or per-host pricing
  4. Ship fast — ClickHouse + OTel Collector + clean UI is a viable MVP in 2–3 months
  5. Content marketing — “I cut my Datadog bill by 90%” posts drive incredible organic traffic

Bottom line: The observability market is massive and the incumbents are genuinely disliked for their pricing. OpenTelemetry and ClickHouse have commoditized the hard infrastructure. The opportunity is in packaging, pricing, and targeting, not in building better technology. A focused, opinionated tool with simple pricing can build a profitable $1–10M business; that is only about 0.03–0.3% of Datadog’s ~$3.2 billion in annual revenue.