1. The Observability Market
| Global observability market (2025) | ~$4.1 billion |
|---|---|
| Projected (2030) | $9–10 billion (18–20% CAGR) |
| Application Performance Monitoring (2024) | $8.4 billion |
| Log management (2024) | $3.2 billion |
| Key pain point | Datadog bills growing 30–40% faster than infrastructure — “observability tax” |
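
As a quick sanity check on the projection in the table, compounding the 2025 figure at the quoted growth rates for five years should land near the stated 2030 range. The sketch below is only that arithmetic; the function name is illustrative, and the inputs are the table's own numbers.

```python
def project(value_billions: float, cagr: float, years: int) -> float:
    """Compound a starting value at a constant annual growth rate."""
    return value_billions * (1 + cagr) ** years


# Inputs taken from the table: ~$4.1B in 2025, 18-20% CAGR, 5 years to 2030.
low = project(4.1, 0.18, 5)
high = project(4.1, 0.20, 5)
print(f"2030 projection: ${low:.1f}B to ${high:.1f}B")
```

The result, roughly $9.4B to $10.2B, is consistent with the "$9–10 billion" row.
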
The market is split across three pillars of observability:
- Metrics: Time-series numerical data (CPU, memory, request latency, error rates). Prometheus popularized this with its pull-based model and PromQL query language. Datadog, Grafana Mimir, and InfluxDB are key players.
- Logs: Unstructured/semi-structured text records from applications and infrastructure. The original observability signal. Elasticsearch/ELK, Grafana Loki, Datadog Logs, and Splunk dominate. Volume-based pricing makes this the most expensive pillar.
- Traces: Distributed request flows across microservices. OpenTelemetry has become the standard instrumentation framework. Jaeger, Zipkin, Grafana Tempo, Datadog APM, and Honeycomb focus here.
The OpenTelemetry shift: OTel (a CNCF project, the second most active after Kubernetes) is standardizing instrumentation across all three pillars. This is the single biggest structural change in the market: it decouples data collection from the backend, making vendor switching dramatically easier. Any new entrant should be OTel-native from day one.
Market dynamics: Large enterprises run Datadog/Dynatrace/Splunk and spend $500K–$5M+/year. Mid-market companies ($50K–$500K/year) are the most price-sensitive and most likely to switch. Startups and small teams use free tiers or open-source stacks. The “observability tax” backlash is creating real demand for cheaper alternatives.
2. Datadog
| Founded | 2010 |
|---|---|
| Revenue (2025) | ~$3.2 billion ARR |
| Market cap | ~$44 billion |
| Employees | ~5,500 |
| Customers | 29,200+ (3,390 with ARR >$100K) |
| Products | 20+ products across observability, security, and developer experience |
| Integrations | 700+ |
Pricing Model
Datadog’s pricing is infamously complex. Each product has its own pricing dimension:
- Infrastructure Monitoring: $15–$23/host/month
- APM: $31–$40/host/month
- Log Management: $0.10/GB ingested + $1.70–$2.55/million log events for indexing + $0.05/GB/month for retention
- RUM: $1.50/1,000 sessions
- Synthetic Monitoring: $5–$12/1,000 test runs
- SIEM/Security: $0.20/GB analyzed
- Continuous Profiler: $12/host/month
- Database Monitoring: $70/host/month
A mid-size company (100 hosts, moderate logs and APM) easily hits $15K–$30K/month. Common complaints: surprise bills from log spikes, hard-to-predict costs, and “feature creep pricing” where enabling one feature pulls you into needing three more paid products.
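
To make the stacking of pricing dimensions concrete, here is a minimal sketch that combines only three of the list prices above (infrastructure, APM, and logs). The workload volumes are assumptions chosen for illustration, not figures from any real bill, and the `Workload` and `datadog_monthly` names are hypothetical.

```python
from dataclasses import dataclass

LOG_INGEST_PER_GB = 0.10  # log ingestion list price quoted above


@dataclass
class Workload:
    hosts: int
    log_gb: float                   # GB of logs ingested per month (assumed)
    indexed_events_millions: float  # millions of log events indexed (assumed)


def datadog_monthly(w: Workload, infra_per_host: float,
                    apm_per_host: float, index_per_million: float) -> float:
    """Sum three pricing dimensions: infra hosts, APM hosts, and logs."""
    infra = w.hosts * infra_per_host
    apm = w.hosts * apm_per_host
    logs = (w.log_gb * LOG_INGEST_PER_GB
            + w.indexed_events_millions * index_per_million)
    return infra + apm + logs


# Hypothetical mid-size workload: 100 hosts, 2 TB of logs, 1B indexed events.
workload = Workload(hosts=100, log_gb=2000, indexed_events_millions=1000)
low = datadog_monthly(workload, 15, 31, 1.70)   # low end of each price range
high = datadog_monthly(workload, 23, 40, 2.55)  # high end of each price range
print(f"${low:,.0f} to ${high:,.0f} per month")
```

With these assumed volumes, three products alone come to roughly $6,500 to $9,050 per month. Adding per-host products such as Continuous Profiler ($12) or Database Monitoring ($70), or absorbing a spike in log volume, is what pushes bills toward the range quoted above.
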
Strengths
- Best-in-class unified platform — metrics, logs, traces, RUM, synthetics, profiling, security all correlated in one UI
- 700+ integrations with near-instant setup
- Powerful dashboarding and alerting
- Strong brand and enterprise trust
- AI/ML features (Watchdog anomaly detection, LLM Observability)
Weaknesses (Attack Surfaces)
- Pricing opacity: Multiple pricing dimensions make costs hard to predict
- Vendor lock-in: Proprietary agents and formats; migrating away is painful
- Overkill for most: Most teams use 20% of features but pay for the platform tax
- No self-hosted option: All data goes to Datadog’s cloud
- Log costs spiral: The #1 complaint — log ingestion costs grow unpredictably
3. Grafana Labs
| Founded | 2014 |
|---|---|
| Revenue | $400M+ ARR |
| Valuation | $6B → $9B (latest round) |
| Funding | $694M total |
| Employees | ~1,200 |
| License | AGPL v3 (core projects) |
| Core stack | LGTM — Loki (logs), Grafana (visualization), Tempo (traces), Mimir (metrics) |
The LGTM Stack
- Grafana: The visualization layer. Industry standard for dashboards. Supports 100+ data sources. Free and open source (AGPL). Used by millions.
- Loki: Log aggregation system inspired by Prometheus. Key innovation: indexes only metadata (labels), not full text. This makes it 10x cheaper than Elasticsearch for many workloads. Trade-off: grep-style queries are slower than full-text indexed search.
- Tempo: Distributed tracing backend. Object-storage-only architecture (S3/GCS), very cheap at scale. Supports Jaeger, Zipkin, and OTel protocols natively.
- Mimir: Long-term metrics storage. Horizontally scalable Prometheus-compatible backend. Drop-in replacement for Thanos/Cortex.
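
The Loki trade-off described above (index only labels, then scan the text) can be illustrated with a toy in-memory store. This is a conceptual sketch, not Loki's actual implementation or API; the `LabelIndexedLogs` class and its methods are hypothetical.

```python
from collections import defaultdict


class LabelIndexedLogs:
    """Toy log store: the index holds label sets only, never log text."""

    def __init__(self) -> None:
        # Each distinct label set identifies one stream of raw lines.
        self._streams = defaultdict(list)

    def push(self, labels: dict, line: str) -> None:
        self._streams[frozenset(labels.items())].append(line)

    def query(self, selector: dict, contains: str) -> list:
        wanted = set(selector.items())
        hits = []
        for stream_labels, lines in self._streams.items():
            # Cheap step: pick streams using the small label index.
            if wanted <= stream_labels:
                # Expensive step: grep-style scan of unindexed lines.
                hits.extend(line for line in lines if contains in line)
        return hits


store = LabelIndexedLogs()
store.push({"app": "api", "env": "prod"}, "GET /health 200")
store.push({"app": "api", "env": "prod"}, "POST /orders 500 timeout")
store.push({"app": "worker", "env": "prod"}, "job 17 timeout")
result = store.query({"app": "api"}, "timeout")
print(result)
```

The index stays small because it grows with the number of label combinations rather than with log volume; the cost is that every text match requires scanning the selected streams.
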
Pricing (Grafana Cloud)
- Free tier: 10K metrics, 50GB logs, 50GB traces/month
- Pro: $29/month base + usage-based (metrics: $8/1K series/month, logs: $0.50/GB, traces: $0.50/GB)
- Advanced: Custom pricing with SLA, SSO, RBAC
- Self-hosted: Free (AGPL) — run the entire stack yourself
Strengths
- Open source with massive community — no vendor lock-in
- Self-hosted option means you control your data and costs
- Each component is best-in-class for its function
- Grafana dashboards are the industry standard
- Cost-effective at scale — especially Loki and Tempo
Weaknesses
- Complexity: Running LGTM self-hosted requires significant operational expertise
- Fragmented experience: Four separate systems vs. Datadog’s unified platform
- Correlation challenges: Connecting logs → traces → metrics requires careful setup
- Grafana Cloud costs can grow: At very high scale, Grafana Cloud approaches Datadog pricing
4. SigNoz
| Founded | 2021 |
|---|---|
| YC batch | W21 |
| Funding | $6.5M (Seed) |
| GitHub stars | 20K+ |
| License | MIT (core) + proprietary enterprise features |
| Backend | ClickHouse (columnar database) |
| Key claim | “OpenTelemetry-native, 7–10x cheaper than Datadog” |
Architecture
SigNoz is a single unified platform (like Datadog) but built on open standards (like Grafana). It stores all three signals — metrics, logs, and traces — in a single ClickHouse database. This is its key architectural differentiator: one query engine for everything, with native correlation between signals.
Pricing (SigNoz Cloud)
- Logs: $0.30/GB ingested
- Traces: $0.30/GB ingested
- Metrics: $0.10 per 100K samples
- No per-host fees, no per-seat fees
- Self-hosted: Free (MIT) — run on your own infrastructure
Comparison with Datadog
| Dimension | SigNoz | Datadog |
|---|---|---|
| Logs pricing | $0.30/GB (all-in) | $0.10/GB ingest + $1.70/M events index + retention fees |
| Host fees | None | $15–$23/host/month (infra) + $31–$40/host/month (APM) |
| Seat fees | None | Varies by product |
| Self-hosted | Yes (MIT) | No |
| OTel support | Native (built on OTel) | Accepts OTel but prefers proprietary agent |
| Integrations | ~50 | 700+ |
| Query language | ClickHouse SQL | Proprietary |
Strengths
- Unified platform with single backend (ClickHouse) — simpler than Grafana’s 4-system stack
- OTel-native from day one — no proprietary agents
- Simple, transparent pricing — no hidden dimensions
- MIT licensed core — no AGPL concerns
- ClickHouse gives excellent query performance on high-cardinality data
Weaknesses
- Small team, limited integrations compared to Datadog
- ClickHouse operations at scale can be complex
- Less mature alerting and dashboard ecosystem than Grafana
- Limited enterprise features (SSO, RBAC, audit logs still catching up)
5. New Relic
| Founded | 2008 |
|---|---|
| Revenue | ~$960M ARR (before going private) |
| Acquisition | Taken private by Francisco Partners & TPG for $6.5B (Nov 2023) |
| Employees | ~2,800 (post-layoffs) |
| Backend | NRDB (custom telemetry database) |
| Key move | Pivoted to “free tier + usage-based pricing” in 2020 |
Pricing
- Free tier: 1 full-access user + 100GB data/month
- Standard: $0.30/GB beyond free + $0/seat (limited features)
- Pro: $0.30/GB + $49/full user/month
- Enterprise: $0.50/GB + custom per-user pricing
New Relic simplified its pricing dramatically in 2020 (from per-host to per-GB + per-user), which was a competitive response to Datadog’s growing dominance. However, per-user fees still add up for larger teams, and the “full platform user” vs. “basic user” distinction creates confusion.
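
A short sketch shows how seat count, rather than data volume, can dominate a Pro-tier bill. It assumes the 100GB free allowance also applies on Pro, which the tier list above does not state explicitly; the function name and the team sizes are hypothetical.

```python
FREE_GB = 100  # free data allowance (assumed to carry over to Pro)


def new_relic_pro_monthly(data_gb: float, full_users: int,
                          per_gb: float = 0.30, per_user: float = 49.0) -> float:
    """Per-GB data charge beyond the free allowance plus per-seat fees."""
    billable_gb = max(0.0, data_gb - FREE_GB)
    return billable_gb * per_gb + full_users * per_user


# Same 700GB/month of telemetry, two different team sizes.
small_team = new_relic_pro_monthly(700, full_users=10)
large_team = new_relic_pro_monthly(700, full_users=25)
print(f"10 users: ${small_team:,.0f}, 25 users: ${large_team:,.0f}")
```

Under these assumptions the data charge is $180 in both cases, while seats account for $490 and $1,225 respectively.
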
Position
Pioneer of APM, now struggling for relevance. The pivot to usage-based pricing helped but didn’t reverse the trend vs. Datadog. Going private suggests a restructuring phase. Still has a large installed base, especially in Java/.NET shops. NRQL (query language) is powerful but proprietary.
6. Dynatrace
| Founded | 2005 (acquired by Compuware in 2011, spun out in 2014) |
|---|---|
| Revenue | ~$1.7B ARR |
| Market cap | ~$15 billion |
| Key feature | Grail — unified data lakehouse for all observability data |
| AI engine | Davis AI (causal AI, not just correlation) |
| Target | Large enterprises (avg deal >$300K ARR) |
Position
Dynatrace is the “enterprise Datadog” — more automated, more opinionated, and more expensive. Its Davis AI engine does automatic root cause analysis using causal AI (topology-aware), which is genuinely differentiated from Datadog’s ML-based anomaly detection. OneAgent auto-instrumentation means near-zero configuration.
Trade-offs: extremely enterprise-focused (long sales cycles, complex contracts), less developer-friendly than Datadog, and pricing is opaque (DDUs, or Davis Data Units, are hard to estimate). Not a realistic target for bootstrappers, but important to understand as the ceiling of the market.
7. Elastic
| Founded | 2012 |
|---|---|
| Revenue | ~$1.3B ARR |
| Market cap | ~$9 billion |
| Core product | Elasticsearch + Kibana (ELK stack) |
| License | SSPL + Elastic License (not OSI-approved open source) |
| Observability play | Elastic Observability (logs, metrics, APM, synthetics) |
Position
Elastic built the world’s most popular log search engine and has been expanding into full observability. Elasticsearch is unbeaten for full-text log search performance. However, it’s operationally complex to run, expensive at scale (requires lots of RAM and disk), and the license change from Apache 2.0 to SSPL alienated parts of the community (spawning OpenSearch).
Elastic Cloud pricing starts at $0.046/GB for search-optimized storage. Their observability suite is comprehensive but feels bolted-on compared to purpose-built platforms. Best for organizations already invested in Elasticsearch.
8. Splunk (Cisco)
| Founded | 2003 |
|---|---|
| Acquisition | Acquired by Cisco for $28B (March 2024) |
| Revenue (pre-acquisition) | ~$3.8B ARR |
| Core strength | Log analytics, SIEM, IT operations |
| Query language | SPL (Splunk Processing Language) |
| Pricing | Per-GB ingested — historically the most expensive option |
Position
Splunk is the legacy king of log analytics. SPL is extremely powerful. But Splunk is expensive ($150–$200+/GB/day at scale), operationally heavy, and increasingly seen as a SIEM/security tool rather than a modern observability platform. The Cisco acquisition signals a shift toward bundling with networking infrastructure rather than competing head-to-head with Datadog.
Splunk Observability Cloud (formerly SignalFx, acquired 2019) provides metrics and APM but hasn’t gained significant traction against Datadog. Most interesting as a cautionary tale: being expensive and complex creates an opening for simpler alternatives.
9. Prometheus & Jaeger (Open Source Foundations)
Prometheus
| Created | 2012 at SoundCloud |
|---|---|
| Status | CNCF graduated project |
| What it does | Pull-based metrics collection and storage |
| Query language | PromQL (industry standard) |
| Limitation | Single-node by design — needs Thanos, Cortex, or Mimir for horizontal scaling and long-term storage |
Jaeger
| Created | 2015 at Uber |
|---|---|
| Status | CNCF graduated project |
| What it does | Distributed tracing backend |
| Storage | Cassandra, Elasticsearch, or in-memory |
| Note | Being superseded by Grafana Tempo and OTel Collector for many use cases |
Prometheus and Jaeger are the foundational open-source projects in their respective domains. Every commercial observability platform is either built on them, compatible with them, or competing against them. They’re essential to understand but not direct competitors in the commercial sense — they’re the building blocks other products are assembled from.
10. Honeycomb
| Founded | 2016 |
|---|---|
| Funding | $97M total |
| Founders | Charity Majors, Christine Yen (ex-Facebook infrastructure) |
| Key concept | “Observability” as high-cardinality event analysis (vs. traditional monitoring) |
| Backend | Custom columnar store optimized for high-cardinality queries |
| Pricing | $0.20/GB events (20M events/month free) |
Position
Honeycomb popularized the modern definition of “observability” (as distinct from monitoring). Their approach: send wide, structured events with many dimensions, then slice and dice interactively to find unknown unknowns. BubbleUp (automatic anomaly analysis) and Query Builder are genuinely innovative.
Limitations: primarily traces/events focused (logs and metrics are secondary), smaller ecosystem, and the “observability philosophy” requires teams to change how they think about debugging. Strong in developer-led, cloud-native companies. Unlikely to displace Datadog in enterprises that want a full platform.
11. Axiom
| Founded | 2020 |
|---|---|
| Funding | $27M (Series A) |
| Key claim | “Log everything, query everything, pay less” |
| Backend | Custom storage engine on object storage (S3) |
| Pricing | $0.35/GB ingested (free tier: 500GB/month) |
| Integration | Official log drain for Vercel |
Position
Axiom’s bet: storage is cheap, so ingest everything and query it on-demand. No indexing upfront — which means no decisions about what to keep and what to drop. Their Vercel partnership gives them a strong foothold in the Next.js/serverless ecosystem.
Primarily a log analytics platform that’s expanding into traces and metrics. APL (Axiom Processing Language) is based on KQL (Kusto Query Language from Azure). Good for teams that want to log everything without worrying about costs, but less mature as a full observability platform.
12. Better Stack
| Founded | 2021 |
|---|---|
| Products | Better Uptime (uptime monitoring), Better Stack Logs (log management), Better Stack Telemetry |
| Pricing (logs) | From $0.25/GB with 3-day retention (free tier: 1GB/month) |
| Approach | Combines uptime monitoring, incident management, and log management in one platform |
Position
Better Stack started as an uptime monitoring tool (competing with Pingdom, UptimeRobot) and expanded into log management. Their strength is the unified incident workflow: uptime check fails → alert fires → on-call gets paged → logs are right there for debugging. Clean UI, developer-friendly.
Limitation: still primarily uptime + logs. Not a full observability platform (no APM, limited traces, basic metrics). Good for small-to-mid teams that want simple, affordable monitoring without the complexity of Datadog.
13. HyperDX / ClickStack
| Founded | 2023 |
|---|---|
| Renamed to | ClickStack (early 2025) |
| GitHub stars | 7K+ |
| License | MIT |
| Backend | ClickHouse |
| Key claim | “Open-source Datadog alternative” |
Position
Very similar to SigNoz in approach: unified platform, ClickHouse backend, OTel-native, MIT licensed. The rebrand from HyperDX to ClickStack leans into the ClickHouse connection. Differentiators include session replay integration and a focus on developer experience.
Earlier stage than SigNoz with a smaller community. Interesting as a validation of the “unified ClickHouse observability” thesis. Both SigNoz and ClickStack demonstrate that ClickHouse + OTel is the emerging open-source stack for full observability.
14. Uptrace
| Type | Open-source observability platform |
|---|---|
| Backend | ClickHouse |
| License | BSL (Business Source License) |
| Focus | OTel-native tracing and metrics |
| Cloud pricing | Starting at $1/month per 100K spans |
Position
Another ClickHouse-backed, OTel-native platform. Smaller community than SigNoz. The BSL license is less permissive than MIT. Primarily interesting as further validation of the ClickHouse + OTel stack, but less likely to win against SigNoz, which has more momentum, better funding (YC-backed), and MIT licensing.
15. Checkly
| Founded | 2018 |
|---|---|
| Focus | Synthetic monitoring & monitoring-as-code |
| Funding | $12.6M |
| Key feature | Playwright-based browser checks defined in code |
| Pricing | Free tier (50 API checks), Starter at $30/month |
Position
Checkly focuses on a specific slice of observability: synthetic monitoring (are your APIs and web apps working from the user’s perspective?). “Monitoring as code” approach fits into CI/CD workflows. Competes more with Pingdom and Datadog Synthetics than with full observability platforms. Important as an example of how focused tools can carve out a niche.
16. Sentry
| Founded | 2015 (project started 2008) |
|---|---|
| Revenue | $100M+ ARR |
| Funding | $217M total |
| Focus | Error tracking & performance monitoring |
| License | BSL (was BSD 3-Clause) |
| Pricing | Free tier (5K errors/month), Team at $26/month |
Position
Sentry occupies the “application-level” monitoring niche: error tracking, crash reporting, and performance monitoring (transaction tracing). Not a full infrastructure observability platform, but deeply integrated into developer workflows. Session replay, profiling, and release health tracking make it complementary to (not a replacement for) Datadog/Grafana.
Most teams use Sentry alongside their observability platform, not instead of it. Developer experience is excellent — SDKs for every language with automatic error grouping and stack trace deobfuscation.
17. Competitive Comparison Table
| Platform | Type | Backend | OTel Native | Self-Hosted | License | Revenue/Funding |
|---|---|---|---|---|---|---|
| Datadog | Full platform | Proprietary | Accepts OTel | No | Proprietary | $3.2B ARR |
| Grafana Labs | Full stack (LGTM) | Multiple (Loki, Mimir, Tempo) | Yes | Yes | AGPL | $400M+ ARR |
| SigNoz | Full platform | ClickHouse | Yes (built on OTel) | Yes | MIT | $6.5M raised |
| New Relic | Full platform | NRDB | Accepts OTel | No | Proprietary | ~$960M ARR |
| Dynatrace | Full platform | Grail | Accepts OTel | No | Proprietary | $1.7B ARR |
| Elastic | Search + Observability | Elasticsearch | Accepts OTel | Partial | SSPL/ELv2 | $1.3B ARR |
| Splunk | Log analytics + SIEM | Proprietary | Accepts OTel | Yes (on-prem) | Proprietary | $28B acquisition |
| Honeycomb | Event analytics | Custom columnar | Yes | No | Proprietary | $97M raised |
| Axiom | Log analytics | Object storage (S3) | Accepts OTel | No | Proprietary | $27M raised |
| Better Stack | Uptime + Logs | Custom | Accepts OTel | No | Proprietary | Venture-backed |
| ClickStack | Full platform | ClickHouse | Yes | Yes | MIT | Early stage |
| Uptrace | Tracing + Metrics | ClickHouse | Yes | Yes | BSL | Bootstrapped |
| Checkly | Synthetic monitoring | Custom | Accepts OTel | No | Proprietary | $12.6M raised |
| Sentry | Error tracking | Custom (Snuba/ClickHouse) | Partial | Yes | BSL | $100M+ ARR |
18. Pricing Comparison
Estimated monthly cost for a typical mid-size setup: 50 hosts, 500GB logs/month, 200GB traces/month, 50K metrics series.
| Platform | Host/Infra Fees | Log Cost | Trace Cost | Metric Cost | Estimated Total |
|---|---|---|---|---|---|
| Datadog | $1,150 (infra) + $2,000 (APM) | $900+ (ingest + index) | Included in APM | Custom metrics extra | $4,000–$6,000+ |
| Grafana Cloud | None | $250 (500GB × $0.50) | $100 (200GB × $0.50) | $400 (50K series) | $750–$1,000 |
| SigNoz Cloud | None | $150 (500GB × $0.30) | $60 (200GB × $0.30) | $50 | $260–$400 |
| New Relic | None | $210 (700GB × $0.30, all telemetry billed per GB) + per-user fees | Included in per-GB fee | Included in per-GB fee | $500–$1,500 |
| Axiom | None | $175 (500GB × $0.35) | Limited | Limited | $200–$500 |
| Self-hosted (Grafana/SigNoz) | Infrastructure cost only | $0 software cost + cloud compute/storage | $0 software cost | $0 software cost | $200–$800 (cloud infra) |
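Key insight: For this workload, SigNoz Cloud comes out roughly 10–15x cheaper than Datadog, and Grafana Cloud roughly 4–6x cheaper, based on the estimated totals in the table. Feature coverage is not identical (Datadog has far more integrations), so treat these as cost multiples, not like-for-like comparisons. Self-hosted is cheapest but adds operational burden. The pricing gap is real and growing, and it is the primary attack vector for challengers.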
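
The multiples quoted above depend on which ends of each estimated range are compared. The sketch below recomputes them from the table's "Estimated Total" column, treating Datadog's "$4,000–$6,000+" as a closed range for simplicity; the `ratio_range` helper is illustrative.

```python
# (low, high) monthly estimates in USD, copied from the table above.
estimates = {
    "Datadog": (4000, 6000),
    "Grafana Cloud": (750, 1000),
    "SigNoz Cloud": (260, 400),
}


def ratio_range(base: tuple, other: tuple) -> tuple:
    """Return (conservative, aggressive) cost multiples of base over other."""
    conservative = base[0] / other[1]  # cheapest base vs. priciest other
    aggressive = base[1] / other[0]    # priciest base vs. cheapest other
    return conservative, aggressive


signoz = ratio_range(estimates["Datadog"], estimates["SigNoz Cloud"])
grafana = ratio_range(estimates["Datadog"], estimates["Grafana Cloud"])
print(f"SigNoz Cloud: {signoz[0]:.1f}x to {signoz[1]:.1f}x cheaper")
print(f"Grafana Cloud: {grafana[0]:.1f}x to {grafana[1]:.1f}x cheaper")
```

The conservative ends (10x for SigNoz Cloud, 4x for Grafana Cloud) match the lower bounds quoted; the aggressive ends come out higher, at about 23x and 8x.
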
19. How to Compete as a Bootstrapper
The Landscape Reality
Building a general-purpose “Datadog competitor” as a bootstrapper is extremely difficult — SigNoz has raised $6.5M, Grafana Labs $694M, and incumbents have thousands of engineers. However, there are specific wedges that remain viable:
Strategy 1: Vertical Observability
Build observability for a specific stack or use case that general platforms serve poorly:
- WordPress/PHP monitoring: Most observability tools are built for cloud-native. The massive WordPress ecosystem has no purpose-built observability tool.
- IoT/Edge observability: Devices with intermittent connectivity, constrained resources. None of the major platforms handle this well.
- Database-specific monitoring: Deep PostgreSQL or MySQL monitoring with query analysis, index recommendations, and performance tuning. pganalyze shows this works ($5M+ ARR, with a small, bootstrapped-style team).
- Serverless-native: Lambda/Cloudflare Workers monitoring. Lumigo and Epsagon (acquired) proved the market. Current tools still struggle with cold starts, cost correlation, and function-level debugging.
Strategy 2: Opinionated “Good Enough” Platform
The Better Stack approach: combine uptime monitoring + logs + alerting + incident management into one clean, affordable tool. Don’t try to compete with Datadog on features — compete on simplicity and price.
- Target: teams of 5–50 developers who don’t need 700 integrations
- Price: flat $49–$199/month (predictable, no usage surprises)
- Differentiator: 5-minute setup, beautiful UI, zero configuration decisions
- Tech stack: ClickHouse + OTel collector + simple web UI
Strategy 3: Cost Optimization Layer
Don’t replace Datadog — reduce the bill. Build a tool that:
- Analyzes Datadog/Grafana/New Relic usage and identifies waste
- Recommends log sampling strategies, metric cardinality reduction
- Provides a “Datadog bill estimator” before you get the surprise invoice
- Revenue model: charge 10–20% of savings identified
This is the observability equivalent of cloud cost management (Vantage, CloudHealth). Real pain point, clear ROI, and you’re selling to the exact budget holder who’s frustrated about observability costs.
Strategy 4: OTel Pipeline Management
OpenTelemetry Collector pipelines are becoming complex. Teams need to route, filter, sample, and transform telemetry data before it hits their backend (to control costs). Build a visual OTel pipeline builder:
- Drag-and-drop OTel Collector configuration
- Built-in sampling strategies (tail-based, head-based, adaptive)
- Cost estimation per pipeline configuration
- Multi-backend routing (send logs to cheap storage, traces to SigNoz, metrics to Prometheus)
This is infrastructure plumbing that every OTel-adopting team needs but nobody wants to build in-house. BindPlane (from observIQ), Cribl, and Mezmo are in this space but are enterprise-focused.
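
The two core operations such a pipeline tool would configure, head-based sampling and multi-backend routing, can be sketched in a few lines. This is a conceptual toy, not OpenTelemetry Collector configuration or any real API; the backend names, record shape, and function names are all hypothetical.

```python
import hashlib

# Hypothetical destinations, mirroring the routing example in the list above.
ROUTES = {"logs": "object-storage", "traces": "signoz", "metrics": "prometheus"}


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Head-based sampling: decide from the trace ID alone, so every span
    belonging to the same trace receives the same verdict."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < int(sample_rate * 2**64)


def route(record: dict, sample_rate: float = 0.1):
    """Return the backend name for a record, or None if it is sampled out."""
    signal = record["signal"]
    if signal == "traces" and not keep_trace(record["trace_id"], sample_rate):
        return None
    return ROUTES[signal]


kept = sum(keep_trace(f"trace-{i}", 0.1) for i in range(10_000))
print(f"kept {kept} of 10000 traces at a 10% sample rate")
print(route({"signal": "logs"}))
```

Hashing the trace ID keeps the decision deterministic across collectors without shared state. Tail-based sampling, by contrast, needs to buffer whole traces before deciding, which is part of why teams want tooling for it.
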
Strategy 5: Self-Hosted SaaS (Managed Open Source)
Offer managed SigNoz/Grafana stack deployed in the customer’s own cloud account:
- Data never leaves their VPC (compliance teams love this)
- They pay cloud costs directly (transparent, no markup confusion)
- You charge a management fee ($500–$5,000/month)
- Handle upgrades, scaling, backups, ClickHouse tuning
This is the Aiven/Elestio model applied to observability. Works especially well for healthcare, finance, and government sectors with strict data residency requirements.
The DHH/37signals Filter
Applying the “build for yourself” test: if you run infrastructure and are frustrated by your Datadog bill, build the simpler alternative you actually want. The best wedge for a solo founder or small team:
- Pick Strategy 2 or 3 — either build the “good enough” all-in-one or the cost optimizer
- Target a specific audience — Rails developers, Laravel developers, or Next.js/Vercel users
- Price simply — flat monthly fee, no per-GB or per-host pricing
- Ship fast — ClickHouse + OTel Collector + clean UI is a viable MVP in 2–3 months
- Content marketing — “I cut my Datadog bill by 90%” posts drive incredible organic traffic
Bottom line: The observability market is massive and the incumbents are genuinely disliked for their pricing. OpenTelemetry and ClickHouse have commoditized the hard infrastructure. The opportunity is in packaging, pricing, and targeting, not in building better technology. A focused, opinionated tool with simple pricing can build a profitable $1–10M business by capturing a small slice of the spend of Datadog’s frustrated customers: $1–10M is only about 0.03–0.3% of Datadog’s ~$3.2B ARR.