What is the difference between APM and observability?
APM (application performance monitoring) is a specific category focused on monitoring application behavior — transaction traces, response times, error rates, and database query performance. Observability is a broader concept that encompasses APM plus infrastructure monitoring, log management, real user monitoring, synthetic testing, and the ability to ask ad-hoc questions about your system's behavior. In practice, most modern 'APM tools' have evolved into observability platforms, but the distinction matters because some platforms started as APM tools and added breadth (New Relic, Dynatrace), while others started as infrastructure or log tools and added APM (Datadog, Elastic). The origin shapes the product's strengths.
Is Datadog APM worth the cost?
Datadog APM is strong on features: its distributed tracing, service map, continuous profiler, and infrastructure correlation are best-in-class. The problem is cost. Datadog's pricing model has multiple dimensions (per host, per span ingestion, per indexed span, per custom metric) that compound quickly. A team monitoring 20 hosts with moderate trace volume can easily spend $2,000-$4,000/month, and bills of $10,000-$50,000/month are common for mid-market companies. Datadog is worth it if you need the depth and breadth of its platform and have the budget. If cost is a primary concern, Grafana Cloud, SigNoz, and New Relic's free tier are viable alternatives at a fraction of the price.
What is OpenTelemetry and should I use it?
OpenTelemetry is a CNCF open-source project that provides a vendor-neutral standard for instrumenting applications and collecting telemetry data (traces, metrics, logs). You should use it for any new instrumentation project. The benefits are clear: instrument once and send data to any compatible backend, avoiding vendor lock-in. The tradeoff is that OTel auto-instrumentation for some languages and frameworks is less mature than proprietary agents — particularly for deep runtime profiling and framework-specific instrumentation. For most teams, the portability benefit outweighs the marginal depth advantage of proprietary agents.
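The "instrument once, send anywhere" idea can be illustrated with a toy sketch. This is not the OpenTelemetry API: every class and function name below (SpanExporter, ConsoleExporter, InMemoryExporter, Tracer, checkout) is hypothetical and exists only to show how application code can stay unchanged while the backend is swapped.

```python
from typing import Protocol


class SpanExporter(Protocol):
    """Anything that can receive a finished span (hypothetical interface)."""

    def export(self, span: dict) -> None: ...


class ConsoleExporter:
    """Stands in for one backend: prints spans to stdout."""

    def export(self, span: dict) -> None:
        print(span)


class InMemoryExporter:
    """Stands in for a second backend: keeps spans in a list."""

    def __init__(self) -> None:
        self.spans = []

    def export(self, span: dict) -> None:
        self.spans.append(span)


class Tracer:
    """Instrumentation entry point; it knows only the exporter interface."""

    def __init__(self, exporter: SpanExporter) -> None:
        self.exporter = exporter

    def record(self, name: str, duration_ms: float) -> None:
        self.exporter.export({"name": name, "duration_ms": duration_ms})


def checkout(tracer: Tracer) -> None:
    # Application code is instrumented once and never names a vendor.
    tracer.record("checkout", 12.5)


# Swapping the backend changes one constructor argument, not the application code.
checkout(Tracer(ConsoleExporter()))
memory_backend = InMemoryExporter()
checkout(Tracer(memory_backend))
```

In a real project the OpenTelemetry SDK and its exporters would play these roles; the sketch only shows why the separation reduces lock-in.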
How much does APM cost per month for a typical team?
For a team of 5-10 engineers running 20-30 microservices across 15-25 hosts, expect to pay $1,500-$5,000/month for a commercial APM platform. The range is wide because pricing models differ dramatically. Datadog at $46-$54/host/month (APM + infrastructure) for 20 hosts runs $920-$1,080/month in host fees alone, before overage. New Relic with 5 Pro users and 500 GB/month of data runs approximately $1,900-$2,100/month. SigNoz Cloud with the same data volume would be $150-$300/month, well below that range. Grafana Cloud falls between SigNoz and Datadog. The self-hosted open-source options (SigNoz, Grafana stack) cost $0 in licensing but require 10-20 hours/month of engineering time to maintain; at $200/hour, that is $2,000-$4,000/month in labor.
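The arithmetic behind two of those figures can be reproduced with a short sketch. It uses only the rates quoted in the answer (20 hosts, $46-$54 per host, 10-20 maintenance hours at $200/hour); the function names are illustrative, and the Datadog figure excludes any usage-based overage.

```python
def host_based_cost(hosts, rate_low, rate_high):
    """Monthly (low, high) cost for a flat per-host price range."""
    return hosts * rate_low, hosts * rate_high


def labor_cost(hours_low, hours_high, hourly_rate):
    """Monthly (low, high) engineering cost of maintaining a self-hosted stack."""
    return hours_low * hourly_rate, hours_high * hourly_rate


# Datadog host fees: 20 hosts at $46-$54/host/month, before overage.
datadog = host_based_cost(hosts=20, rate_low=46, rate_high=54)

# Self-hosted open source: $0 licensing, 10-20 hours/month at $200/hour.
self_hosted = labor_cost(hours_low=10, hours_high=20, hourly_rate=200)

for name, (low, high) in {"Datadog (host fees)": datadog,
                          "Self-hosted (labor)": self_hosted}.items():
    print(f"{name}: ${low:,}-${high:,}/month")
```

Plugging in your own host count and loaded hourly rate is usually a better guide than any published range.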
Can I use APM for monolithic applications, or is it only for microservices?
APM provides value for monolithic applications, though the use case is different. For a monolith, APM focuses on transaction tracing within the single application — identifying slow controller actions, database queries, external API calls, and background job performance. You do not need distributed tracing for a monolith because there is only one service. Auto-instrumentation for monolithic frameworks (Rails, Django, Spring MVC) is mature and provides immediate value. The ROI calculation is simpler, too: you are monitoring one application, so the per-host or per-service cost is predictable.
Is Grafana an APM tool?
Grafana itself is a visualization and dashboarding platform, not an APM tool. However, Grafana Cloud, the commercial offering from Grafana Labs, includes a full APM stack: Grafana Tempo for distributed tracing, Grafana Mimir for metrics, Grafana Loki for logs, Grafana Pyroscope for continuous profiling, and Grafana Beyla for eBPF-based auto-instrumentation. Together, these components provide a complete APM and observability platform that competes directly with Datadog and New Relic. Grafana Labs was named a Leader in the 2025 Gartner Magic Quadrant for Observability Platforms.
What is the best free APM tool?
The best genuinely free APM options in 2026 are: SigNoz (open source, self-hosted, full APM with traces/metrics/logs, no limits beyond your infrastructure), New Relic Free Tier (1 full platform user, 100 GB/month data, full platform access), Grafana Cloud Free Tier (generous allotments for metrics, logs, and traces), and Elastic APM (free with the basic license for self-hosted deployments). SigNoz is the strongest free option if you are willing to self-host and maintain it. New Relic Free Tier is the best option if you want a fully managed experience for a small team. Be cautious of 'free trials' marketed as 'free tiers' — a 14-day trial is not a free APM tool.
How long does it take to set up APM?
For a single service with auto-instrumentation, expect 15-60 minutes from sign-up to first traces appearing in the platform. For a full production deployment across 10-50 services, expect 1-4 weeks including agent deployment, sampling configuration, alert setup, and dashboard creation. Enterprise deployments with custom instrumentation, compliance requirements, and multi-team rollout typically take 1-3 months. The instrumentation itself is fast; what takes time is defining your sampling strategy, building meaningful alerts (not just default thresholds), and training the team to use the platform during real incidents.
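One common starting point for a sampling strategy is ratio-based head sampling, where each trace is kept or dropped based on its trace ID. The sketch below is a minimal illustration under that assumption; should_sample is a hypothetical helper, not any vendor's or SDK's sampler, and the trace IDs are made up.

```python
import hashlib


def should_sample(trace_id: str, ratio: float) -> bool:
    """Keep roughly `ratio` of traces, deciding deterministically per trace ID."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform integer in [0, 2**64)
    return bucket < ratio * 2**64


# Made-up trace IDs; a 10% ratio should keep roughly one trace in ten.
trace_ids = [f"trace-{i}" for i in range(10_000)]
kept = sum(should_sample(t, 0.10) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Because the decision depends only on the trace ID, every service that sees the same trace makes the same choice, so sampled traces stay complete. Whether a flat ratio is appropriate, or errors and slow requests should always be kept, is the kind of decision that takes the weeks mentioned above.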
Should I self-host my APM or use a SaaS platform?
Use SaaS unless you have a specific reason not to. Self-hosting APM (running SigNoz, Grafana stack, or Elastic on your own infrastructure) eliminates licensing costs but introduces significant operational overhead: managing Kafka/ClickHouse/Elasticsearch clusters, scaling storage for trace and metric retention, handling upgrades, and maintaining high availability for a system that your entire engineering team depends on during incidents. Self-hosting makes sense if you have strict data sovereignty requirements (government, regulated industries), your data volume makes SaaS prohibitively expensive (petabytes of telemetry per month), or you have a dedicated platform team that can absorb the operational burden.
What is the biggest mistake teams make when adopting APM?
The biggest mistake is treating APM as an infrastructure project rather than an engineering culture shift. The platform team deploys agents, configures dashboards, and declares the project complete — but the application developers who would benefit most from APM during debugging never learn to use it. Six months later, leadership asks why the expensive APM platform has not improved MTTR, and the answer is that nobody uses it during incidents. The fix is to involve application developers from day one, make APM the first tool opened during any production investigation (not the last resort), and include 'used APM trace data' as a required element of every incident postmortem.