Observability Benchmarking

A comprehensive framework for benchmarking containerized REST services instrumented with the Grafana LGTM observability stack

Project Overview

This project provides a production-ready Docker Compose environment for benchmarking REST service implementations while collecting comprehensive telemetry data including logs, metrics, traces, and CPU profiles.

High Performance

Benchmark results of up to 65,000 RPS on CPU-limited containers (2 vCPUs)

Full Observability

Complete LGTM stack (Loki, Grafana, Tempo, Mimir) plus Pyroscope profiling

Multiple Frameworks

Spring Boot, Quarkus, Micronaut, Helidon, Spark, Javalin, Dropwizard, Vert.x, Pekko, Go, and Django (Python) — JVM, native & interpreted

Thread Models

Compare platform threads, virtual threads, and reactive programming
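The first two models can be contrasted directly in plain Java (21+). This sketch is not part of the project's code; it shows the platform-thread and virtual-thread executors side by side, while reactive style requires a library such as Vert.x and is only noted in a comment:

```java
import java.util.concurrent.Executors;

public class ThreadModels {
    public static void main(String[] args) {
        // Platform threads: each task runs on a pooled OS thread
        try (var platform = Executors.newFixedThreadPool(4)) {
            platform.submit(() -> System.out.println("platform: " + Thread.currentThread()));
        }
        // Virtual threads (Java 21+): cheap threads multiplexed onto carrier OS threads
        try (var virtual = Executors.newVirtualThreadPerTaskExecutor()) {
            virtual.submit(() -> System.out.println("virtual: " + Thread.currentThread()));
        }
        // Reactive: event-loop scheduling with non-blocking composition
        // (e.g. Vert.x futures); needs a framework, so it is not shown here.
    }
}
```

Because try-with-resources calls ExecutorService.close(), each executor waits for its task before the program moves on, so both lines always print.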

Reproducible

Deterministic, fixed-rate load generation with wrk2 in a containerized environment

Containerized

Complete Docker Compose orchestration for all services and tools

System Architecture

A layered, cloud-native architecture: services under test, telemetry collection, storage and analysis, visualization, load generation, and control.

Service Layer

Spring Boot Services
Quarkus Services
Micronaut Services
Helidon Services
Spark Services
Javalin Services
Dropwizard Services
Vert.x Services
Pekko Services
Go Services
Django Services

Collection Layer

OpenTelemetry
Grafana Alloy
Pyroscope Agent
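To make the collection layer's batching concrete, here is what an equivalent pipeline looks like in a vanilla OpenTelemetry Collector configuration. The project uses Grafana Alloy, whose otelcol.* components expose the same receiver/processor/exporter settings; endpoints and batch values below are illustrative, not taken from the project's files:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317    # services export OTLP over gRPC here

processors:
  batch:                          # batch telemetry to minimize overhead on services under test
    send_batch_size: 1024
    timeout: 5s

exporters:
  otlphttp/tempo:
    endpoint: http://tempo:4318   # illustrative in-network address

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/tempo]
```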

Storage & Analysis Layer

Loki (Logs)
Tempo (Traces)
Mimir (Metrics)
Pyroscope (Profiles)

Visualization Layer

Grafana Dashboards

Load Generation Layer

wrk2 Load Generator

Control Layer

Orchestration and health checks

Key Design Decisions

  • OpenTelemetry Integration: Standardized instrumentation across all services using OTLP over gRPC
  • Batched Telemetry: Optimized batch processing to minimize overhead on services under test
  • Resource Isolation: CPU-limited containers ensure fair comparison across implementations
  • Multidimensional Profiling: Combined eBPF and agent-based profiling for comprehensive insights
  • Deterministic Testing: wrk2 provides fixed-rate load generation for reproducible results
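Several of these decisions map directly onto Compose configuration. The following is a minimal sketch, not the project's actual file: image names, ports, and the 40,000 req/s target are illustrative, but the CPU limit, the OTLP/gRPC environment variables, and wrk2's fixed-rate -R flag are the real mechanisms involved:

```yaml
services:
  app:
    image: benchmark/helidon-se:latest      # hypothetical image name
    cpus: 2                                 # resource isolation: 2 vCPUs per service under test
    environment:
      OTEL_EXPORTER_OTLP_ENDPOINT: http://alloy:4317   # OTLP over gRPC to the collection layer
      OTEL_EXPORTER_OTLP_PROTOCOL: grpc
      OTEL_BSP_SCHEDULE_DELAY: "5000"       # batch span processor delay (ms) to cut overhead
    ports:
      - "8080:8080"

  loadgen:
    image: benchmark/wrk2:latest            # hypothetical wrk2 image
    depends_on: [app]
    # fixed-rate (-R) load for deterministic, reproducible runs
    command: ["-t2", "-c64", "-d60s", "-R", "40000", "--latency", "http://app:8080/"]
```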

Benchmark Results

At-a-glance results [06/03/2026] [2 vCPUs]: performance comparison of frameworks and concurrency models on CPU-limited containers, grouped into tiers and ordered by RPS, with Peak Mem and Image Size listed for each entry.

Tiers: Elite (40k+) · High (25k–39k) · Mid (15k–24k) · Entry (<15k)

 #   Framework    Runtime  Mode        RPS     Peak Mem (MB)  Image Size (MB)

Elite Tier — 40,000+ RPS
 🥇  Helidon SE   JVM      Virtual     65,000  430            169
 🥈  Vert.x       JVM      Reactive    52,000  541            220
 🥉  Quarkus      JVM      Reactive    49,000  540            235
  4  Quarkus      JVM      Virtual     45,000  540            235

High Tier — 25,000–39,000 RPS
  5  Micronaut    JVM      Virtual     38,000  441            193
  6  Helidon SE   Native   Virtual     37,000  195            253
  7  Quarkus      JVM      Platform    37,000  540            235
  8  Spark        JVM      Platform    35,000  559            216
  9  Micronaut    JVM      Reactive    33,000  441            193
 10  Micronaut    JVM      Platform    31,000  441            193
 11  Pekko        JVM      Reactive    30,000  693            266
 12  Javalin      JVM      Platform    29,000  754            219
 13  Quarkus      Native   Virtual     27,000  270            636
 14  Javalin      JVM      Virtual     26,000  510            219
 15  Spark        JVM      Virtual     25,000  395            216

Mid Tier — 15,000–24,000 RPS
 16  Go           Native   Goroutines  24,000  120            36
 17  Quarkus      Native   Reactive    22,000  270            636
 18  Quarkus      Native   Platform    21,000  270            636
 19  Spring       JVM      Platform    21,000  552            246
 20  Micronaut    Native   Virtual     17,000  165            349
 21  Micronaut    Native   Platform    17,000  165            349
 22  Spring       JVM      Virtual     17,000  439            246
 23  Dropwizard   JVM      Platform    17,000  613            246
 24  Dropwizard   JVM      Virtual     16,000  529            246
 25  Micronaut    Native   Reactive    15,000  165            349
 26  Helidon MP   JVM      Virtual     15,000  463            189

Entry Tier — Below 15,000 RPS
 27  Spring       JVM      Reactive    14,000  427            277
 28  Spring       Native   Virtual     11,000  163            388
 29  Helidon MP   Native   Virtual     10,000  202            356
 30  Spring       Native   Platform    10,000  237            388
 31  Spring       Native   Reactive     7,000  176            447
 32  Django       CPython  Platform     1,000  161            306
 33  Django       CPython  Reactive       700  200            309

See Benchmarking Methodology for reproducibility details and interpretation.

Key Insights

Reactive Advantage

Reactive implementations (Vert.x, Quarkus) show exceptional throughput under fixed-rate load

Virtual Threads

Java virtual threads provide strong throughput with a simpler concurrency model

Native vs JVM

Native images start faster and use less peak memory; the JVM can deliver higher peak throughput

Instrumentation Matters

All results here assume a comparable observability pipeline (OTel + LGTM + profiling)

Resources