Observability Benchmarking

A comprehensive framework for benchmarking containerized REST services under the Grafana LGTM observability stack

Project Overview

This project provides a production-ready Docker Compose environment for benchmarking REST service implementations while collecting comprehensive telemetry data including logs, metrics, traces, and CPU profiles.

High Performance

Benchmark results of up to 65,000 RPS on CPU-limited containers (2 vCPUs)

Full Observability

Complete LGTM stack: Loki, Grafana, Tempo, Mimir with Pyroscope profiling

Multiple Frameworks

Spring Boot, Quarkus, Micronaut, Helidon, Spark, Javalin, Dropwizard, Vert.x, Pekko, Go, and Django (Python), covering JVM, native, and interpreted runtimes

Thread Models

Compare platform threads, virtual threads, and reactive programming

Reproducible

Deterministic load generation with wrk2 in a containerized environment

Containerized

Complete Docker Compose orchestration for all services and tools

System Architecture

A modern, cloud-native architecture demonstrating industry best practices in observability and performance engineering.

Service Layer

Spring Boot Services
Quarkus Services
Micronaut Services
Helidon Services
Spark Services
Javalin Services
Dropwizard Services
Vert.x Services
Pekko Services
Go Services
Django Services

Collection Layer

OpenTelemetry
Grafana Alloy
Pyroscope Agent

Storage & Analysis Layer

Loki (Logs)
Tempo (Traces)
Mimir (Metrics)
Pyroscope (Profiles)

Visualization Layer

Grafana Dashboards

Load Generation Layer

wrk2 Load Generator

Control Layer

Orchestration, Healthcheck

Key Design Decisions

  • OpenTelemetry Integration: Standardized instrumentation across all services using OTLP over gRPC
  • Batched Telemetry: Optimized batch processing to minimize overhead on services under test
  • Resource Isolation: CPU-limited containers ensure fair comparison across implementations
  • Multidimensional Profiling: Combined eBPF and agent-based profiling for comprehensive insights
  • Deterministic Testing: wrk2 provides fixed-rate load generation for reproducible results
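
The fixed-rate idea behind the Deterministic Testing point can be illustrated with a small model. The sketch below is not wrk2's implementation: it assumes a single connection that serves requests strictly in order, and the names `Sample`, `fixed_rate_schedule`, and `simulate` are hypothetical. It shows how send times are planned from the target rate alone, and why latency measured from the intended send time differs from the raw service time once one slow response delays later requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    intended_start: float  # when the request was scheduled to be sent (seconds)
    actual_start: float    # when it could really be sent (seconds)
    finish: float          # when the response arrived (seconds)

    @property
    def service_time(self) -> float:
        # Time spent on the request once it was actually sent.
        return self.finish - self.actual_start

    @property
    def latency_from_schedule(self) -> float:
        # Time seen from the fixed-rate plan, including any queueing delay.
        return self.finish - self.intended_start


def fixed_rate_schedule(rate_per_sec: float, duration_sec: float) -> list[float]:
    """Intended send times for a constant rate, independent of response times."""
    if rate_per_sec <= 0 or duration_sec <= 0:
        raise ValueError("rate and duration must be positive")
    count = round(rate_per_sec * duration_sec)
    return [i / rate_per_sec for i in range(count)]


def simulate(schedule: list[float], service_times: list[float]) -> list[Sample]:
    """Model one connection: a slow response delays every later send."""
    samples = []
    free_at = 0.0  # time at which the connection is free again
    for intended, service in zip(schedule, service_times):
        actual = max(intended, free_at)
        finish = actual + service
        samples.append(Sample(intended, actual, finish))
        free_at = finish
    return samples


if __name__ == "__main__":
    plan = fixed_rate_schedule(rate_per_sec=4, duration_sec=1)
    for s in simulate(plan, [0.1, 0.6, 0.1, 0.1]):
        print(f"{s.intended_start:.2f}s service={s.service_time:.2f}s "
              f"latency={s.latency_from_schedule:.2f}s")
```

In this model, the third request is planned for 0.50 s but cannot be sent until 0.85 s, so its service time is 0.1 s while its latency measured against the schedule is about 0.45 s. Measuring against the schedule is what keeps results comparable across runs at the same target rate.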

Benchmark Results

At-a-glance results (17/05/2026, 2 vCPUs)

Performance comparison of frameworks and concurrency models on CPU-limited containers, ordered by RPS, then Peak Mem, then Image Size.

Tiers: Elite (40k+), High (25k–39k), Mid (15k–24k), Entry (<15k)

#   Framework   Runtime  Mode        RPS     Peak Mem (MB)  Image Size (MB)

Elite Tier: 40,000+ RPS
🥇  Helidon SE  JVM      Virtual     65,000  430            178
🥈  Vert.x      JVM      Reactive    52,000  541            222
🥉  Quarkus     JVM      Reactive    49,000  540            237
4   Quarkus     JVM      Virtual     45,000  540            237

High Tier: 25,000–39,000 RPS
5   Micronaut   JVM      Virtual     38,000  441            195
6   Helidon SE  Native   Virtual     37,000  195            258
7   Quarkus     JVM      Platform    37,000  540            237
8   Spark       JVM      Platform    35,000  559            216
9   Micronaut   JVM      Reactive    33,000  441            195
10  Micronaut   JVM      Platform    31,000  441            195
11  Pekko       JVM      Reactive    30,000  693            267
12  Javalin     JVM      Platform    29,000  754            221
13  Quarkus     Native   Virtual     27,000  270            623
14  Javalin     JVM      Virtual     26,000  510            221
15  Spark       JVM      Virtual     25,000  395            216

Mid Tier: 15,000–24,000 RPS
16  Go          Native   Goroutines  24,000  120            37
17  Quarkus     Native   Reactive    22,000  270            623
18  Quarkus     Native   Platform    21,000  270            623
19  Spring      JVM      Platform    21,000  552            248
20  Micronaut   Native   Virtual     17,000  165            343
21  Micronaut   Native   Platform    17,000  165            343
22  Spring      JVM      Virtual     17,000  439            248
23  Dropwizard  JVM      Platform    17,000  613            247
24  Dropwizard  JVM      Virtual     16,000  529            247
25  Micronaut   Native   Reactive    15,000  165            343
26  Helidon MP  JVM      Virtual     15,000  463            196

Entry Tier: below 15,000 RPS
27  Spring      JVM      Reactive    14,000  427            278
28  Spring      Native   Virtual     11,000  163            388
29  Helidon MP  Native   Virtual     10,000  202            362
30  Spring      Native   Platform    10,000  237            388
31  Spring      Native   Reactive    7,000   176            431
32  Django      CPython  Platform    1,000   161            320
33  Django      CPython  Reactive    700     200            323

See Benchmarking Methodology for reproducibility details and interpretation.
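
The tier bands used in the results can be expressed as a small lookup. This is a minimal sketch, assuming each band's lower bound is inclusive (so 25,000 RPS counts as High and 15,000 RPS as Mid, which matches the rows above); the names `TIER_FLOORS` and `tier_for` are hypothetical and not part of the project.

```python
# Lower bounds in RPS, checked from the highest tier down.
TIER_FLOORS = [
    (40_000, "Elite"),
    (25_000, "High"),
    (15_000, "Mid"),
]


def tier_for(rps: int) -> str:
    """Return the tier name for a measured requests-per-second value."""
    if rps < 0:
        raise ValueError("rps must be non-negative")
    for floor, name in TIER_FLOORS:
        if rps >= floor:
            return name
    return "Entry"


if __name__ == "__main__":
    for rps in (65_000, 38_000, 24_000, 14_000):
        print(rps, tier_for(rps))
```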

Key Insights

Reactive Advantage

The Vert.x and Quarkus reactive implementations reach 52,000 and 49,000 RPS under fixed-rate load, second and third overall

Virtual Threads

Java virtual threads pair a simpler concurrency model with strong throughput; Helidon SE on virtual threads tops the table at 65,000 RPS

Native vs JVM

Native images offer faster startup and, in these results, lower peak memory; the JVM builds deliver higher throughput for every framework tested in both modes
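
One way to weigh the native versus JVM trade-off is throughput per megabyte of peak memory. The sketch below is illustrative only: RPS per MB is a derived figure introduced here, not a metric the project reports, and the names `ROWS`, `rps_per_mb`, and `native_more_memory_efficient` are hypothetical. The numbers are the virtual-thread rows from the results table for frameworks measured in both modes.

```python
# (framework, runtime, rps, peak_mem_mb), virtual-thread rows from the results table
ROWS = [
    ("Helidon SE", "JVM", 65_000, 430),
    ("Helidon SE", "Native", 37_000, 195),
    ("Quarkus", "JVM", 45_000, 540),
    ("Quarkus", "Native", 27_000, 270),
    ("Micronaut", "JVM", 38_000, 441),
    ("Micronaut", "Native", 17_000, 165),
    ("Spring", "JVM", 17_000, 439),
    ("Spring", "Native", 11_000, 163),
]


def rps_per_mb(rows):
    """Map (framework, runtime) to requests per second per MB of peak memory."""
    return {(fw, rt): rps / mem for fw, rt, rps, mem in rows}


def native_more_memory_efficient(rows):
    """Frameworks whose native build has a higher RPS-per-MB than the JVM build."""
    eff = rps_per_mb(rows)
    frameworks = list(dict.fromkeys(fw for fw, _, _, _ in rows))  # keep table order
    return [fw for fw in frameworks if eff[(fw, "Native")] > eff[(fw, "JVM")]]


if __name__ == "__main__":
    for (fw, rt), value in rps_per_mb(ROWS).items():
        print(f"{fw:<11} {rt:<7} {value:7.1f} RPS/MB")
    print(native_more_memory_efficient(ROWS))
```

For these rows the native builds come out ahead on RPS per MB even though their absolute throughput is lower, which suggests the choice depends on whether memory or peak throughput is the tighter constraint.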

Instrumentation Matters

All results here assume a comparable observability pipeline (OTel + LGTM + profiling)

Resources