Observability Benchmarking

Project Overview

This project provides a production-ready Docker Compose environment for benchmarking REST service implementations while collecting comprehensive telemetry data including logs, metrics, traces, and CPU profiles.

High Performance

Up to 65,000 requests per second (RPS) on CPU-limited containers (2 vCPUs)
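For scale, here is a worked bound on the CPU budget implied by the headline figure. It assumes both vCPUs are fully consumed by the service at the peak rate, so the per-request CPU time, including any in-process instrumentation, averages at most:

```latex
$$
t_{\text{CPU/request}} \le \frac{2\ \text{CPU-s/s}}{65\,000\ \text{req/s}} \approx 3.08 \times 10^{-5}\ \text{s} \approx 30.8\ \mu\text{s}
$$
```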

Full Observability

Complete LGTM stack (Loki, Grafana, Tempo, Mimir) with Pyroscope profiling

Multiple Frameworks

Spring Boot, Quarkus, Micronaut, Helidon, Spark, Javalin, Dropwizard, Vert.x, Pekko, Go, and Django (Python), covering JVM, native, and interpreted runtimes

Thread Models

Compare platform threads, virtual threads, and reactive programming

Reproducible

Deterministic, fixed-rate load generation with wrk2 in a containerized environment
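wrk2 issues requests on a fixed schedule and records latency from each request's intended send time, so a server stall is charged to every request it delays rather than hidden (coordinated omission). The sketch below illustrates that idea for a single connection. It is not wrk2's code; `fixed_rate_latencies` and its parameters are hypothetical.

```python
from typing import List, Tuple


def fixed_rate_latencies(rate_rps: float,
                         service_times_s: List[float]) -> Tuple[List[float], List[float]]:
    """Simulate one connection issuing requests at a fixed rate.

    Request i is scheduled at i / rate_rps. If the previous response has not
    arrived, the actual send is delayed. Returns two latency lists:
    - corrected: measured from the scheduled (intended) send time, wrk2-style
    - naive: measured from the actual send time, closed-loop style
    """
    interval = 1.0 / rate_rps
    free_at = 0.0  # time at which the connection can send its next request
    corrected: List[float] = []
    naive: List[float] = []
    for i, service in enumerate(service_times_s):
        intended = i * interval
        sent = max(intended, free_at)  # cannot send before the prior response
        done = sent + service
        free_at = done
        corrected.append(done - intended)
        naive.append(done - sent)
    return corrected, naive


# 10 RPS with one 500 ms stall in the second request.
corrected, naive = fixed_rate_latencies(10, [0.01, 0.5, 0.01, 0.01, 0.01])
# corrected ≈ [0.01, 0.5, 0.41, 0.32, 0.23]; naive ≈ [0.01, 0.5, 0.01, 0.01, 0.01]
```

Under the naive measure the stall shows up once; under the corrected measure it also inflates the three requests queued behind it, which is what a user arriving at a fixed rate would experience.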

Containerized

Complete Docker Compose orchestration for all services and tools

System Architecture

The environment is organized into six layers: the services under test, telemetry collection, storage and analysis, visualization, load generation, and control.

Service Layer

Spring Boot Services

Quarkus Services

Micronaut Services

Helidon Services

Spark Services

Javalin Services

Dropwizard Services

Vert.x Services

Pekko Services

Go Services

Django Services

Collection Layer

OpenTelemetry

Grafana Alloy

Pyroscope Agent

Storage & Analysis Layer

Loki (Logs)

Tempo (Traces)

Mimir (Metrics)

Pyroscope (Profiles)

Visualization Layer

Grafana Dashboards

Load Generation Layer

wrk2 Load Generator

Control Layer

Orchestration and health checks
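Before load generation starts, each service under test should report healthy. A minimal polling sketch follows, assuming a caller-supplied `probe` callable. `wait_until_healthy`, `http_probe`, and the example endpoint are hypothetical, not the project's orchestration code.

```python
import time
import urllib.request
from typing import Callable


def wait_until_healthy(probe: Callable[[], bool], timeout_s: float = 60.0,
                       interval_s: float = 1.0) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses.

    Exceptions raised by the probe (e.g. connection refused while a
    container is still starting) count as "not healthy yet".
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if probe():
                return True
        except Exception:
            pass  # service not reachable yet; keep polling
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def http_probe(url: str) -> Callable[[], bool]:
    """Build a probe that treats an HTTP 200 from `url` as healthy."""
    def probe() -> bool:
        # urlopen raises HTTPError for non-2xx codes; the poller catches it.
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    return probe


# Hypothetical usage (endpoint is illustrative):
# wait_until_healthy(http_probe("http://localhost:8080/health"))
```

Treating probe exceptions as "not ready" keeps the poller simple while containers are still binding their ports.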

Key Design Decisions

OpenTelemetry Integration: Standardized instrumentation across all services using OTLP over gRPC
Batched Telemetry: Telemetry is exported in batches rather than per event, minimizing overhead on the services under test
Resource Isolation: CPU-limited containers ensure fair comparison across implementations
Multidimensional Profiling: Combined eBPF and agent-based profiling for comprehensive insights
Deterministic Testing: wrk2 provides fixed-rate load generation for reproducible results
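Batching amortizes export cost: items are buffered and exported when the batch fills or when the oldest item has waited long enough. This is similar in spirit to the OpenTelemetry SDK's batch span processor, whose specified defaults are a maximum export batch size of 512 and a schedule delay of 5 seconds. The sketch below is a hypothetical, single-threaded illustration of that size-or-age flush rule, not the SDK's implementation. `Batcher`, `max_batch`, and `max_delay_s` are illustrative names.

```python
import time
from typing import Any, Callable, List, Optional


class Batcher:
    """Buffer items and hand them to `export` in batches.

    A flush happens when the buffer reaches `max_batch` items, or when
    `tick()` finds the oldest buffered item older than `max_delay_s`.
    """

    def __init__(self, export: Callable[[List[Any]], None], max_batch: int = 512,
                 max_delay_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.export = export
        self.max_batch = max_batch
        self.max_delay_s = max_delay_s
        self.clock = clock
        self.buffer: List[Any] = []
        self.oldest: Optional[float] = None  # arrival time of first buffered item

    def add(self, item: Any) -> None:
        if not self.buffer:
            self.oldest = self.clock()
        self.buffer.append(item)
        if len(self.buffer) >= self.max_batch:
            self.flush()  # size-based flush

    def tick(self) -> None:
        """Call periodically; flushes if the oldest item has waited too long."""
        if self.buffer and self.oldest is not None \
                and self.clock() - self.oldest >= self.max_delay_s:
            self.flush()  # age-based flush

    def flush(self) -> None:
        if self.buffer:
            batch, self.buffer, self.oldest = self.buffer, [], None
            self.export(batch)


# Usage with a fake clock so the behavior is deterministic.
exported: List[List[Any]] = []
now = [0.0]
b = Batcher(exported.append, max_batch=3, max_delay_s=1.0, clock=lambda: now[0])
for span in ["a", "b", "c", "d"]:
    b.add(span)   # the third add triggers a size-based flush
now[0] = 1.5
b.tick()          # "d" has waited 1.5 s >= 1.0 s, so it is flushed
# exported == [["a", "b", "c"], ["d"]]
```

Larger batches and longer delays reduce per-item export overhead on the service at the cost of telemetry arriving later in the backends.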

Benchmark Results

At-a-glance results [17/05/2026] [2 vCPUs]: performance comparison of frameworks and concurrency models on CPU-limited containers, ordered by RPS (highest first), then by peak memory (lowest first) and image size.

Tiers by RPS: Elite (40k+), High (25k–39k), Mid (15k–24k), Entry (<15k)

#	Framework	Runtime	Mode	RPS	Peak Mem (MB)	Image Size (MB)
Elite Tier — 40,000+ RPS
🥇	Helidon SE	JVM	Virtual	65,000	430	178
🥈	Vert.x	JVM	Reactive	52,000	541	222
🥉	Quarkus	JVM	Reactive	49,000	540	237
4	Quarkus	JVM	Virtual	45,000	540	237
High Tier — 25,000–39,000 RPS
5	Micronaut	JVM	Virtual	38,000	441	195
6	Helidon SE	Native	Virtual	37,000	195	258
7	Quarkus	JVM	Platform	37,000	540	237
8	Spark	JVM	Platform	35,000	559	216
9	Micronaut	JVM	Reactive	33,000	441	195
10	Micronaut	JVM	Platform	31,000	441	195
11	Pekko	JVM	Reactive	30,000	693	267
12	Javalin	JVM	Platform	29,000	754	221
13	Quarkus	Native	Virtual	27,000	270	623
14	Javalin	JVM	Virtual	26,000	510	221
15	Spark	JVM	Virtual	25,000	395	216
Mid Tier — 15,000–24,000 RPS
16	Go	Native	Goroutines	24,000	120	37
17	Quarkus	Native	Reactive	22,000	270	623
18	Quarkus	Native	Platform	21,000	270	623
19	Spring	JVM	Platform	21,000	552	248
20	Micronaut	Native	Virtual	17,000	165	343
21	Micronaut	Native	Platform	17,000	165	343
22	Spring	JVM	Virtual	17,000	439	248
23	Dropwizard	JVM	Platform	17,000	613	247
24	Dropwizard	JVM	Virtual	16,000	529	247
25	Micronaut	Native	Reactive	15,000	165	343
26	Helidon MP	JVM	Virtual	15,000	463	196
Entry Tier — Below 15,000 RPS
27	Spring	JVM	Reactive	14,000	427	278
28	Spring	Native	Virtual	11,000	163	388
29	Helidon MP	Native	Virtual	10,000	202	362
30	Spring	Native	Platform	10,000	237	388
31	Spring	Native	Reactive	7,000	176	431
32	Django	CPython	Platform	1,000	161	320
33	Django	CPython	Reactive	700	200	323
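The ordering and tier labels above can be reproduced mechanically: sort by RPS descending, break ties by lower peak memory and then image size (consistent with every tie in the table), and assign tiers by the legend's RPS thresholds. The sketch below uses a few rows copied from the table; `Result`, `tier`, and `rank` are hypothetical helper names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Result:
    framework: str
    runtime: str
    mode: str
    rps: int
    peak_mem_mb: int
    image_mb: int


def tier(rps: int) -> str:
    """Map throughput (requests/second) to the legend's tiers."""
    if rps >= 40_000:
        return "Elite"
    if rps >= 25_000:
        return "High"
    if rps >= 15_000:
        return "Mid"
    return "Entry"


def rank(results: Iterable[Result]) -> List[Result]:
    """Highest RPS first; ties broken by lower peak memory, then smaller image."""
    return sorted(results, key=lambda r: (-r.rps, r.peak_mem_mb, r.image_mb))


rows = [
    Result("Quarkus", "JVM", "Platform", 37_000, 540, 237),
    Result("Go", "Native", "Goroutines", 24_000, 120, 37),
    Result("Helidon SE", "Native", "Virtual", 37_000, 195, 258),
    Result("Django", "CPython", "Platform", 1_000, 161, 320),
    Result("Helidon SE", "JVM", "Virtual", 65_000, 430, 178),
]

ranked = rank(rows)
for position, r in enumerate(ranked, start=1):
    print(position, r.framework, r.runtime, r.mode, r.rps, tier(r.rps))
```

The two 37,000 RPS rows show the tie-break: Helidon SE Native (195 MB) is placed ahead of Quarkus JVM Platform (540 MB), matching ranks 6 and 7 in the table.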

See Benchmarking Methodology for reproducibility details and interpretation.

Key Insights

Reactive Advantage

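Under fixed-rate load, the reactive builds of Vert.x (52,000 RPS) and Quarkus on the JVM (49,000 RPS) rank second and third overall, and Quarkus reactive outperforms its own virtual-thread (45,000) and platform-thread (37,000) variants. The gain is framework-dependent: Spring reactive (14,000) trails Spring on platform threads (21,000).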

Virtual Threads

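Java virtual threads produce the top result overall (Helidon SE, 65,000 RPS) and beat platform threads on Quarkus and Micronaut (JVM) while keeping a simpler, blocking-style programming model; on Spark, Javalin, Spring, and Dropwizard (JVM) they trail platform threads.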

Native vs JVM

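Native images offer faster startup and, in these results, lower peak memory (e.g. Helidon SE: 195 MB native vs 430 MB on the JVM), but the JVM build delivers higher throughput for every framework and mode that has both variants, and the native container images are larger.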
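The trade-off can be made concrete with the table's own numbers. The sketch below takes each framework with both a JVM and a native virtual-thread row and computes native throughput relative to the JVM, plus throughput per MB of peak memory. RPS per MB is an illustrative metric chosen here, not one the benchmark reports; `PAIRS` and `compare` are hypothetical names.

```python
from typing import Dict, List, Tuple

# (framework, JVM RPS, JVM peak MB, native RPS, native peak MB),
# virtual-thread rows copied from the results table.
PAIRS: List[Tuple[str, int, int, int, int]] = [
    ("Helidon SE", 65_000, 430, 37_000, 195),
    ("Quarkus", 45_000, 540, 27_000, 270),
    ("Micronaut", 38_000, 441, 17_000, 165),
    ("Spring", 17_000, 439, 11_000, 163),
    ("Helidon MP", 15_000, 463, 10_000, 202),
]


def compare(pairs: List[Tuple[str, int, int, int, int]]) -> List[Dict[str, object]]:
    rows: List[Dict[str, object]] = []
    for name, jvm_rps, jvm_mb, nat_rps, nat_mb in pairs:
        rows.append({
            "framework": name,
            "throughput_ratio": nat_rps / jvm_rps,  # native RPS as a fraction of JVM RPS
            "jvm_rps_per_mb": jvm_rps / jvm_mb,
            "native_rps_per_mb": nat_rps / nat_mb,
        })
    return rows


for row in compare(PAIRS):
    print(f"{row['framework']:<11} native/JVM RPS={row['throughput_ratio']:.2f} "
          f"RPS/MB jvm={row['jvm_rps_per_mb']:.0f} native={row['native_rps_per_mb']:.0f}")
```

For these five pairs the native build reaches roughly 45–67% of JVM throughput but serves more requests per MB of peak memory in every case.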

Instrumentation Matters

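All results were collected with a comparable observability pipeline active (OpenTelemetry, the LGTM stack, and profiling), so they reflect instrumented services and are best compared with one another rather than with uninstrumented baselines.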