System Architecture
Overview
The Observability Benchmarking project is a modular, cloud-native system for comparing REST service implementations under load. The architecture is layered, with clear separation of concerns: load generation, services under test, telemetry collection, storage, and visualization.
Architectural Principles
1. Cloud-Native Design
- Containerization First: Every component runs in a container for consistency and portability
- Declarative Configuration: Infrastructure defined as code using Docker Compose
- 12-Factor App Compliance: Configuration via environment variables, stateless services, port binding
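As a sketch, a Compose service definition following these principles might look like the following (the service name, image tag, and endpoint are illustrative, not the project's actual values):

```yaml
services:
  quarkus-virtual:                          # hypothetical service name
    image: obs-bench/quarkus-virtual:latest # illustrative image tag
    environment:
      # 12-factor: runtime configuration via environment variables
      OTEL_EXPORTER_OTLP_ENDPOINT: http://alloy:4317
    ports:
      - "8080:8080"                         # 12-factor: explicit port binding
```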
2. Observability-First Approach
- Telemetry from Day One: All services instrumented with OpenTelemetry from the start
- Comprehensive Coverage: Logs, metrics, traces, and profiles collected for every service
- Correlation: Ability to correlate data across all telemetry signals
3. Performance Engineering
- Reproducible Benchmarks: Fixed-rate load generation with a deterministic request schedule
- Resource Isolation: CPU and memory limits ensure fair comparisons
- Statistical Rigor: Multiple runs and warm-up periods for accurate measurements
System Components
Load Generation Layer
wrk2 - HTTP benchmarking tool providing constant-throughput load generation
- Deterministic rate limiting (requests per second)
- Latency distribution tracking
- Configurable connection pooling
- Thread-based concurrency control
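As an illustration, a constant-throughput run might be launched as follows (URL, path, and rate are assumptions; wrk2's binary is conventionally named `wrk`):

```sh
# -t threads, -c connections, -d duration, -R fixed request rate;
# --latency prints the full latency distribution after the run
wrk -t4 -c64 -d60s -R2000 --latency http://localhost:8080/api/cache/some-key
```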
Service Layer
REST Services - Multiple implementations for comparison
- Spring Boot 4.0 (JVM and Native)
  - Platform threads (traditional thread pool)
  - Virtual threads (Project Loom)
  - Reactive (WebFlux)
- Quarkus 3.30 (JVM and Native)
  - Platform threads
  - Virtual threads
  - Reactive (Mutiny)
- Go 1.25
  - Fiber framework
Service Characteristics:
- Simple cache-retrieval workload (Caffeine on the JVM services)
- Non-blocking I/O where applicable
- OpenTelemetry instrumentation
- Health check endpoints
- Configurable heap and thread settings
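A minimal sketch of the cache-retrieval workload on the JVM side, assuming a pre-populated Caffeine cache (class and method names are hypothetical, not the project's actual handler):

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

// Hypothetical handler core: each request is a single bounded-cache lookup,
// so the benchmark stresses the threading model rather than business logic.
public class CacheLookup {
    private final Cache<String, String> cache = Caffeine.newBuilder()
            .maximumSize(10_000)   // bounded to keep memory use comparable
            .build();

    public String get(String key) {
        String value = cache.getIfPresent(key); // non-blocking in-memory read
        return value != null ? value : "miss";
    }
}
```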
Collection Layer
Grafana Alloy - OpenTelemetry Collector distribution that receives, processes, and routes telemetry
- OTLP receiver (gRPC and HTTP)
- Batch processing for efficiency
- Service discovery
- eBPF-based profiling
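A minimal Alloy pipeline sketch covering the receiver and batch processor (component labels and endpoints are assumptions, not the project's actual configuration):

```
otelcol.receiver.otlp "default" {
  grpc { endpoint = "0.0.0.0:4317" }   // OTLP/gRPC, the preferred path
  http { endpoint = "0.0.0.0:4318" }   // OTLP/HTTP fallback
  output {
    logs    = [otelcol.processor.batch.default.input]
    metrics = [otelcol.processor.batch.default.input]
    traces  = [otelcol.processor.batch.default.input]
  }
}

otelcol.processor.batch "default" {
  // Batching amortizes export cost; exporters for Loki, Mimir, and
  // Pyroscope would be wired in alongside this Tempo example.
  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}

otelcol.exporter.otlp "tempo" {
  client { endpoint = "tempo:4317" }
}
```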
Pyroscope Java Agent - Profiling agent for JVM services
- CPU profiling
- Allocation profiling
- Lock contention detection
- Integration with OpenTelemetry traces
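The agent attaches at JVM startup; a sketch using the agent's standard environment variables (application name, server address, and jar paths are illustrative):

```sh
PYROSCOPE_APPLICATION_NAME=spring-platform-threads \
PYROSCOPE_SERVER_ADDRESS=http://pyroscope:4040 \
PYROSCOPE_PROFILER_EVENT=cpu \
java -javaagent:pyroscope.jar -jar service.jar
```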
Storage Layer
Loki - Log aggregation system
- Label-based indexing
- Efficient log storage
- LogQL query language
- Integration with Grafana
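For example, a LogQL query narrowing one service's error lines (the label name is an assumption about how Alloy tags log streams):

```
{service_name="spring-platform"} |= "ERROR"
```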
Tempo - Distributed tracing backend
- Trace ID-based storage
- TraceQL query language
- Tag-based search
- Trace-to-metrics correlation
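A TraceQL example finding slow requests for one service (the service name is illustrative):

```
{ resource.service.name = "quarkus-virtual" && duration > 100ms }
```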
Mimir - Metrics storage (Prometheus-compatible)
- Long-term metric storage
- PromQL query engine
- High-cardinality support
- Exemplar support
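For instance, a PromQL query for p99 request latency (the metric and label names assume OpenTelemetry's HTTP server conventions, not the project's verified names):

```
histogram_quantile(
  0.99,
  sum by (le) (rate(http_server_request_duration_seconds_bucket{service="go-fiber"}[1m]))
)
```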
Pyroscope - Continuous profiling storage
- Flame graph generation
- Profile aggregation
- Tag-based filtering
- Profile-to-trace correlation
Visualization Layer
Grafana - Unified observability platform
- Dashboard provisioning
- Data source configuration
- Explore interface
- Alerting (future)
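Data sources are provisioned declaratively; a sketch of one entry (the URL and port assume Mimir's monolithic-mode defaults):

```yaml
apiVersion: 1
datasources:
  - name: Mimir
    type: prometheus
    url: http://mimir:9009/prometheus
    isDefault: true
```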
Data Flow
Telemetry Pipeline
    Service → OpenTelemetry SDK → OTLP/gRPC → Alloy → {Loki, Tempo, Mimir}
                                                ↓
                                            Pyroscope
                                                ↓
                                             Grafana
Profiling Pipeline (Java)
    JVM Service → Pyroscope Agent → Pyroscope Server
         ↓
    OpenTelemetry Context
         ↓
       Alloy
eBPF Profiling Pipeline
    Container → Alloy (eBPF) → Pyroscope Server
Network Architecture
Service Communication
- Services expose REST endpoints on configurable ports
- All services in the same Docker network
- Service discovery via Docker DNS
Observability Communication
- OTLP over gRPC (preferred, lower overhead)
- HTTP fallback for compatibility
- Push-based telemetry (services push to Alloy)
- Pull-based profiling (Pyroscope scrapes endpoints)
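Services can be pointed at Alloy through the standard OpenTelemetry SDK environment variables (the `alloy` hostname and service name are assumptions):

```sh
OTEL_EXPORTER_OTLP_ENDPOINT=http://alloy:4317  # OTLP endpoint inside the Docker network
OTEL_EXPORTER_OTLP_PROTOCOL=grpc               # prefer gRPC over HTTP
OTEL_SERVICE_NAME=spring-platform-threads      # illustrative service name
```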
Resource Management
CPU Allocation
- Benchmarked services: 4 vCPU limit
- Observability stack: Unlimited (to avoid measurement bias)
- Host: Minimum 8 cores recommended
Memory Allocation
- Spring JVM: 512MB-1GB heap
- Quarkus JVM: 256MB-512MB heap
- Native images: 256MB max
- Observability services: Per-component tuning
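A Compose fragment pinning one benchmarked service to these limits might look like this sketch (service name and exact values are illustrative):

```yaml
services:
  spring-jvm:                        # hypothetical service name
    environment:
      JAVA_OPTS: "-Xms512m -Xmx1g"   # heap range from the allocations above
    deploy:
      resources:
        limits:
          cpus: "4"                  # fair-comparison CPU cap
          memory: 2g
```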
Storage
- Ephemeral by default for clean runs
- Volume mounts available for persistence
- Results exported to host filesystem
Configuration Management
Environment Variables
- `.env` file for global configuration
- Service-specific overrides in docker-compose
- Runtime configuration (heap size, thread count, etc.)
- Observability endpoints and credentials
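The `.env` file might carry entries like the following (keys and values are illustrative, not the project's actual variable names):

```sh
# Hypothetical global configuration for a local run
GRAFANA_ADMIN_PASSWORD=admin   # local development only
SERVICE_HEAP_MAX=1g            # consumed by service JAVA_OPTS
LOAD_RATE=2000                 # requests per second for wrk2
```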
Docker Compose Profiles
- `OBS`: Observability stack only
- `SERVICES`: REST services
- `RAIN_FIRE`: Load generators
- Composable for different scenarios
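Profiles compose on the command line; for example, to start the observability stack and the services without the load generators:

```sh
docker compose --profile OBS --profile SERVICES up -d
```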
Security Considerations
Network Isolation
- No external network access for services under test
- Observability stack accessible on localhost only
- Production deployment requires additional security
Credentials
- Default credentials for local development only
- Environment variable override support
- Secrets management recommended for production
Container Security
- Non-root users where applicable
- Minimal base images
- Regular dependency updates
Scalability Considerations
Horizontal Scaling
- Services designed to be stateless
- Load balancing ready
- Kubernetes manifests planned
Vertical Scaling
- Configurable resource limits
- JVM tuning parameters exposed
- GC algorithm selection
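For example, collectors can be swapped per run through standard HotSpot flags (values illustrative):

```sh
JAVA_OPTS="-Xmx1g -XX:+UseG1GC"    # throughput-oriented baseline
# JAVA_OPTS="-Xmx1g -XX:+UseZGC"   # low-pause alternative for comparison
```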
Monitoring the Monitors
Alloy Metrics
- Processing latency
- Batch sizes
- Error rates
Storage Metrics
- Ingestion rate
- Query performance
- Storage utilization
Future Architecture Enhancements
Planned Improvements
- Kubernetes Support: Helm charts and operators
- Additional Protocols: gRPC, HTTP/2, HTTP/3
- CI/CD Integration: Automated benchmark runs
- Multi-region: Distributed load generation
- Service Mesh: Istio/Linkerd integration
- Chaos Engineering: Fault injection
- Cost Analysis: Resource utilization tracking
Extensibility Points
- Plugin architecture for new services
- Custom metrics exporters
- Dashboard templates
- Report generators
Technology Choices
Why Spring Boot?
- Industry standard for enterprise Java
- Extensive ecosystem
- Multiple threading models
- Good baseline for comparison
Why Quarkus?
- Cloud-native optimization
- Fast startup time
- Low memory footprint
- Native compilation support
Why Grafana LGTM Stack?
- Integrated observability
- Open source
- Production-ready
- Active community
Why Docker Compose?
- Simple local development
- Reproducible environments
- Easy to understand
- Good foundation for Kubernetes migration
Design Trade-offs
Performance vs. Observability
- Instrumentation adds overhead
- Batching reduces impact
- Agent-based profiling optional
- Configurable sampling rates
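Sampling is adjustable through the standard OpenTelemetry SDK variables, e.g.:

```sh
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1   # keep ~10% of root traces
```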
Simplicity vs. Realism
- Simple cache workload for controlled testing
- Focus on concurrency behavior
- Not representative of all workloads
- Easy to understand and modify
Local vs. Production
- Local development optimized
- Production patterns demonstrated
- Additional hardening needed for production
- Good learning environment