Docker Compose & Observability Stack
iris intermediate 6 min read
ELI5
The IRIS Docker stack is like a fully equipped hospital. The main building (iris-service) does the actual work. It connects to a filing cabinet (PostgreSQL) for long-term records, a security camera system (Jaeger) that tracks every person’s movements, a dashboard (Grafana) showing vital signs in real time, a logbook (Loki) recording every conversation, and a central nurse station (OpenTelemetry Collector) that routes all information to the right places.
Technical Deep Dive
7-Service Docker Compose Stack
    flowchart TB
      subgraph network["iris-network"]
        iris_app["iris-service FastAPI<br/>Port 8000"]
        db["PostgreSQL 16<br/>5433 → 5432"]
        otel["OTel Collector<br/>gRPC 4317 / HTTP 4318 / Metrics 8888"]
        jaeger["Jaeger All-in-One<br/>16687 → 16686"]
        prometheus["Prometheus<br/>9091 → 9090"]
        grafana["Grafana<br/>3001 → 3000"]
        loki["Loki<br/>3101 → 3100"]
      end
      iris_app --> db
      iris_app --> otel
      otel --> jaeger
      otel --> prometheus
      grafana --> prometheus
      grafana --> jaeger
      grafana --> loki

Service Details
| Service | Image | Host Port | Purpose | Depends On |
|---|---|---|---|---|
| db | postgres:16-alpine | 5433 → 5432 | Primary data store | none |
| otel-collector | otel/opentelemetry-collector:0.103.0 | 4317, 4318, 8888 | Telemetry routing pipeline | jaeger |
| jaeger | jaegertracing/all-in-one:latest | 16687 → 16686 | Distributed tracing UI | none |
| prometheus | prom/prometheus:latest | 9091 → 9090 | Metrics storage (15-day retention) | none |
| grafana | grafana/grafana:latest | 3001 → 3000 | Dashboards and visualisation | prometheus, jaeger, loki |
| loki | grafana/loki:latest | 3101 → 3100 | Structured log aggregation | none |
| iris-service | Built from ./Dockerfile | 8000 | Core FastAPI application | db, otel-collector, jaeger, prometheus, grafana (health check) |
Data Flow
    flowchart LR
      A["iris-service"] -->|Traces + Metrics<br/>OTLP gRPC| B["OTel Collector"]
      A -->|Logs<br/>File/Console| C["Loki"]
      B -->|Traces| D["Jaeger"]
      B -->|Metrics| E["Prometheus"]
      E --> F["Grafana"]
      D --> F
      C --> F
      A -->|Data| G["PostgreSQL"]

Dependency Chain
    flowchart TD
      A["iris-service"] -->|condition: service_healthy| B["db"]
      A -->|condition: service_healthy| C["otel-collector"]
      A -->|condition: service_healthy| D["jaeger"]
      A -->|condition: service_healthy| E["prometheus"]
      A -->|condition: service_healthy| F["grafana"]
      C -->|depends_on| D
      F -->|reads from| E
      F -->|reads from| D
      F -->|reads from| G["loki"]

The iris-service container waits for five infrastructure services (db, otel-collector, jaeger, prometheus, and grafana) to report healthy before it starts. This ensures the database, telemetry pipeline, and observability backends are ready. Loki is not part of this health-check chain; only Grafana reads from it.
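The dependency chain above can be read as a directed graph, and a valid start order is any topological ordering of it. The following is a minimal sketch using only the Python standard library (Python 3.9+). The `DEPENDS_ON` mapping is transcribed from the table and diagram in this section; treating Grafana's "reads from" links as start-order dependencies is an assumption made for illustration, and this is not how Docker Compose itself is implemented.

```python
from graphlib import TopologicalSorter

# Each service maps to the set of services it needs before it can start.
# Transcribed from the "Depends On" column; Grafana's "reads from" edges
# are treated as dependencies here (an assumption for illustration).
DEPENDS_ON = {
    "db": set(),
    "jaeger": set(),
    "prometheus": set(),
    "loki": set(),
    "otel-collector": {"jaeger"},
    "grafana": {"prometheus", "jaeger", "loki"},
    "iris-service": {"db", "otel-collector", "jaeger", "prometheus", "grafana"},
}


def startup_order(graph):
    """Return one valid order in which every service follows its dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = startup_order(DEPENDS_ON)
print(" -> ".join(order))
```

Several orderings are valid, so the exact sequence printed may differ between runs of the idea, but iris-service always comes last because every other service is a direct or indirect dependency of it.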
PostgreSQL Configuration
    db:
      image: postgres:16-alpine
      environment:
        POSTGRES_USER: iris
        POSTGRES_PASSWORD: iris
        POSTGRES_DB: iris_db
        LANG: en_US.utf8
      volumes:
        - postgres_data:/var/lib/postgresql/data
      healthcheck:
        test: ["CMD-SHELL", "pg_isready -U iris -d iris_db"]
        interval: 5s
        timeout: 5s
        retries: 5

Note: While PostgreSQL is provisioned, iris-service currently uses in-memory registries. The SQLAlchemy ORM models and async database infrastructure are defined, but init_db() is a placeholder.
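To make the interval and retries settings concrete, here is a rough client-side analogue of a readiness poll. It is only a sketch: Docker keeps probing for the lifetime of the container and marks it unhealthy after `retries` consecutive failures, whereas this hypothetical `wait_until_healthy` helper simply gives up after a fixed number of attempts. The `probe` callable stands in for a command such as `pg_isready`.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool], interval: float = 5.0, retries: int = 5
) -> bool:
    """Call `check` up to `retries` times, sleeping `interval` seconds between tries."""
    for attempt in range(1, retries + 1):
        if check():
            return True
        if attempt < retries:
            time.sleep(interval)  # no sleep after the final failed attempt
    return False


# Simulated probe that becomes ready on the third call (stand-in for pg_isready).
attempts = {"count": 0}


def probe() -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3


ready = wait_until_healthy(probe, interval=0.0, retries=5)
print(f"ready={ready} after {attempts['count']} attempts")
```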
Volumes
| Volume | Service | Purpose |
|---|---|---|
| postgres_data | db | Persistent database storage |
| prometheus_data | prometheus | Metrics time-series data |
| grafana_storage | grafana | Dashboards and user preferences |
| loki_data | loki | Log index and chunks |
Network
All 7 services share a single Docker bridge network: iris-network. This enables DNS-based service discovery (e.g., db:5432, otel-collector:4317).
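The practical consequence is that the same service has two addresses: a service name plus container port from inside iris-network, and localhost plus the published host port from the developer machine. The sketch below encodes that rule with port values taken from the Service Details table. The `Endpoint` class and `address` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    service: str         # Compose service name, also its DNS name on iris-network
    container_port: int  # port the process listens on inside the container
    host_port: int       # port published on the Docker host


# Values taken from the Service Details table.
ENDPOINTS = {
    "db": Endpoint("db", 5432, 5433),
    "otel-collector": Endpoint("otel-collector", 4317, 4317),
    "jaeger": Endpoint("jaeger", 16686, 16687),
    "prometheus": Endpoint("prometheus", 9090, 9091),
    "grafana": Endpoint("grafana", 3000, 3001),
    "loki": Endpoint("loki", 3100, 3101),
}


def address(name: str, inside_network: bool) -> str:
    """Return host:port as seen from a container (True) or from the host (False)."""
    ep = ENDPOINTS[name]
    if inside_network:
        return f"{ep.service}:{ep.container_port}"
    return f"localhost:{ep.host_port}"


print(address("db", inside_network=True))
print(address("db", inside_network=False))
```

Code running inside a container should use the first form; tools on the host, such as a local psql client or a browser, need the second.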
Key Terms
- OTel Collector → Central telemetry pipeline receiving OTLP traces/metrics and routing to backends
- Jaeger → Distributed tracing backend for visualising request flows across services
- Prometheus → Time-series metrics database with 15-day retention
- Grafana → Dashboard platform reading from Prometheus, Jaeger, and Loki
- Loki → Log aggregation system optimised for structured logs from containers
- Service health check → Docker Compose start condition (service_healthy) that holds a service back until the services it depends on report healthy
- OTLP → OpenTelemetry Protocol; the transport for traces and metrics, carried over gRPC (port 4317) or HTTP (port 4318)
Q&A
Q: Why does iris-service expose port 8000 but db uses 5433?
A: Port 5433 avoids conflicts with local PostgreSQL installations (which typically use 5432). Similarly, 9091, 3001, 3101, and 16687 avoid conflicts with locally running Prometheus, Grafana, Loki, and Jaeger instances.
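If the stack still fails to start because of a port clash, a quick check of the host ports can narrow it down. This is a small sketch using only the standard library; `port_is_free` is a hypothetical helper, and it only tests the loopback interface, whereas Docker publishes ports on all interfaces by default, so treat the result as a hint rather than a guarantee.

```python
import socket

# Host ports published by the stack, from the Service Details table.
HOST_PORTS = {
    "iris-service": 8000,
    "db": 5433,
    "jaeger": 16687,
    "prometheus": 9091,
    "grafana": 3001,
    "loki": 3101,
}


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # something else already holds the port
        return True


for name, port in HOST_PORTS.items():
    status = "free" if port_is_free(port) else "IN USE"
    print(f"{name:<14} {port:>5}  {status}")
```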
Q: Is the database actually used?
A: Not yet. The service uses in-memory registries. SQLAlchemy ORM models are defined and the async session factory is wired, but init_db() and close_db() are no-ops. Future versions will persist to PostgreSQL.
Q: How do I view traces?
A: Open http://localhost:16687 for the Jaeger UI. Search by service name (iris-service), operation, or trace ID.
Q: How do I view metrics dashboards?
A: Open http://localhost:3001 for Grafana. Default credentials are admin/admin. Prometheus is pre-configured as a datasource.
Q: What is the OTel sampling rate?
A: Default 10% (OTEL_SAMPLE_RATE=0.1). In production, you might increase this for debugging or decrease it to reduce overhead.
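To show what a 10% rate means in practice, the sketch below implements ratio-based head sampling on the trace ID, similar in spirit to OpenTelemetry's trace-ID ratio sampler. Whether iris-service uses this exact strategy, and how it reads OTEL_SAMPLE_RATE, are assumptions; the `should_sample` function is a hypothetical illustration, not the service's code.

```python
import os
import random

TRACE_ID_LIMIT = 1 << 64  # only the low 64 bits of the 128-bit trace ID are compared


def should_sample(trace_id: int, rate: float) -> bool:
    """Keep a trace when the low 64 bits of its ID fall below rate * 2**64."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    bound = round(rate * TRACE_ID_LIMIT)
    return (trace_id & (TRACE_ID_LIMIT - 1)) < bound


# Assumed convention: the rate comes from OTEL_SAMPLE_RATE, defaulting to 0.1.
rate = float(os.environ.get("OTEL_SAMPLE_RATE", "0.1"))

rng = random.Random(42)  # fixed seed so the demo is repeatable
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(should_sample(t, rate) for t in trace_ids)
print(f"rate={rate}: kept {kept} of {len(trace_ids)} traces")
```

Because the decision depends only on the trace ID, every service that sees the same trace reaches the same keep-or-drop verdict, which avoids recording partial traces.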
Examples
The Docker stack is like a newsroom:
- iris-service = The reporters writing stories (the actual work)
- PostgreSQL = The filing cabinet storing published articles (permanent records)
- OTel Collector = The editorial desk routing stories to the right departments
- Jaeger = The security footage showing exactly who spoke to whom and when (request traces)
- Prometheus = The analytics team tracking page views, errors, and response times (metrics)
- Grafana = The wall of monitors in the newsroom showing live stats
- Loki = The transcription service recording every phone call and meeting (logs)
Neighbors on the Map
- PostgreSQL Database Schema (ORM Models): understanding the planned database schema
- FNP Kubernetes Multi-Region Architecture: deploying FNP across multiple regions