Kubernetes Deployment Topology

nestr intermediate 5 min read

ELI5

Perch is a small city built from one Helm chart. Buildings that need disk (Prometheus, Loki, AlertManager) are StatefulSets; the rest are Deployments, apart from Promtail, which runs as a DaemonSet on every node. NetworkPolicies gate traffic between buildings, and ServiceMonitors are the postal-discovery system that finds new houses (Engines) automatically.

Technical Deep Dive

Single chart at perch/helm, namespace perch-system. Templates render the following workloads:

```mermaid
flowchart TB
    subgraph perch-system["namespace: perch-system"]
        direction TB
        subgraph Stateful["StatefulSets"]
            P[prometheus<br/>2 replicas]
            AM[alertmanager<br/>3 replicas HA]
            LO[loki]
        end
        subgraph Stateless["Deployments"]
            G[grafana]
            J[jaeger]
            BR[bridge]
            CM[cost-monitor]
            ST[slo-tracker]
            PE[policy-enforcer]
            TC[trace-correlator]
            TQ[thanos-query]
        end
        subgraph DaemonSet
            PT[promtail]
        end
        SM[ServiceMonitor CRDs]
    end
    SM -.discover.-> P
    PT -->|tail logs| LO
    P --> TQ
    P --> G
    LO --> G
    J --> G
    BR --> P
    CM --> P
    ST --> P
    PE --> ST
    TC --> J
    TC --> LO
```

Workload Reference

Workload	Kind	Replicas	Notes
`prometheus`	StatefulSet	2	scrape interval 15 s, retention 30 d, PVCs per replica
`alertmanager`	StatefulSet	3	gossip cluster for HA
`loki`	StatefulSet	configurable	log store backend for Promtail
`promtail`	DaemonSet	per node	tails container logs into Loki
`grafana`	Deployment	2	SSO + RBAC for dashboards
`jaeger`	Deployment	2	trace ingest + query
`thanos-query`	Deployment	2	aggregates Prometheus replicas, S3 backend for long-term
`bridge`	Deployment	1	federates `nestr_*` to Folio (nestr-009)
`cost-monitor`, `slo-tracker`, `policy-enforcer`, `trace-correlator`	Deployment	1 each	small Go services (nestr-010)
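
The "PVCs per replica" and "retention 30 d" notes for `prometheus` map onto a StatefulSet `volumeClaimTemplates` entry and a Prometheus retention flag. The following is a minimal sketch rather than the chart's actual template: the labels, image reference, mount paths, and the 100Gi size are assumptions, and the config-file mount is omitted for brevity.

```yaml
# Illustrative only: not copied from perch/helm.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: perch-system
spec:
  serviceName: prometheus          # assumed headless Service name
  replicas: 2                      # from the workload table
  selector:
    matchLabels:
      app: prometheus              # assumed label
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus   # pin a version tag in practice
          args:
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/prometheus
            - --storage.tsdb.retention.time=30d   # hot retention from the table
          volumeMounts:
            - name: data
              mountPath: /prometheus
  # One PersistentVolumeClaim is created per replica:
  # data-prometheus-0 and data-prometheus-1.
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi         # assumed size
```

The 15 s scrape interval is not a command-line flag; it would be set as `scrape_interval` in the Prometheus configuration or per endpoint on a ServiceMonitor.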

Discovery

ServiceMonitors live under perch/k8s/, one per scrape target (api, shield, relay, aria, manuscript, printery). Each one selects the Service in front of an Engine's pods by label, and the Prometheus Operator turns that selection into scrape configuration, so new targets are picked up without redeploying Prometheus.
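
As a rough illustration of what one of these files might contain, here is a hypothetical ServiceMonitor for the api target. The label key and value, the `metrics` port name, and the `namespaceSelector` setting are assumptions; the real manifests under perch/k8s/ may differ.

```yaml
# Hypothetical ServiceMonitor for the "api" scrape target.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: api
  namespace: perch-system
spec:
  namespaceSelector:
    any: true                        # assumption: Engines may run in other namespaces
  selector:
    matchLabels:
      app.kubernetes.io/name: api    # assumed label on the Engine's Service
  endpoints:
    - port: metrics                  # assumed named port on that Service
      path: /metrics
      interval: 15s                  # matches the 15 s scrape interval in the workload table
```

Note that `spec.selector` matches labels on Service objects; the pods are reached through the endpoints of the selected Services.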

RBAC & NetworkPolicies

perch/k8s/rbac/ defines a ClusterRole, a ClusterRoleBinding, and a ServiceAccount so that Prometheus can list and watch pods cluster-wide. perch/deployment/production/network-policies.yaml restricts ingress and egress: only Grafana is exposed externally, and the custom services accept traffic only from Prometheus and from each other, on documented ports.
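
The shape of these objects can be sketched as follows. All object names, pod labels, and port 8080 are placeholders chosen for illustration; the `services` and `endpoints` resources are added on the assumption that Service-based discovery needs them. The NetworkPolicy models one rule implied by the diagram: slo-tracker accepting traffic from Prometheus and from policy-enforcer.

```yaml
# Sketch only: names, labels and the port are assumptions.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: perch-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: perch-prometheus
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "endpoints"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: perch-prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: perch-prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: perch-system
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: slo-tracker-ingress
  namespace: perch-system
spec:
  podSelector:
    matchLabels:
      app: slo-tracker               # assumed pod label
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: prometheus        # scrapes slo-tracker
        - podSelector:
            matchLabels:
              app: policy-enforcer   # calls slo-tracker, per the diagram
      ports:
        - protocol: TCP
          port: 8080                 # placeholder for the documented port
```

Once a pod is selected by any NetworkPolicy with the Ingress policy type, all ingress not explicitly allowed is dropped, which is what makes the "only from Prometheus and from each other" rule enforceable.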

Long-term Storage

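Thanos sidecar pattern (assumed from the thanos-query template; the sidecar template is not enumerated in this revision): a sidecar next to each Prometheus replica uploads completed TSDB blocks to S3-compatible storage, and thanos-query fans queries out across the live replicas and the historical blocks in S3. In Thanos, bucket data is normally served to the querier by a Store Gateway, which is likewise not enumerated here. Retention is effectively unbounded at the object-store layer; hot retention stays at 30 d in Prometheus.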
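
Since the sidecar template is not part of this revision, the following is only a guess at what it could look like: a container fragment that would sit in the Prometheus pod spec, plus the object-store config file it reads. The bucket name, endpoint, file paths, and the environment-variable placeholders are all hypothetical.

```yaml
# Fragment of the Prometheus pod spec (not a complete manifest).
containers:
  - name: thanos-sidecar
    image: quay.io/thanos/thanos     # pin a version tag in practice
    args:
      - sidecar
      - --tsdb.path=/prometheus                         # same volume Prometheus writes to
      - --prometheus.url=http://localhost:9090
      - --objstore.config-file=/etc/thanos/objstore.yml
      - --grpc-address=0.0.0.0:10901                    # StoreAPI endpoint for thanos-query
    volumeMounts:
      - name: data
        mountPath: /prometheus
---
# Contents of /etc/thanos/objstore.yml (values are placeholders).
type: S3
config:
  bucket: perch-metrics
  endpoint: s3.example.internal:9000
  access_key: "${S3_ACCESS_KEY}"     # supply via a Secret, not inline
  secret_key: "${S3_SECRET_KEY}"
```

With two Prometheus replicas scraping the same targets, thanos-query would typically be started with `--query.replica-label` set to whatever external label distinguishes the replicas, so that duplicate series are merged at query time.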

Key Terms

ServiceMonitor → Prometheus Operator CRD that turns a label selector over Services into a scrape config.
HA cluster → AlertManager peers gossip silences and notification state so that an alert pages once rather than once per replica.
Sidecar → Thanos pattern co-locating an uploader next to each Prometheus replica for S3 offload.

Q&A

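Q: Why three AlertManager replicas and not two? A: AlertManager clustering is gossip-based and has no quorum: any peer can send a notification on its own, and deduplication relies on peers sharing notification state. Two replicas already survive one failure, but then a single instance is left with no redundancy; with three, two peers remain and keep gossiping while one is down for a failure or a rolling upgrade.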
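
The three replicas form one cluster by listing each other as peers. A possible container fragment for the alertmanager StatefulSet is sketched below; the headless Service name `alertmanager`, the file paths, and the image reference are assumptions, not taken from the chart.

```yaml
# Fragment of the alertmanager StatefulSet pod template (assumed layout).
containers:
  - name: alertmanager
    image: prom/alertmanager         # pin a version tag in practice
    args:
      - --config.file=/etc/alertmanager/alertmanager.yml
      - --storage.path=/alertmanager
      - --cluster.listen-address=0.0.0.0:9094
      # One --cluster.peer per replica, addressed through the assumed
      # headless Service "alertmanager" in perch-system.
      - --cluster.peer=alertmanager-0.alertmanager.perch-system.svc:9094
      - --cluster.peer=alertmanager-1.alertmanager.perch-system.svc:9094
      - --cluster.peer=alertmanager-2.alertmanager.perch-system.svc:9094
```

For deduplication to work, Prometheus should be pointed at every replica individually rather than at a load-balanced address, so each peer sees the same alerts.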

Q: How does adding a new Engine instance get scraped? A: Its pods sit behind a Service whose labels match an existing ServiceMonitor, so the Prometheus Operator regenerates the scrape config on the next reconcile. No chart change is required.
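
From the Engine's side, being discoverable comes down to labels and a named port on its Service. A hypothetical example follows, in which the namespace, the `track` label, and the port number are invented for illustration. If the new instance is just more pods behind an existing Service, no new object is needed at all.

```yaml
# Hypothetical Service for an additional "api" Engine instance.
apiVersion: v1
kind: Service
metadata:
  name: api-canary
  namespace: engines                 # assumed namespace
  labels:
    app.kubernetes.io/name: api      # the label a ServiceMonitor would match on
spec:
  selector:
    app.kubernetes.io/name: api
    track: canary                    # assumed pod label for this instance
  ports:
    - name: metrics                  # port name a ServiceMonitor endpoint refers to
      port: 9090
      targetPort: 9090
```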

Q: Where does long-term metric data live? A: In the S3 backend behind Thanos. Prometheus PVCs are sized for 30 d of hot data; queries over older time ranges are served from S3 through thanos-query.

Examples

A green-field install, run from the perch/ directory: kubectl create ns perch-system && helm install perch ./helm -n perch-system. Wait for prometheus-0, prometheus-1, all three alertmanager-* pods, and the grafana-* pods to reach Ready, then run kubectl port-forward -n perch-system svc/perch-grafana 3000:80 and open http://localhost:3000 to land on the default dashboards, which are already wired against the nestr_* and orchestrator_* series.
