CRUMB a card from devarno-cloud

Kubernetes Deployment Topology

nestr intermediate 5 min read

ELI5

Perch is a small city deployed by one Helm chart. Pillars that need disk (Prometheus, Loki, AlertManager) live in StatefulSets, a Promtail DaemonSet runs on every node, and the rest live in Deployments. Traffic between buildings is gated by NetworkPolicies, and ServiceMonitors are the postal-discovery system that finds new houses (Engines) automatically.

Technical Deep Dive

Single chart at perch/helm, namespace perch-system. Templates render the following workloads:

flowchart TB
subgraph perch-system["namespace: perch-system"]
direction TB
subgraph Stateful["StatefulSets"]
P[prometheus<br/>2 replicas]
AM[alertmanager<br/>3 replicas HA]
LO[loki]
end
subgraph Stateless["Deployments"]
G[grafana]
J[jaeger]
BR[bridge]
CM[cost-monitor]
ST[slo-tracker]
PE[policy-enforcer]
TC[trace-correlator]
TQ[thanos-query]
end
subgraph DaemonSet
PT[promtail]
end
SM[ServiceMonitor CRDs]
end
SM -.discover.-> P
PT -->|tail logs| LO
P --> TQ
P --> G
LO --> G
J --> G
BR --> P
CM --> P
ST --> P
PE --> ST
TC --> J
TC --> LO

Workload Reference

| Workload | Kind | Replicas | Notes |
|---|---|---|---|
| prometheus | StatefulSet | 2 | scrape interval 15 s, retention 30 d, PVCs per replica |
| alertmanager | StatefulSet | 3 | gossip cluster for HA |
| loki | StatefulSet | configurable | log store backend for Promtail |
| promtail | DaemonSet | per node | tails container logs into Loki |
| grafana | Deployment | 2 | SSO + RBAC for dashboards |
| jaeger | Deployment | 2 | trace ingest + query |
| thanos-query | Deployment | 2 | aggregates Prometheus replicas, S3 backend for long-term |
| bridge | Deployment | 1 | federates nestr_* to Folio (nestr-009) |
| cost-monitor, slo-tracker, policy-enforcer, trace-correlator | Deployment | 1 each | small Go services (nestr-010) |
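
If the chart hands Prometheus to the Prometheus Operator (an assumption: the Discovery section mentions the operator, but the templates themselves are not shown in this card), the prometheus row would map roughly onto a Prometheus resource like the sketch below. The resource name, ServiceAccount name, and storage size are illustrative placeholders, not values taken from the chart.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: perch                      # hypothetical resource name
  namespace: perch-system
spec:
  replicas: 2                      # two replicas, as listed for prometheus
  scrapeInterval: 15s              # 15 s scrape interval
  retention: 30d                   # 30 d hot retention
  serviceAccountName: prometheus   # assumed ServiceAccount name
  serviceMonitorSelector: {}       # empty selector: match all ServiceMonitors in the selected namespaces
  storage:
    volumeClaimTemplate:           # the operator creates one PVC per replica from this template
      spec:
        resources:
          requests:
            storage: 100Gi         # placeholder size, not from the chart
```

The operator turns a resource of this kind into a StatefulSet, which is consistent with prometheus appearing under StatefulSets above.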

Discovery

ServiceMonitors live under perch/k8s/, one per scrape target (api, shield, relay, aria, manuscript, printery). Each one selects an Engine's Service by label, which resolves to the pods behind that Service, and lets the Prometheus Operator regenerate scrape configs without redeploying Prometheus.
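
A minimal sketch of what one of these ServiceMonitors might look like, using the api target as the example. The label key, port name, and namespace selector are assumptions made for illustration; the actual manifests under perch/k8s/ are not reproduced in this card.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: api                          # one ServiceMonitor per scrape target
  namespace: perch-system
spec:
  namespaceSelector:
    any: true                        # assumption: Engines may run outside perch-system
  selector:
    matchLabels:
      app.kubernetes.io/name: api    # hypothetical label carried by the Engine's Service
  endpoints:
    - port: metrics                  # named Service port (assumed name)
      path: /metrics
      interval: 15s                  # matches the 15 s scrape interval in the workload table
```

Any Service that carries the selected label and exposes the named port becomes a scrape target on the operator's next reconcile.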

RBAC & NetworkPolicies

perch/k8s/rbac/ defines a ServiceAccount, a ClusterRole, and a ClusterRoleBinding so that Prometheus can list and watch pods cluster-wide. perch/deployment/production/network-policies.yaml restricts ingress and egress: only Grafana is exposed externally, and the custom services accept traffic only from Prometheus and from each other, on documented ports.
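
The two pieces can be sketched as follows. This is an illustration under stated assumptions, not the contents of the files named above: every object name, the pod labels, and port 8080 are hypothetical, and the ClusterRole also grants access to services and endpoints, which discovery through ServiceMonitors typically needs.

```yaml
# RBAC: lets the Prometheus ServiceAccount discover targets cluster-wide.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus                 # hypothetical name
  namespace: perch-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-discovery       # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "endpoints"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-discovery
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-discovery
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: perch-system
---
# NetworkPolicy: slo-tracker accepts ingress only from Prometheus
# and from one peer service, on a single port.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: slo-tracker-ingress        # hypothetical name
  namespace: perch-system
spec:
  podSelector:
    matchLabels:
      app: slo-tracker             # hypothetical pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: prometheus      # scrapes from Prometheus
        - podSelector:
            matchLabels:
              app: policy-enforcer # peer custom service
      ports:
        - protocol: TCP
          port: 8080               # placeholder for the documented port
```

Because the policy lists Ingress in policyTypes, any source not named under from is denied for the selected pods.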

Long-term Storage

Thanos sidecar pattern (assumed from the thanos-query template; the sidecar template is not enumerated in this revision): a sidecar next to each Prometheus replica uploads TSDB blocks to S3-compatible storage, and thanos-query fans queries out across the live replicas and the historical S3 blocks. Retention is “infinite” at the object-store layer; hot retention stays at 30 d in Prometheus.
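
For orientation, Thanos components read their bucket settings from an object-store configuration file, passed with --objstore.config-file. A minimal S3 sketch is shown below; the bucket, endpoint, and credentials are placeholders, and whether this chart wires the file in this way is an assumption, since the sidecar template is not part of this revision.

```yaml
# objstore.yml (hypothetical file name)
type: S3
config:
  bucket: perch-metrics            # hypothetical bucket name
  endpoint: s3.example.com         # hypothetical S3-compatible endpoint
  access_key: "REPLACE_ME"         # placeholder; normally injected from a Secret
  secret_key: "REPLACE_ME"         # placeholder; normally injected from a Secret
```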

Key Terms

  • ServiceMonitor → Prometheus-Operator CRD that turns a label selector into a scrape config.
  • HA cluster → AlertManager peers gossip alert state to deduplicate paging across replicas.
  • Sidecar → Thanos pattern co-locating an uploader next to each Prometheus replica for S3 offload.

Q&A

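Q: Why three AlertManager replicas and not two? A: AlertManager peers gossip rather than vote, so there is no quorum to hold and two replicas would also function. Three leaves a margin: while one replica is down or restarting, the remaining two still form a redundant pair that keeps pages flowing and deduplicated.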

Q: How does adding a new Engine instance get scraped? A: It carries the labels matched by an existing ServiceMonitor; Prometheus-Operator regenerates the scrape config on the next reconcile. No chart change required.
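
As a sketch of what "carries the labels" means in practice, suppose an existing ServiceMonitor selects Services labeled app.kubernetes.io/name: api and scrapes a port named metrics (both assumptions for illustration). A new Engine instance would then only need a Service shaped like this; the instance name, namespace, and port number are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-eu-west                        # hypothetical new Engine instance
  namespace: engines                       # hypothetical namespace
  labels:
    app.kubernetes.io/name: api            # label the existing ServiceMonitor is assumed to select
spec:
  selector:
    app.kubernetes.io/instance: api-eu-west  # pods backing this instance
  ports:
    - name: metrics                        # port name the ServiceMonitor endpoint is assumed to reference
      port: 9090
      targetPort: 9090
```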

Q: Where does long-term metric data live? A: In the S3 backend behind Thanos. Prometheus PVCs are sized for 30 d of hot data; queries that reach further back are served from S3 through thanos-query.

Examples

For a green-field install, run kubectl create ns perch-system && helm install perch ./helm -n perch-system. Wait for prometheus-0, prometheus-1, all three alertmanager-* pods, and the grafana-* pods to reach Ready, then run kubectl port-forward svc/perch-grafana 3000:80 and open http://localhost:3000 to land on the default dashboards, which are already wired against the nestr_* and orchestrator_* series.
