FNP Cost Optimization & Karpenter
ELI5
Cloud instances (servers) cost money. Spot instances are like last-minute airline tickets: up to 80% cheaper, but the airline can cancel your seat at any time. Karpenter is a tool that automatically launches cheap spot instances and quickly replaces any that the cloud provider announces it is taking back. This saves FNP 40-50% per month while keeping the service reliable.
Technical Deep Dive
Spot vs On-Demand Pricing
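| Instance type | Cost (illustrative, per instance-hour) | Availability | Use in FNP |
|---|---|---|---|
| On-demand | $2.00 | Not reclaimed by AWS while running | Production NodePool (3-10 pods) |
| Spot | $0.40 | Reclaimable with a 2-minute notice (95%+ uptime) | Spot NodePool (10-30 pods) |
| Spot savings | 80% cheaper per hour | Trade-off: interruptions | 40-50% blended monthly savings |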
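The table's prices turn into monthly bills with simple arithmetic. Below is a minimal Python sketch that assumes, as the comparison does, that each pod occupies one whole instance for a 730-hour month. The prices are the table's illustrative figures, not current AWS rates. The function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # month length used in the comparison

# Illustrative prices from the table, per instance-hour (not live AWS pricing).
ON_DEMAND_PER_HOUR = 2.00
SPOT_PER_HOUR = 0.40


def monthly_cost(on_demand_pods: int, spot_pods: int,
                 hours: int = HOURS_PER_MONTH) -> float:
    """Monthly bill in dollars, assuming each pod occupies one whole instance."""
    hourly = on_demand_pods * ON_DEMAND_PER_HOUR + spot_pods * SPOT_PER_HOUR
    return hourly * hours


def savings_pct(on_demand_pods: int, spot_pods: int) -> float:
    """Percent saved versus running the same number of pods all on-demand."""
    baseline = monthly_cost(on_demand_pods + spot_pods, 0)
    return 100 * (baseline - monthly_cost(on_demand_pods, spot_pods)) / baseline


if __name__ == "__main__":
    print(f"all on-demand: ${monthly_cost(100, 0):,.0f}/month")
    print(f"30/70 hybrid:  ${monthly_cost(30, 70):,.0f}/month")
    print(f"savings:       {savings_pct(30, 70):.0f}%")
```

Changing the split (for example `monthly_cost(10, 90)`) shows how savings grow with the spot share, at the cost of exposing more pods to interruptions.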
Monthly cost comparison, assuming one pod per instance and a 730-hour month:

    100 pods on-demand:              100 * $2.00 * 730 = $146,000/month

    Hybrid (30 on-demand + 70 spot):
      on-demand:                      30 * $2.00 * 730 = $43,800
      spot:                           70 * $0.40 * 730 = $20,440
      total:                                             $64,240/month

    Savings: $146,000 - $64,240 = $81,760/month (56% reduction)

Karpenter Consolidation
Karpenter continuously evaluates whether the cluster can be consolidated:

1. List all pods and nodes.
2. Calculate which pods could fit on fewer (or cheaper) nodes.
3. If consolidation is possible:
   - Launch a cheaper replacement node, if one is needed
   - Drain the old nodes (graceful pod termination)
   - Delete the emptied nodes
4. Result: fewer total nodes, lower cost.

Consolidation example:
Before:
- Node 1: [pod-a, pod-b] (60% utilized)
- Node 2: [pod-c] (20% utilized)
- Node 3: [pod-d, pod-e] (55% utilized)

After consolidation:
- Node 1: [pod-a, pod-b, pod-c] (80% utilized)
- Node 3: [pod-d, pod-e] (55% utilized)
- Node 2: drained and deleted
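The consolidation steps can be approximated by a greedy bin-packing sketch. It repeatedly takes the least-utilized node and checks whether all of its pods fit on the remaining nodes, placing the largest pod first onto the fullest node with room. If they fit, it moves them and deletes the node. This is a simplified illustration, not Karpenter's implementation, which simulates full scheduling constraints and can also replace nodes with cheaper ones. `Node`, `plan_removal`, `consolidate`, and the per-pod percentages are all hypothetical; the percentages were chosen to add up to the node utilizations in the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field

CAPACITY = 100  # hypothetical: every node has the same capacity, measured in percent


@dataclass
class Node:
    name: str
    pods: dict[str, int] = field(default_factory=dict)  # pod name -> % of node it uses

    @property
    def used(self) -> int:
        return sum(self.pods.values())


def plan_removal(candidate: Node, others: list[Node]) -> dict[str, str] | None:
    """Return {pod: target node name} that empties `candidate`, or None if impossible."""
    free = {n.name: CAPACITY - n.used for n in others}
    plan: dict[str, str] = {}
    # Place the largest pods first, each on the fullest node that still has room.
    for pod, size in sorted(candidate.pods.items(), key=lambda kv: -kv[1]):
        fits = [n for n in others if free[n.name] >= size]
        if not fits:
            return None
        target = min(fits, key=lambda n: free[n.name])
        free[target.name] -= size
        plan[pod] = target.name
    return plan


def consolidate(nodes: list[Node]) -> list[Node]:
    """Repeatedly delete the least-utilized node whose pods fit elsewhere.

    Mutates the surviving Node objects (they receive the moved pods).
    """
    nodes = list(nodes)
    changed = True
    while changed:
        changed = False
        for candidate in sorted(nodes, key=lambda n: n.used):
            others = [n for n in nodes if n is not candidate]
            plan = plan_removal(candidate, others)
            if plan is None:
                continue
            by_name = {n.name: n for n in others}
            for pod, target in plan.items():  # "drain": move each pod to its target
                by_name[target].pods[pod] = candidate.pods[pod]
            nodes = others  # delete the now-empty node
            changed = True
            break
    return nodes


if __name__ == "__main__":
    cluster = [
        Node("node-1", {"pod-a": 35, "pod-b": 25}),
        Node("node-2", {"pod-c": 20}),
        Node("node-3", {"pod-d": 30, "pod-e": 25}),
    ]
    for node in consolidate(cluster):
        print(node.name, sorted(node.pods), f"{node.used}%")
    # prints:
    # node-1 ['pod-a', 'pod-b', 'pod-c'] 80%
    # node-3 ['pod-d', 'pod-e'] 55%
```

Node 3 survives because neither of its pods fits in the 20% left on Node 1.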
Savings: 1/3 fewer nodes

Spot Eviction Handling
Scenario: AWS reclaims a spot instance (typically because it needs the capacity back)
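1. AWS sends a 2-minute interruption notice.
2. Karpenter receives the notice (via its interruption queue) and cordons the node, so no new pods are scheduled there.
3. Existing pods are drained gracefully:
   - Each pod receives SIGTERM and has a 30-second grace period (the Kubernetes default)
   - Pods save their state to the database
   - Pods terminate
4. Karpenter launches a replacement node, and the workload's controller (for example, a Deployment) recreates the evicted pods on available capacity.
5. The new pods resume from the saved state.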
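The pod side of steps 3 and 5 can be sketched in Python. The worker checkpoints periodically, treats SIGTERM as a request to stop, and writes a final checkpoint before exiting, so a replacement pod resumes where it left off. `CheckpointingWorker` is hypothetical. A local JSON file stands in for FNP's PostgreSQL store; a real pod would write to the database instead.

```python
import json
import os
import signal
import tempfile
import time


class CheckpointingWorker:
    """Hypothetical worker that checkpoints progress so a replacement pod can resume.

    A local JSON file stands in for FNP's PostgreSQL store in this sketch.
    """

    def __init__(self, checkpoint_path: str, interval_s: float = 1.0):
        self.checkpoint_path = checkpoint_path
        self.interval_s = interval_s  # the text cites checkpoints every 1-2 seconds
        self.stopping = False
        self.state = self._load()

    def _load(self) -> dict:
        # Resume from the last checkpoint if a previous pod left one behind.
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path) as f:
                return json.load(f)
        return {"next_item": 0}

    def save(self) -> None:
        # Write a temp file, then atomically rename it over the old checkpoint,
        # so a kill in the middle of a write never leaves a corrupt file.
        tmp_path = self.checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.state, f)
        os.replace(tmp_path, self.checkpoint_path)

    def handle_sigterm(self, signum, frame) -> None:
        # Kubernetes sends SIGTERM first and SIGKILL only after the grace period,
        # so just request a stop and let the loop save a final checkpoint.
        self.stopping = True

    def run(self, total_items: int) -> None:
        signal.signal(signal.SIGTERM, self.handle_sigterm)  # main thread only
        last_save = time.monotonic()
        while not self.stopping and self.state["next_item"] < total_items:
            self.state["next_item"] += 1  # placeholder for one unit of real work
            if time.monotonic() - last_save >= self.interval_s:
                self.save()
                last_save = time.monotonic()
        self.save()  # final checkpoint, written within the grace period


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    CheckpointingWorker(path).run(total_items=1000)
    # A "replacement pod" constructed later starts from the saved state.
    print(CheckpointingWorker(path).state)
```

Setting a flag in the handler, rather than exiting inside it, keeps the final save on the normal code path. That save must finish within `terminationGracePeriodSeconds` (30 seconds by default), or the kubelet sends SIGKILL.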
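RTO (recovery): ~30 seconds when replacement pods can be scheduled on existing capacity; longer if they must wait for a new node to boot.
Data loss: none, provided each pod finishes its final state save within the grace period.

NodePool Configuration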
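Production NodePool (on-demand):

    apiVersion: karpenter.sh/v1
    kind: NodePool
    metadata:
      name: production
    spec:
      template:
        spec:
          requirements:
            - key: karpenter.sh/capacity-type   # restrict this pool to on-demand capacity
              operator: In
              values: ["on-demand"]
          nodeClassRef:                         # EC2NodeClass named "default" is a placeholder
            group: karpenter.k8s.aws
            kind: EC2NodeClass
            name: default
      limits:
        cpu: 20
        memory: 100Gi
      disruption:
        consolidationPolicy: WhenEmpty
        consolidateAfter: Never                 # never consolidate production

Spot NodePool (interruptible):

    apiVersion: karpenter.sh/v1
    kind: NodePool
    metadata:
      name: spot
    spec:
      template:
        spec:
          requirements:
            - key: karpenter.sh/capacity-type   # restrict this pool to spot capacity
              operator: In
              values: ["spot"]
          nodeClassRef:                         # EC2NodeClass named "default" is a placeholder
            group: karpenter.k8s.aws
            kind: EC2NodeClass
            name: default
      limits:
        cpu: 50
        memory: 200Gi
      disruption:
        consolidationPolicy: WhenEmptyOrUnderutilized   # aggressively consolidate
        consolidateAfter: 30s                           # wait 30s after pods change, then consolidate

Key Terms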
- Spot instance → Unused cloud capacity sold at a steep discount; the provider can reclaim it on short notice (2 minutes on AWS)
- Karpenter → Kubernetes node autoscaler; launches right-sized nodes for pending pods and bin-packs pods onto them
- Consolidation → Karpenter moves pods onto fewer or cheaper nodes and deletes the nodes it empties
- Cordoning → Mark a node as “no new pods”; existing pods keep running
Q&A
Q: What if a pod is evicted mid-operation? A: FNP persists state to PostgreSQL every 1-2 seconds. Eviction starts a graceful shutdown: the pod receives SIGTERM and has 30 seconds to finish. The pod saves its final state, and the replacement pod resumes from the last checkpoint.
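Q: Can critical pods run on spot? A: Yes, if they’re stateless or can recover quickly. Liveness probes restart unhealthy containers, and readiness probes keep traffic away from pods that are not ready. Karpenter respects pod disruption budgets (PDBs) when it drains a node. A reclaimed spot instance is still terminated when its 2-minute notice expires, regardless of any PDB.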
Q: What’s the maximum savings? A: 50-60% is realistic with a mix of on-demand and spot. Running 90% of pods on spot would be cheaper but riskier, because far more of the fleet is exposed to interruptions. FNP targets 40-50% savings with 99.9% availability.
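The PDB point in the Q&A comes down to a small piece of arithmetic. A PDB's status reports `currentHealthy`, `desiredHealthy`, and `disruptionsAllowed`, and the Eviction API refuses an eviction when `disruptionsAllowed` is 0. Below is a minimal Python sketch of that rule. The function names are hypothetical, and only integer budgets are handled; percentage budgets are left out.

```python
from __future__ import annotations


def disruptions_allowed(current_healthy: int, expected_pods: int,
                        min_available: int | None = None,
                        max_unavailable: int | None = None) -> int:
    """Pods a PodDisruptionBudget lets the Eviction API evict right now.

    Follows disruptionsAllowed = currentHealthy - desiredHealthy, floored at 0.
    Integer budgets only; percentage budgets are left out of this sketch.
    """
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected_pods - max_unavailable
    return max(0, current_healthy - desired_healthy)


def evictable_now(pods_on_node: int, current_healthy: int, expected_pods: int,
                  min_available: int) -> int:
    """How many of a draining node's pods can go before replacements must become Ready."""
    allowed = disruptions_allowed(current_healthy, expected_pods, min_available=min_available)
    return min(pods_on_node, allowed)


if __name__ == "__main__":
    # Hypothetical Deployment: 5 healthy replicas, PDB minAvailable=4,
    # and 3 of the replicas sit on the spot node being drained.
    print(evictable_now(pods_on_node=3, current_healthy=5, expected_pods=5, min_available=4))
    # prints 1
```

In this example the drain evicts one pod. It must then wait for a replacement to become Ready before evicting the next, which is why a tight PDB can slow a drain that has only a 2-minute window.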
Examples
Karpenter is like a warehouse manager: renting cheap temporary containers (spot) when inventory grows, continually consolidating inventory to minimize storage costs, and keeping a buffer of expensive permanent containers (on-demand) for critical stock.