High Availability #

Running two Pod replicas does not by itself give you high availability: if both land on the same node, a single node failure takes out both Pods. Real high availability requires deliberate distribution: Pods across multiple nodes, nodes across multiple availability zones, and protection against disruptive operations such as node upgrades. This article covers techniques for keeping an application available when part of the infrastructure fails.

Why Two Replicas Are Not Enough #

Scenario: Deployment api, replicas: 2

  Without anti-affinity (possible placement):
  Node 1: [api-pod-1] [api-pod-2]   ← both Pods on the same node!
  Node 2: (empty)

  Node 1 down → both Pods lost → downtime

  With anti-affinity (preferred):
  Node 1: [api-pod-1]
  Node 2: [api-pod-2]               ← one Pod per node

  Node 1 down → api-pod-1 lost → api-pod-2 keeps serving
  Kubernetes schedules a replacement Pod → api-pod-3 on Node 3, or on Node 1 once it recovers

Pod Anti-Affinity #

spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          # Required: Pods MUST land on different nodes (if that is impossible, the Pod stays Pending)
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: api
            topologyKey: kubernetes.io/hostname   # "hostname" = one per node

          # Preferred: Pods SHOULD land in different zones (more flexible)
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: api
              topologyKey: topology.kubernetes.io/zone  # one per zone where possible

Common topologyKey values:
  kubernetes.io/hostname         one Pod per node (the strictest spread)
  topology.kubernetes.io/zone    one Pod per availability zone
  topology.kubernetes.io/region  one Pod per region (rarely used)

topologySpreadConstraints: More Precise Distribution #

topologySpreadConstraints gives finer-grained control than anti-affinity, because you can define the maximum imbalance (maxSkew) allowed between topology domains:

spec:
  template:
    spec:
      topologySpreadConstraints:
      # Spread across zones: at most a 1-Pod difference between zones
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule    # or ScheduleAnyway
        labelSelector:
          matchLabels:
            app: api

      # Spread across nodes: at most a 2-Pod difference between nodes
      - maxSkew: 2
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway   # more flexible
        labelSelector:
          matchLabels:
            app: api

With 3 zones (A, B, C) and 9 replicas:

  Without spread constraints (possible placement):
  Zone A: [Pod1] [Pod2] [Pod3] [Pod4] [Pod5] [Pod6]
  Zone B: [Pod7] [Pod8]
  Zone C: [Pod9]
  → Uneven; losing zone A removes 67% of capacity

  With maxSkew: 1:
  Zone A: [Pod1] [Pod2] [Pod3]
  Zone B: [Pod4] [Pod5] [Pod6]
  Zone C: [Pod7] [Pod8] [Pod9]
  → Even; losing zone A removes only 33% of capacity

Pod Disruption Budget (PDB) #

A PDB protects availability during deliberate (voluntary) disruptions: node drains during upgrades, Cluster Autoscaler scale-down, or manual evictions.

# Keep at least 2 api Pods available during voluntary disruptions
# (with only 2 replicas this would block every eviction, so run 3 or more)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
  namespace: production
spec:
  minAvailable: 2               # or use maxUnavailable
  selector:
    matchLabels:
      app: api

# Alternative: use a percentage (set only one of the two fields; they are mutually exclusive)
spec:
  minAvailable: "75%"           # at least 75% of Pods must stay available
  # or, instead of minAvailable:
  # maxUnavailable: "25%"       # at most 25% of Pods may be unavailable

# During a node drain, each eviction is checked against the PDB first
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

# If an eviction would violate the PDB, it is refused and the drain waits and retries
# Example output:
# evicting pod production/api-pod-1
# error when evicting pod "api-pod-2":
# Cannot evict pod as it would violate the pod's disruption budget.
# → the drain keeps retrying until a replacement Pod is Ready and the budget allows the eviction
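
The PDB above pins an absolute minAvailable, which has to be revisited whenever the replica count changes. As a minimal sketch of the maxUnavailable variant it mentions, assuming the same app: api label and production namespace (the name api-pdb-max-unavailable is hypothetical):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb-max-unavailable   # hypothetical name, not from the original article
  namespace: production
spec:
  maxUnavailable: 1               # at most 1 api Pod may be unavailable due to voluntary disruption
  selector:
    matchLabels:
      app: api
```

This form follows the replica count automatically: it permits one voluntary eviction at a time whether the Deployment runs 3 replicas or 30. It is meant as a replacement for api-pdb rather than an addition, since a Pod selected by more than one PDB cannot be evicted through the eviction API.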

Multi-AZ Node Distribution #

Cluster Autoscaler and node pools must be configured to spread nodes across multiple AZs:

# GKE: create a multi-zone node pool in a regional cluster
# --region: a regional cluster places nodes in 3 zones by default
# --num-nodes: node count per zone, so 2 per zone = 6 nodes total
gcloud container node-pools create production-pool \
  --cluster production \
  --region asia-southeast1 \
  --num-nodes 2 \
  --machine-type n2-standard-4

# EKS: multi-AZ managed node group
eksctl create nodegroup \
  --cluster production \
  --name workers \
  --nodes 6 \
  --node-zones ap-southeast-1a,ap-southeast-1b,ap-southeast-1c
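
The eksctl command above can also be kept in version control as a config file. The following is a rough sketch of an equivalent declaration, assuming the cluster lives in the ap-southeast-1 region; the instance type is a placeholder chosen for illustration and does not come from the original command:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: production
  region: ap-southeast-1          # assumption: inferred from the zone names
managedNodeGroups:
  - name: workers
    instanceType: m5.xlarge       # placeholder; the original command does not specify one
    desiredCapacity: 6
    availabilityZones:
      - ap-southeast-1a
      - ap-southeast-1b
      - ap-southeast-1c
```

Saved under a hypothetical file name such as cluster.yaml, this would be applied with eksctl create nodegroup --config-file=cluster.yaml.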

Readiness and Graceful Shutdown for HA #

HA is not only about distribution. A Pod that is terminated must also be replaced without downtime:

spec:
  template:
    spec:
      # Graceful shutdown: allow time to finish in-flight requests
      terminationGracePeriodSeconds: 60
      containers:
      - name: api
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 5"]
              # Delay before SIGTERM so iptables rules have time to update

        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          failureThreshold: 3
          periodSeconds: 5
          # The Pod receives traffic only once it is ready and is removed from Endpoints when it is not

        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          failureThreshold: 3
          periodSeconds: 15

HA Production Checklist #

Pod distribution:
  □ Anti-affinity or topologySpreadConstraints configured
  □ At least 3 replicas (2 is not enough during a rolling update)
  □ Pods spread across at least 2 availability zones

Protection against disruption:
  □ PodDisruptionBudget configured for critical services
  □ minAvailable or maxUnavailable set to match requirements

Automatic recovery:
  □ readinessProbe configured for zero-downtime rolling updates
  □ livenessProbe configured to detect and restart stuck Pods
  □ startupProbe for applications that start slowly

Nodes:
  □ Nodes spread across at least 2 availability zones
  □ System Pods (CoreDNS, kube-proxy) have sufficient resources
  □ Nodes not overcommitted (total requests < node capacity × 80%)
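
Two checklist items, the 3-replica minimum and startupProbe, have no manifest elsewhere in this article. The sketch below shows one way they might look together. The image name, the reuse of /health/live for startup, and the probe timings are assumptions for illustration, not values from the original:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: production
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never drop below the desired replica count during a rollout
      maxSurge: 1         # add one extra Pod at a time instead
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: registry.example.com/api:1.0.0   # hypothetical image
        startupProbe:
          httpGet:
            path: /health/live   # assumption: reuses the liveness endpoint
            port: 8080
          periodSeconds: 5
          failureThreshold: 30   # allows up to 30 × 5s = 150s for startup
```

While the startupProbe has not yet succeeded, the liveness and readiness probes are held back, so a slow-starting container is not restarted before it finishes booting.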

Summary #

Two replicas on the same node are not HA: use pod anti-affinity with topologyKey: kubernetes.io/hostname to make sure Pods land on different nodes.
topologySpreadConstraints is more precise than anti-affinity: maxSkew: 1 keeps the distribution even across zones, and it is more flexible because it tolerates a small imbalance.
A PDB is mandatory for critical services: without one, a node drain can evict all Pods at once; minAvailable: 2 makes the drain wait until replacement Pods are ready before evicting the old ones.
At least 3 replicas for production: with 2 replicas and a rolling update (maxUnavailable: 1), there are moments when only 1 Pod is available; 3 replicas is safer.
Multi-AZ node pools: with nodes spread across 3 AZs, losing one AZ costs only about 33% of capacity; without that spread, losing one AZ can take down the entire cluster.
preStop sleep of 5 seconds: this covers the gap between the Pod being removed from Endpoints and the iptables rules being updated on every node; without it, some requests still reach a Pod that has already begun shutting down.
