Soffio

Key Takeaways

Kubernetes is the operating system of the cloud-native era: through its declarative API and controller pattern it abstracts away the infrastructure, automating application deployment, scaling, and management.

Architecture Components

  • Control plane: API Server, etcd, Scheduler, Controller Manager
  • Node components: kubelet, kube-proxy, container runtime
  • Core resources: Pod (smallest unit), Deployment (application management), Service (service discovery), ConfigMap/Secret (configuration management)

Key Design Patterns

  1. Controller pattern: a feedback loop that continuously reconciles actual state with desired state
  2. Declarative API: describe what you want, not how to do it; the system gets there on its own
  3. Self-healing: failures are detected and recovered from automatically, without human intervention
  4. Eventual consistency: availability first; state converges over time

Production Best Practices

  • Set resource requests/limits to avoid resource contention
  • Configure health checks (liveness/readiness/startup probes)
  • Harden security with security contexts and NetworkPolicy
  • Autoscale with the HPA
  • Manage stateful applications with StatefulSet
  • Use Ingress for external access and TLS termination

Use Cases

Microservice architectures, CI/CD pipelines, hybrid-cloud deployments, big-data processing, AI/ML training, edge computing.

Kubernetes succeeds because it provides the right level of abstraction: developers focus on business logic instead of infrastructure operations, making the cloud-native ideal of "infrastructure as code" a reality.

The Art of Container Orchestration: A Deep Dive into Kubernetes


Introduction: The Operating System of the Cloud-Native Era

Kubernetes (K8s) is more than a container orchestrator: it is the operating system of the cloud-native era. Just as Linux abstracts hardware, Kubernetes abstracts infrastructure. Developers no longer care which machine their application runs on; they declare the desired state, and K8s makes it so.

Before the containerization wave swept the industry, operators managed servers, configured load balancers, and set up monitoring and alerting by hand. Docker simplified packaging and deployment, but once an application grows to tens or hundreds of containers, new problems surface:

  • Scheduling: which container should run on which machine?
  • Scaling: how do you add instances automatically when traffic spikes?
  • Recovery: how do you recover quickly when a container crashes or a node fails?
  • Discovery: how do services find each other?
  • Updates: how do you update an application without interrupting service?

Kubernetes' answer: declarative management + the controller pattern.
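That answer can be compressed into a toy sketch (illustrative Python, not real Kubernetes API calls): the user declares a desired state, and a loop keeps nudging the actual state toward it.

```python
# A toy declarative control loop: the user edits `desired`,
# and reconcile() keeps `actual` converging toward it.
desired = {"replicas": 3}
actual = {"replicas": 0}

def reconcile(desired, actual):
    """One pass of the loop: compare states and correct the difference."""
    diff = desired["replicas"] - actual["replicas"]
    if diff > 0:
        actual["replicas"] += diff   # scale up (create Pods)
    elif diff < 0:
        actual["replicas"] += diff   # scale down (delete Pods)
    return actual

reconcile(desired, actual)
print(actual["replicas"])  # 3
```

Everything that follows, from the Deployment controller to the HPA, is an elaboration of this loop.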

Kubernetes Core Architecture

Control Plane

Kubernetes follows a classic control-plane/worker architecture: the control plane makes cluster-wide decisions and manages the cluster.

# Control plane components
Control Plane:
  kube-apiserver:
    role: Entry point for every component; exposes the RESTful API
    traits: Stateless, horizontally scalable

  etcd:
    role: Distributed key-value store holding all cluster data
    traits: Strongly consistent (Raft consensus)
    data: Configuration, state, metadata

  kube-scheduler:
    role: Selects a node for each newly created Pod
    considers:
      - Resource requirements (CPU, memory)
      - Affinity/anti-affinity rules
      - Taints and tolerations
      - Data locality

  kube-controller-manager:
    role: Runs the built-in controllers
    includes:
      - Node Controller: monitors node health
      - Replication Controller: maintains Pod replica counts
      - Endpoints Controller: populates Endpoints objects
      - Service Account & Token Controllers: create default accounts per namespace

  cloud-controller-manager:
    role: Talks to the cloud provider's API
    features: Node management, routes, load balancers, storage volumes

Node Components

Every worker node runs the following components:

// kubelet - the node agent
type Kubelet struct {
    // Registers the node with the API Server,
    // watches for Pods assigned to this node,
    // and ensures their containers are running.
}

// Core responsibilities
func (k *Kubelet) Run() {
    // 1. Sync Pod state
    k.syncPods()

    // 2. Run probe checks (liveness/readiness)
    k.probeManager.Start()

    // 3. Garbage-collect unused containers and images
    k.containerGC.Start()

    // 4. Report node status
    k.nodeStatusUpdate()
}

// kube-proxy - the network proxy
type KubeProxy struct {
    // Implements the Service virtual IP and
    // maintains the network rules (iptables or IPVS).
}

func (p *KubeProxy) SyncServiceRules(service Service) {
    // Map the Service's ClusterIP to its backend Pods
    for _, endpoint := range service.Endpoints {
        p.addIPTablesRule(service.ClusterIP, endpoint.PodIP)
    }
}

// Container runtime
// Supported: containerd, CRI-O, Docker (deprecated)


Core Concepts in Detail

1. Pod: The Smallest Schedulable Unit

A Pod is the smallest deployable unit in Kubernetes; it holds one or more tightly coupled containers.

# A simple Pod definition
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: nginx
    tier: frontend
spec:
  containers:
  - name: nginx
    image: nginx:1.21
    ports:
    - containerPort: 80
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
    livenessProbe:
      httpGet:
        path: /health
        port: 80
      initialDelaySeconds: 3
      periodSeconds: 5
    readinessProbe:
      httpGet:
        path: /ready
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5

Pod Design Principles

# When should multiple containers share a Pod?
class PodDesignDecision:
    @staticmethod
    def should_colocate(container1, container2):
        """
        Containers in the same Pod share:
        1. The network namespace (localhost communication)
        2. The IPC namespace (inter-process communication)
        3. Volumes
        4. A lifecycle
        """

        # Case 1: Sidecar pattern
        if container2.role == "log-collector":
            # The log collector needs access to the main container's logs
            return True

        # Case 2: Ambassador pattern
        if container2.role == "proxy":
            # The proxy handles the main container's network traffic
            return True

        # Case 3: Adapter pattern
        if container2.role == "metrics-adapter":
            # The adapter normalizes metrics across applications
            return True

        # Otherwise, deploy them as separate Pods
        return False

# Anti-patterns: do NOT colocate
# - Services that scale independently (frontend and backend)
# - Services that don't need tight communication

2. Deployment: Application Management

A Deployment manages declarative updates to Pods, with support for rolling updates and rollbacks.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most 1 Pod above the desired replica count
      maxUnavailable: 0  # no Pods may be unavailable during the update
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
        version: v2
    spec:
      containers:
      - name: web
        image: myapp:v2
        ports:
        - containerPort: 8080

The Rolling-Update Flow

// Reconciliation logic of the Deployment controller (sketch)
class DeploymentController {
  async reconcile(deployment: Deployment) {
    const desiredReplicas = deployment.spec.replicas;
    const currentRS = this.getCurrentReplicaSet(deployment);
    const newRS = this.getNewReplicaSet(deployment);

    // Deep comparison, not reference equality
    if (!deepEqual(deployment.spec.template, currentRS.template)) {
      // An update was detected
      await this.rolloutUpdate(deployment, currentRS, newRS);
    }
  }

  async rolloutUpdate(
    deployment: Deployment,
    oldRS: ReplicaSet,
    newRS: ReplicaSet
  ) {
    const maxSurge = deployment.spec.strategy.maxSurge;
    const maxUnavailable = deployment.spec.strategy.maxUnavailable;

    // Gradually grow the new version and shrink the old one
    while (newRS.replicas < deployment.spec.replicas) {
      // 1. Create new Pods (never exceeding maxSurge)
      const newPods = Math.min(
        maxSurge,
        deployment.spec.replicas - newRS.replicas
      );
      await this.scaleUp(newRS, newPods);

      // 2. Wait for the new Pods to become Ready
      await this.waitForPodsReady(newRS);

      // 3. Delete old Pods (never exceeding maxUnavailable)
      const oldPods = Math.min(
        maxUnavailable,
        oldRS.replicas
      );
      await this.scaleDown(oldRS, oldPods);
    }

    // Clean up old ReplicaSets (history is kept for rollbacks)
    this.cleanupOldReplicaSets(deployment);
  }
}

3. Service: Service Discovery and Load Balancing

A Service gives a set of Pods a stable network endpoint.

apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  type: ClusterIP  # or NodePort, LoadBalancer
  selector:
    app: web
  ports:
  - protocol: TCP
    port: 80        # Service port
    targetPort: 8080 # Pod port
  sessionAffinity: ClientIP  # session affinity

Service Types Compared

// Service types and when to use them
enum ServiceType {
    ClusterIP {
        // Default type; reachable only inside the cluster.
        // Use for: internal microservice communication.
        cluster_ip: String,
    },

    NodePort {
        // Opens a port on every node.
        // Use for: external access without a LoadBalancer.
        node_port: u16,  // 30000-32767
        cluster_ip: String,
    },

    LoadBalancer {
        // A cloud provider's load balancer.
        // Use for: external access in production.
        external_ip: String,
        node_port: u16,
        cluster_ip: String,
    },

    ExternalName {
        // A DNS CNAME record.
        // Use for: referencing an external service.
        external_name: String,
    }
}

// kube-proxy implements the Service's traffic forwarding
impl KubeProxy {
    fn sync_service_rules(&self, service: &Service) {
        match self.proxy_mode {
            ProxyMode::IPTables => {
                // iptables mode: DNAT rules
                self.generate_iptables_rules(service);
            },
            ProxyMode::IPVS => {
                // IPVS mode: in-kernel load balancing, better performance
                self.configure_ipvs(service);
            }
        }
    }
}


4. ConfigMap and Secret: Configuration Management

# ConfigMap - non-sensitive configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  database.url: "postgres://db:5432"
  cache.ttl: "3600"
  
---
# Secret - sensitive data
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  username: YWRtaW4=  # base64-encoded
  password: cGFzc3dvcmQ=

---
# Using them in a Pod
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: myapp
    env:
    - name: DB_URL
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: database.url
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: db-credentials
          key: password
    volumeMounts:
    - name: config-volume
      mountPath: /etc/config
  volumes:
  - name: config-volume
    configMap:
      name: app-config
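Note that the data values in a Secret are base64-encoded, not encrypted. The values in the Secret above can be reproduced with a few lines of Python:

```python
import base64

# Encode credentials the same way `kubectl create secret` does.
username = base64.b64encode(b"admin").decode()
password = base64.b64encode(b"password").decode()
print(username)  # YWRtaW4=
print(password)  # cGFzc3dvcmQ=

# Decoding is just as easy -- treat Secrets as obfuscated, not secure,
# unless encryption at rest is enabled for etcd.
assert base64.b64decode(username) == b"admin"
```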

The Controller Pattern: The Heart of Kubernetes

The controller is Kubernetes' most important design pattern. Each controller continuously watches cluster state and adjusts the actual state until it matches the desired state.

// The generic shape of a controller
type Controller struct {
    queue    workqueue.RateLimitingInterface
    informer cache.SharedIndexInformer
}

// The control loop - the heartbeat of Kubernetes
func (c *Controller) Run(stopCh <-chan struct{}) {
    defer c.queue.ShutDown()

    // Start the informer
    go c.informer.Run(stopCh)

    // Wait for the caches to sync
    if !cache.WaitForCacheSync(stopCh, c.informer.HasSynced) {
        return
    }

    // Start the workers
    for i := 0; i < 5; i++ {
        go wait.Until(c.worker, time.Second, stopCh)
    }

    <-stopCh
}

// Core reconciliation logic
func (c *Controller) worker() {
    for c.processNextItem() {
    }
}

func (c *Controller) processNextItem() bool {
    key, quit := c.queue.Get()
    if quit {
        return false
    }
    defer c.queue.Done(key)

    // Reconcile
    err := c.reconcile(key.(string))
    if err != nil {
        // Retry with rate limiting
        c.queue.AddRateLimited(key)
        return true
    }

    c.queue.Forget(key)
    return true
}

func (c *Controller) reconcile(key string) error {
    // 1. Fetch the desired state (spec)
    desired := c.getDesiredState(key)

    // 2. Fetch the actual state (status)
    actual := c.getActualState(key)

    // 3. Compute the difference and act on it
    if !reflect.DeepEqual(desired, actual) {
        return c.syncState(desired, actual)
    }

    return nil
}

A Custom Controller Example

# A simplified custom controller (Python pseudocode)
from kubernetes import client, watch

class DatabaseController:
    """Controller managing a custom Database resource."""

    def __init__(self):
        self.api = client.CustomObjectsApi()

    def run(self):
        """The control loop."""
        watcher = watch.Watch()

        for event in watcher.stream(
            self.api.list_cluster_custom_object,
            group="mycompany.com",
            version="v1",
            plural="databases"
        ):
            self.handle_event(event)

    def handle_event(self, event):
        """Dispatch on the event type."""
        db = event['object']
        event_type = event['type']

        if event_type == 'ADDED':
            self.create_database(db)
        elif event_type == 'MODIFIED':
            self.update_database(db)
        elif event_type == 'DELETED':
            self.delete_database(db)

    def create_database(self, db):
        """Provision a database instance."""
        # 1. Create a PVC (storage)
        pvc = self.create_pvc(db)

        # 2. Create a StatefulSet (the database instance)
        statefulset = self.create_statefulset(db, pvc)

        # 3. Create a Service (network access)
        service = self.create_service(db)

        # 4. Update the Database's status
        self.update_status(db, {
            'phase': 'Running',
            'endpoint': service.cluster_ip
        })


The Network Model

Kubernetes networking rests on a few basic principles:

  1. Every Pod can communicate with every other Pod without NAT
  2. Every node can communicate with every Pod without NAT
  3. The IP a Pod sees for itself is the same IP other Pods use to reach it

// CNI (Container Network Interface) plugins
// Common implementations: Calico, Flannel, Weave, Cilium

class KubernetesNetworking {
  // The network layers
  layers = {
    podNetwork: {
      description: 'Communication between containers in a Pod',
      implementation: 'localhost (shared network namespace)',
      latency: '< 1μs'
    },

    podToPod: {
      description: 'Pod-to-Pod on the same node',
      implementation: 'veth pair + bridge',
      latency: '~10μs'
    },

    podToPodCrossNode: {
      description: 'Pod-to-Pod across nodes',
      implementation: 'Overlay network (VXLAN) or BGP routing',
      latency: '~100μs'
    },

    service: {
      description: 'The Service abstraction',
      implementation: 'kube-proxy (iptables/IPVS)',
      features: ['load balancing', 'service discovery']
    },

    ingress: {
      description: 'External access',
      implementation: 'Ingress Controller (nginx, traefik)',
      features: ['HTTP routing', 'TLS termination', 'virtual hosts']
    }
  };
}

// An Ingress example
const ingressExample = `
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 8080
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend-service
            port:
              number: 80
  tls:
  - hosts:
    - myapp.example.com
    secretName: tls-certificate
`;

Storage Management

Kubernetes provides a flexible storage abstraction that supports a wide range of backends.

# PersistentVolume (PV) - a cluster-level storage resource
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs-server.example.com
    path: /exports/data

---
# PersistentVolumeClaim (PVC) - a user's request for storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: standard

---
# Using the PVC in a Pod
apiVersion: v1
kind: Pod
metadata:
  name: app-with-storage
spec:
  containers:
  - name: app
    image: myapp
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-claim

StorageClass - Dynamic Provisioning

# A StorageClass definition
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer

---
# A PVC referencing the StorageClass (a PV is created automatically)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fast-storage
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 100Gi

Scheduling and Affinity

The Kubernetes scheduler assigns Pods to suitable nodes.

# Node affinity - steer a Pod to specific nodes
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu
            operator: In
            values:
            - nvidia-a100
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - us-west-1a
  containers:
  - name: ml-training
    image: tensorflow:gpu
    resources:
      limits:
        nvidia.com/gpu: 1

---
# Pod affinity - schedule related Pods together
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cache
spec:
  replicas: 3
  template:
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web
            topologyKey: kubernetes.io/hostname
      containers:
      - name: redis
        image: redis:7

---
# Pod anti-affinity - spread Pods across nodes (high availability)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - web
              topologyKey: kubernetes.io/hostname
      containers:
      - name: nginx
        image: nginx

Taints and Tolerations

# Taint a node (repels Pods without a matching toleration)
kubectl taint nodes node1 dedicated=gpu:NoSchedule

# A Pod tolerating the taint
apiVersion: v1
kind: Pod
metadata:
  name: tolerant-pod
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  containers:
  - name: app
    image: myapp
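The matching rule the scheduler applies can be sketched in Python. This is a simplification of the real semantics: it models the `Equal` and `Exists` operators and exact-effect matching only.

```python
def tolerates(taint, toleration):
    """Return True if a toleration matches a taint (simplified model)."""
    # An empty effect in the toleration matches any effect
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # Exists with no key tolerates every taint
        return toleration.get("key") in (None, taint["key"])
    # Equal operator: key and value must both match
    return (toleration.get("key") == taint["key"]
            and toleration.get("value") == taint["value"])

taint = {"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}
print(tolerates(taint, {"key": "dedicated", "operator": "Equal",
                        "value": "gpu", "effect": "NoSchedule"}))  # True
print(tolerates(taint, {"key": "dedicated", "operator": "Equal",
                        "value": "ssd", "effect": "NoSchedule"}))  # False
```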


Production Best Practices

1. Resource Management

# Set resource requests and limits
spec:
  containers:
  - name: app
    resources:
      requests:
        memory: "256Mi"  # guaranteed at scheduling time
        cpu: "500m"      # half a core
      limits:
        memory: "512Mi"  # hard memory cap (exceeding it means OOM-kill)
        cpu: "1000m"     # hard CPU cap (throttling)

# QoS classes:
# - Guaranteed: requests == limits (highest priority)
# - Burstable: requests < limits (middle)
# - BestEffort: no requests/limits (lowest; evicted first)
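The QoS assignment above can be expressed as a small function (simplified to a single container with CPU and memory only):

```python
def qos_class(requests, limits):
    """Classify a Pod's QoS the way Kubernetes does (simplified:
    one container, cpu/memory resources only)."""
    if not requests and not limits:
        return "BestEffort"
    resources = {"cpu", "memory"}
    # Guaranteed: both resources have limits, and requests equal limits
    # (an omitted request defaults to the limit)
    if (set(limits) == resources
            and all(requests.get(r, limits[r]) == limits[r] for r in resources)):
        return "Guaranteed"
    return "Burstable"

print(qos_class({}, {}))  # BestEffort
print(qos_class({"cpu": "500m", "memory": "256Mi"},
                {"cpu": "500m", "memory": "256Mi"}))  # Guaranteed
print(qos_class({"cpu": "500m", "memory": "256Mi"},
                {"cpu": "1000m", "memory": "512Mi"}))  # Burstable
```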

2. Health Checks

spec:
  containers:
  - name: app
    livenessProbe:
      # Liveness probe: restart the container on failure
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
      
    readinessProbe:
      # Readiness probe: remove the Pod from the Service on failure
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      
    startupProbe:
      # Startup probe: protection for slow-starting applications
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30
      periodSeconds: 10

3. Security

# Pod security context
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: myapp
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
        add:
        - NET_BIND_SERVICE
    volumeMounts:
    - name: tmp
      mountPath: /tmp
  volumes:
  - name: tmp
    emptyDir: {}

---
# NetworkPolicy - network isolation
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-policy
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: database
    ports:
    - protocol: TCP
      port: 5432

4. Observability

# Log collection - the sidecar pattern
apiVersion: v1
kind: Pod
metadata:
  name: app-with-logging
spec:
  containers:
  - name: app
    image: myapp
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
      
  - name: log-collector
    image: fluent-bit
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
      readOnly: true
      
  volumes:
  - name: logs
    emptyDir: {}

---
# Prometheus monitoring
apiVersion: v1
kind: Service
metadata:
  name: app-metrics
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    prometheus.io/path: "/metrics"
spec:
  selector:
    app: myapp
  ports:
  - port: 9090

Advanced Features

StatefulSet - Stateful Applications

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi

# StatefulSet guarantees:
# 1. Stable network identities (mysql-0, mysql-1, mysql-2)
# 2. Stable persistent storage
# 3. Ordered deployment and scaling
# 4. Ordered deletion and termination
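Those stable identities translate into predictable DNS names via the headless Service. A sketch of the naming scheme, assuming the `mysql` StatefulSet above runs in the `default` namespace:

```python
def statefulset_dns(name, service, namespace, replicas):
    """Generate the stable per-Pod DNS names a StatefulSet provides
    through its headless Service."""
    return [f"{name}-{i}.{service}.{namespace}.svc.cluster.local"
            for i in range(replicas)]

for host in statefulset_dns("mysql", "mysql", "default", 3):
    print(host)
# mysql-0.mysql.default.svc.cluster.local
# mysql-1.mysql.default.svc.cluster.local
# mysql-2.mysql.default.svc.cluster.local
```

A replica can therefore address its peers by name, which is what makes leader election and replication topologies workable.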

DaemonSet - One Pod per Node

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true  # use the host's network
      hostPID: true      # see the host's processes
      containers:
      - name: node-exporter
        image: prom/node-exporter
        ports:
        - containerPort: 9100

# Use cases:
# - Node monitoring (node-exporter)
# - Log collection (fluentd)
# - Network plugins (CNI)
# - Storage plugins (CSI)

Job and CronJob - Batch Workloads

# Job - a one-off task
apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
spec:
  completions: 1
  parallelism: 3  # run 3 Pods in parallel
  backoffLimit: 4  # retries before the Job is marked failed
  template:
    spec:
      containers:
      - name: migrator
        image: data-migrator
      restartPolicy: Never

---
# CronJob - a scheduled task
apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup
spec:
  schedule: "0 2 * * *"  # every day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: backup-tool
            command:
            - /bin/sh
            - -c
            - backup.sh
          restartPolicy: OnFailure

A Real-World Scenario: Deploying Microservices

Let's deploy a complete microservice application.

# Namespace isolation
apiVersion: v1
kind: Namespace
metadata:
  name: ecommerce

---
# The PostgreSQL database
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: ecommerce
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: password
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi

---
# The API service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: ecommerce
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: ecommerce-api:v1.2.3
        env:
        - name: DB_HOST
          value: postgres.ecommerce.svc.cluster.local
        - name: REDIS_HOST
          value: redis.ecommerce.svc.cluster.local
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "256Mi"
            cpu: "500m"
          limits:
            memory: "512Mi"
            cpu: "1000m"

---
# The Redis cache
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: ecommerce
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        ports:
        - containerPort: 6379

---
# The frontend
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: ecommerce
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: nginx
        image: ecommerce-frontend:v1.2.3
        ports:
        - containerPort: 80

---
# Ingress routing
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ecommerce-ingress
  namespace: ecommerce
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - shop.example.com
    secretName: tls-secret
  rules:
  - host: shop.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api
            port:
              number: 8080
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend
            port:
              number: 80

---
# HorizontalPodAutoscaler - autoscaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: ecommerce
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
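Under the hood, the HPA's core algorithm is a simple ratio, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the configured bounds. A sketch:

```python
import math

def hpa_desired(current_replicas, current_util, target_util,
                min_replicas, max_replicas):
    """HPA core formula: scale replicas proportionally to the metric
    ratio, then clamp to the configured min/max."""
    desired = math.ceil(current_replicas * current_util / target_util)
    return max(min_replicas, min(max_replicas, desired))

# 3 replicas at 90% CPU against a 70% target -> scale out
print(hpa_desired(3, 90, 70, 3, 10))  # 4
# 6 replicas at 20% CPU -> scale in, clamped at minReplicas
print(hpa_desired(6, 20, 70, 3, 10))  # 3
```

The real controller adds tolerances, stabilization windows, and multi-metric handling (it takes the largest proposal across metrics), but this ratio is the heart of it.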


The Kubernetes Philosophy

Kubernetes' design embodies several deep software-engineering principles:

1. Declarative vs. Imperative

# Imperative (traditional operations)
def deploy_v1():
    """Tell the system HOW to do it."""
    ssh_to_server("server1")
    stop_service("myapp")
    upload_binary("myapp-v2")
    start_service("myapp")

    # Problems:
    # - What if a step fails?
    # - How do you guarantee consistency?
    # - How do you handle concurrent changes?

# Declarative (Kubernetes)
def deploy_v2():
    """Tell the system WHAT you want."""
    desired_state = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "spec": {
            "replicas": 3,
            "template": {
                "spec": {
                    "containers": [{
                        "image": "myapp:v2"
                    }]
                }
            }
        }
    }

    kubectl_apply(desired_state)

    # Kubernetes automatically:
    # - computes the difference between current and desired state
    # - performs whatever actions reach the desired state
    # - keeps watching and corrects any drift
    # - handles failures and retries

2. The Universality of the Controller Pattern

The controller pattern is not Kubernetes-specific; it is a general method for managing complex systems.

// The essence of the controller pattern: a feedback loop
interface ControlLoop<T> {
  // 1. Observe the current state
  observe(): T;

  // 2. Compare it with the desired state
  diff(desired: T, actual: T): Difference;

  // 3. Take corrective action
  act(diff: Difference): void;
}

// Other instances of the pattern:
// - A thermostat: maintains room temperature
// - Cruise control: maintains vehicle speed
// - Kubernetes: maintains cluster state
// - Economic policy: maintains the inflation rate

// Shared traits:
// - a negative feedback loop
// - automatic correction of drift
// - resilience

3. Designed for Extensibility

Kubernetes lets users extend its API with CRDs (Custom Resource Definitions).

# A Custom Resource Definition
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.mycompany.com
spec:
  group: mycompany.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              version:
                type: string
              storage:
                type: string
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database

---
# Using the custom resource
apiVersion: mycompany.com/v1
kind: Database
metadata:
  name: my-postgres
spec:
  version: "15"
  storage: "100Gi"

4. Eventual Consistency

Kubernetes embraces an eventual-consistency model, the pragmatic choice for a distributed system.

// Why eventual consistency?
type ConsistencyModel string

const (
    StrongConsistency   ConsistencyModel = "strong"
    EventualConsistency ConsistencyModel = "eventual"
)

// Why Kubernetes chose eventual consistency:
reasons := []string{
    "Availability first: the system keeps working even when components fail",
    "Performance: no waiting for every node to acknowledge every operation",
    "Scalability: the control plane and the nodes scale independently",
    "Partition tolerance: per the CAP theorem, P is unavoidable; pick A over C",
}

// What this looks like in practice:
// - Pod status updates may lag by a few seconds
// - Service Endpoints updates are not instantaneous
// - But the state eventually converges to the correct one

Conclusion: The Power of Abstraction

Kubernetes succeeds because it offers the right level of abstraction. Developers no longer need to worry about:

  • Which machine: the scheduler chooses one
  • How to scale: the HPA does it automatically
  • How to recover: controllers repair the system
  • How to discover: Services provide stable endpoints
  • How to update: Deployments manage rolling updates

These abstractions unlock developer productivity, freeing teams to focus on business logic instead of infrastructure.

Kubernetes is not a silver bullet. For small applications it may be overkill. But for cloud-native applications that need high availability, elasticity, and observability, Kubernetes is a battle-tested platform.

The essence of cloud native is not a choice of tech stack but a shift in design mindset: from machine-oriented to application-oriented, from imperative to declarative, from manual to automatic, from static to dynamic. Kubernetes is that mindset put into practice.


"Kubernetes delivers on the promise of cloud computing: it turns infrastructure into software. It proves that with the right abstractions and control loops, we can tame chaotic distributed systems into predictable, manageable platforms. That is a triumph not just of technology, but of design philosophy."