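As a CNCF project, Weave Flagger provides capabilities for continuous integration and continuous delivery. Flagger groups progressive delivery into three categories:
Canary release (Canary): progressively shifts traffic to the canary version (progressive traffic shifting)
A/B testing (A/B Testing): routes user requests to the A or B version based on request information (HTTP headers and cookies traffic routing)
Blue/Green release (Blue/Green): switches and mirrors traffic (traffic switching and mirroring)
This article walks through progressive canary release with Flagger on ASM.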
1 Deploy Flagger
alias k="kubectl --kubeconfig $USER_CONFIG"
alias h="helm --kubeconfig $USER_CONFIG"

cp $MESH_CONFIG kubeconfig
k -n istio-system create secret generic istio-kubeconfig --from-file kubeconfig
k -n istio-system label secret istio-kubeconfig istio/multiCluster=true

h repo add flagger https://flagger.app
h repo update
k apply -f $FLAAGER_SRC/artifacts/flagger/crd.yaml
h upgrade -i flagger flagger/flagger --namespace=istio-system \
    --set crd.create=false \
    --set meshProvider=istio \
    --set metricsServer=http://prometheus:9090 \
    --set istio.kubeconfig.secretName=istio-kubeconfig \
    --set istio.kubeconfig.key=kubeconfig
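The commands above depend on three shell variables: USER_CONFIG, MESH_CONFIG, and FLAAGER_SRC. A minimal sketch of a pre-flight check follows; `require_vars` is a hypothetical helper introduced here for illustration and is not part of kubectl, helm, or Flagger.

```shell
# Fail early if any variable used by the deployment commands is unset or empty.
# require_vars is a hypothetical helper, not part of kubectl, helm, or Flagger.
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirect variable lookup that also works in POSIX sh.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call, to be placed before the alias definitions:
# require_vars USER_CONFIG MESH_CONFIG FLAAGER_SRC || exit 1
```

Running the check first avoids a half-finished install when, for example, MESH_CONFIG is empty and the `cp` step fails midway.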
2 Deploy the Gateway
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: public-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
kubectl --kubeconfig "$MESH_CONFIG" apply -f resources_canary/public-gateway.yaml
3 Deploy flagger-loadtester
kubectl --kubeconfig "$USER_CONFIG" apply -k "https://github.com/fluxcd/flagger//kustomize/tester?ref=main"
4 Deploy PodInfo and its HPA
kubectl --kubeconfig "$USER_CONFIG" apply -k "https://github.com/fluxcd/flagger//kustomize/podinfo?ref=main"
1 Deploy the Canary
Canary is the core CRD for canary releases with Flagger; see How it works for details. We first deploy the Canary configuration file podinfo-canary.yaml shown below and run through a complete progressive canary flow. We then build on it by introducing application-level metrics to achieve application-aware progressive canary releases.
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rollback (default 600s)
  progressDeadlineSeconds: 60
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    # service port number
    port: 9898
    # container port number or name (optional)
    targetPort: 9898
    # Istio gateways (optional)
    gateways:
    - public-gateway.istio-system.svc.cluster.local
    # Istio virtual service host names (optional)
    hosts:
    - '*'
    # Istio traffic policy (optional)
    trafficPolicy:
      tls:
        # use ISTIO_MUTUAL when mTLS is enabled
        mode: DISABLE
    # Istio retry policy (optional)
    retries:
      attempts: 3
      perTryTimeout: 1s
      retryOn: "gateway-error,connect-failure,refused-stream"
  analysis:
    # schedule interval (default 60s)
    interval: 1m
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 10
    metrics:
    - name: request-success-rate
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      # maximum req duration P99
      # milliseconds
      thresholdRange:
        max: 500
      interval: 30s
    # testing (optional)
    webhooks:
      - name: acceptance-test
        type: pre-rollout
        url: http://flagger-loadtester.test/
        timeout: 30s
        metadata:
          type: bash
          cmd: "curl -sd 'test' http://podinfo-canary:9898/token | grep token"
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/"
kubectl --kubeconfig "$USER_CONFIG" apply -f resources_canary/podinfo-canary.yaml
2 Upgrade podinfo
kubectl --kubeconfig "$USER_CONFIG" -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:3.1.1
3 Progressive canary release
while true; do kubectl --kubeconfig "$USER_CONFIG" -n test describe canary/podinfo; sleep 10s;done
Events:
  Type     Reason  Age                From     Message
  ----     ------  ----               ----     -------
  Warning  Synced  39m                flagger  podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less then desired generation
  Normal   Synced  38m (x2 over 39m)  flagger  all the metrics providers are available!
  Normal   Synced  38m                flagger  Initialization done! podinfo.test
  Normal   Synced  37m                flagger  New revision detected! Scaling up podinfo.test
  Normal   Synced  36m                flagger  Starting canary analysis for podinfo.test
  Normal   Synced  36m                flagger  Pre-rollout check acceptance-test passed
  Normal   Synced  36m                flagger  Advance podinfo.test canary weight 10
  Normal   Synced  35m                flagger  Advance podinfo.test canary weight 20
  Normal   Synced  34m                flagger  Advance podinfo.test canary weight 30
  Normal   Synced  33m                flagger  Advance podinfo.test canary weight 40
  Normal   Synced  29m (x4 over 32m)  flagger  (combined from similar events): Promotion completed! Scaling down podinfo.test
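The weight steps in these events follow from the analysis settings in podinfo-canary.yaml (interval 1m, threshold 5, maxWeight 50, stepWeight 10). The sketch below derives the weight schedule and the two timing estimates given in Flagger's documentation: minimum promotion time as interval * (maxWeight / stepWeight), and rollback time as interval * threshold. The function name `weight_schedule` is hypothetical, and the figures are estimates that ignore webhook and scaling delays.

```shell
# Values copied from the analysis section of podinfo-canary.yaml.
interval_minutes=1   # analysis.interval: 1m
threshold=5          # analysis.threshold
max_weight=50        # analysis.maxWeight
step_weight=10       # analysis.stepWeight

# Print the canary weight reached after each successful analysis step.
# weight_schedule is a hypothetical helper for illustration only.
weight_schedule() {
    step=$1
    max=$2
    weight=$step
    out=""
    while [ "$weight" -le "$max" ]; do
        out="$out$weight "
        weight=$((weight + step))
    done
    echo "${out% }"
}

weight_schedule "$step_weight" "$max_weight"
echo "minimum promotion time: $((interval_minutes * max_weight / step_weight))m"
echo "rollback time on failure: $((interval_minutes * threshold))m"
```

With these settings the schedule is 10 20 30 40 50, and both estimates come out to 5 minutes. That is consistent with the one-minute spacing between the "Advance" events above.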
  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: podinfo

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 2
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        # scale up if usage is above
        # 99% of the requested CPU (100m)
        averageUtilization: 99
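The HPA above targets 99% average CPU utilization with between 2 and 4 replicas. As a rough sketch of how the replica count responds, the function below applies the basic Kubernetes HPA rule, desired = ceil(current * currentUtilization / targetUtilization), clamped to the configured bounds. The name `desired_replicas` is hypothetical, and the sketch deliberately ignores the controller's tolerance band and stabilization window.

```shell
# desired_replicas CURRENT UTILIZATION_PERCENT TARGET_PERCENT MIN MAX
# Hypothetical helper: basic HPA scaling rule, without tolerance or
# stabilization behaviour.
desired_replicas() {
    current=$1
    utilization=$2
    target=$3
    min=$4
    max=$5
    # Integer ceiling of current * utilization / target.
    desired=$(( (current * utilization + target - 1) / target ))
    if [ "$desired" -lt "$min" ]; then
        desired=$min
    fi
    if [ "$desired" -gt "$max" ]; then
        desired=$max
    fi
    echo "$desired"
}

# Two replicas at 150% of requested CPU, with target 99% and bounds 2..4.
desired_replicas 2 150 99 2 4
```

With 2 replicas at 150% utilization the rule yields ceil(300 / 99) = 4, which is also the configured maximum.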
1 An HPA that is aware of application QPS
kubectl --kubeconfig "$USER_CONFIG" apply -f resources_hpa/requests_total_hpa.yaml
  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: podinfo-total
2 Upgrade podinfo
kubectl --kubeconfig "$USER_CONFIG" -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:3.1.1
3 Verify the progressive canary release and the HPA
while true; do k -n test describe canary/podinfo; sleep 10s;done
During the progressive canary release, once the message "Advance podinfo.test canary weight 10" appears (see the figure below), we use the following commands to send requests through the ingress gateway and increase QPS:
INGRESS_GATEWAY=$(kubectl --kubeconfig $USER_CONFIG -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
hey -z 20m -c 2 -q 10 http://$INGRESS_GATEWAY
watch kubectl --kubeconfig $USER_CONFIG get canaries --all-namespaces
watch kubectl --kubeconfig $USER_CONFIG -n test get hpa/podinfo-total
  analysis:
    metrics:
    - name: request-success-rate
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      # maximum req duration P99
      # milliseconds
      thresholdRange:
        max: 500
      interval: 30s
    # testing (optional)
1 Flagger built-in metrics
So far, the metrics configured in the Canary have been Flagger's two built-in metrics: request success rate (request-success-rate) and request duration (request-duration). The figure below shows how Flagger defines these built-in metrics for each platform. For Istio, Flagger uses the telemetry data produced by Mixerless Telemetry, which was introduced in the first article of this series.
2 Custom metrics
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: not-found-percentage
  namespace: istio-system
spec:
  provider:
    type: prometheus
    address: http://prometheus.istio-system:9090
  query: |
    100 - sum(
        rate(
            istio_requests_total{
              reporter="destination",
              destination_workload_namespace="{{ namespace }}",
              destination_workload="{{ target }}",
              response_code!="404"
            }[{{ interval }}]
        )
    )
    /
    sum(
        rate(
            istio_requests_total{
              reporter="destination",
              destination_workload_namespace="{{ namespace }}",
              destination_workload="{{ target }}"
            }[{{ interval }}]
        )
    ) * 100
k apply -f resources_canary2/metrics-404.yaml
  analysis:
    metrics:
    - name: "404s percentage"
      templateRef:
        name: not-found-percentage
        namespace: istio-system
      thresholdRange:
        max: 5
      interval: 1m
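To make the custom metric concrete, the sketch below reproduces the arithmetic of the not-found-percentage query, 100 - (non-404 rate / total rate) * 100, and compares the result with the max: 5 threshold. The function names and the sample rates are hypothetical. In a real rollout the rates come from Prometheus, and the sketch assumes a total rate greater than zero.

```shell
# not_found_percentage NON_404_RATE TOTAL_RATE
# Hypothetical helper mirroring the arithmetic of the MetricTemplate query.
# Assumes TOTAL_RATE is greater than zero.
not_found_percentage() {
    awk -v ok="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 - ok / total * 100 }'
}

# within_max VALUE MAX
# Succeeds when VALUE is less than or equal to MAX (thresholdRange.max).
within_max() {
    awk -v v="$1" -v max="$2" 'BEGIN { exit !(v <= max) }'
}

# Illustrative rates: 97 of 100 requests per second are not 404s.
pct=$(not_found_percentage 97 100)
if within_max "$pct" 5; then
    echo "404 percentage $pct is within the threshold"
else
    echo "404 percentage $pct exceeds the threshold"
fi
```

A check that exceeds the threshold counts as one failed metric check, and analysis.threshold in the Canary sets how many failed checks are tolerated before rollback.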
3 Final verification
#!/usr/bin/env sh
SCRIPT_PATH="$(
    cd "$(dirname "$0")" >/dev/null 2>&1
    pwd -P
)/"
cd "$SCRIPT_PATH" || exit

source config
alias k="kubectl --kubeconfig $USER_CONFIG"
alias m="kubectl --kubeconfig $MESH_CONFIG"
alias h="helm --kubeconfig $USER_CONFIG"

echo "#### I Bootstrap ####"
echo "1 Create a test namespace with Istio sidecar injection enabled:"
k delete ns test
m delete ns test
k create ns test
m create ns test
m label namespace test istio-injection=enabled

echo "2 Create a deployment and a horizontal pod autoscaler:"
k apply -f $FLAAGER_SRC/kustomize/podinfo/deployment.yaml -n test
k apply -f resources_hpa/requests_total_hpa.yaml
k get hpa -n test

echo "3 Deploy the load testing service to generate traffic during the canary analysis:"
k apply -k "https://github.com/fluxcd/flagger//kustomize/tester?ref=main"
k get pod,svc -n test
echo "......"
sleep 40s

echo "4 Create a canary custom resource:"
k apply -f resources_canary2/metrics-404.yaml
k apply -f resources_canary2/podinfo-canary.yaml
k get pod,svc -n test
echo "......"
sleep 120s

echo "#### III Automated canary promotion ####"
echo "1 Trigger a canary deployment by updating the container image:"
k -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:3.1.1

echo "2 Flagger detects that the deployment revision changed and starts a new rollout:"
while true; do k -n test describe canary/podinfo; sleep 10s;done
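The script above waits with fixed sleeps (40s and 120s), which may be too short on a slow cluster and longer than necessary on a fast one. One possible alternative, sketched here as an assumption rather than as part of the original script, is a small polling helper. `wait_until` is a hypothetical function that retries a command until it succeeds or a timeout expires.

```shell
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-run COMMAND every INTERVAL_SECONDS until it
# succeeds (returns 0) or roughly TIMEOUT_SECONDS have been spent sleeping.
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    while ! "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# Possible use in place of "sleep 120s" (aliases are not expanded inside
# "$@", so kubectl is spelled out):
# wait_until 120 10 kubectl --kubeconfig "$USER_CONFIG" -n test \
#     get deployment podinfo-primary >/dev/null 2>&1
```

The commented example polls for the podinfo-primary deployment that appears in the Flagger events. Whether its existence is a sufficient readiness signal for a given environment is an assumption to verify.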
sh progressive_delivery/advanced_canary.sh
#### I Bootstrap ####
1 Create a test namespace with Istio sidecar injection enabled:
namespace "test" deleted
namespace "test" deleted
namespace/test created
namespace/test created
namespace/test labeled
2 Create a deployment and a horizontal pod autoscaler:
deployment.apps/podinfo created
horizontalpodautoscaler.autoscaling/podinfo-total created
NAME            REFERENCE            TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
podinfo-total   Deployment/podinfo   /10 (avg)   1         5         0          0s
3 Deploy the load testing service to generate traffic during the canary analysis:
service/flagger-loadtester created
deployment.apps/flagger-loadtester created
NAME                                      READY   STATUS     RESTARTS   AGE
pod/flagger-loadtester-76798b5f4c-ftlbn   0/2     Init:0/1   0          1s
pod/podinfo-689f645b78-65n9d              1/1     Running    0          28s
NAME                         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/flagger-loadtester   ClusterIP                              80/TCP    1s
......
4 Create a canary custom resource:
metrictemplate.flagger.app/not-found-percentage created
canary.flagger.app/podinfo created
NAME                                      READY   STATUS    RESTARTS   AGE
pod/flagger-loadtester-76798b5f4c-ftlbn   2/2     Running   0          41s
pod/podinfo-689f645b78-65n9d              1/1     Running   0          68s
NAME                         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/flagger-loadtester   ClusterIP                              80/TCP    41s
......
#### III Automated canary promotion ####
1 Trigger a canary deployment by updating the container image:
deployment.apps/podinfo image updated
2 Flagger detects that the deployment revision changed and starts a new rollout:
Events:
  Type     Reason  Age                  From     Message
  ----     ------  ----                 ----     -------
  Warning  Synced  10m                  flagger  podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less then desired generation
  Normal   Synced  9m23s (x2 over 10m)  flagger  all the metrics providers are available!
  Normal   Synced  9m23s                flagger  Initialization done! podinfo.test
  Normal   Synced  8m23s                flagger  New revision detected! Scaling up podinfo.test
  Normal   Synced  7m23s                flagger  Starting canary analysis for podinfo.test
  Normal   Synced  7m23s                flagger  Pre-rollout check acceptance-test passed
  Normal   Synced  7m23s                flagger  Advance podinfo.test canary weight 10
  Normal   Synced  6m23s                flagger  Advance podinfo.test canary weight 20
  Normal   Synced  5m23s                flagger  Advance podinfo.test canary weight 30
  Normal   Synced  4m23s                flagger  Advance podinfo.test canary weight 40
  Normal   Synced  23s (x4 over 3m23s)  flagger  (combined from similar events): Promo
