[EKS] Autoscaling

Infra & DevOps/k8s(EKS)

[EKS] Autoscaling

용감한 개복치 2024. 4. 6. 21:10

Amazon EKS Workshop Study 2기

5주차

Autoscaling

이번 시간에 배울 것은 eks의 오토스케일링. EKS 환경에서 여러가지 방식으로 오토스케일을 걸어보는 실습

ecs 오토스케일링에 대해서는 다뤄봤는데, 쿠버네티스의 오토스케일링은 크게 다를 것이 있을까?

k8s 의 오토스케일링

- 쿠버네티스 클러스터 오토스케일링은 쿠버네티스에서 기본적으로 제공하는 기능

- Cluster AutoScaler는 Pod의 리소스 요청에 따라 클러스터의 노드를 추가하거나 제거

- 클러스터 내의 노드(VM 또는 물리 서버)를 자동으로 확장하거나 축소하여 애플리케이션의 수요에 맞춰 자원을 동적으로 조정

- 주로 Horizontal Pod Autoscaler(HPA)와 함께 사용

출철 : https://swalloow.github.io/eks-autoscale/

EKS 의 오토스케일링

- AWS에서 제공하는 관리형 쿠버네티스 서비스 -> AWS 인프라를 사용

- AWS의 Auto Scaling Group을 사용

- ASG는 주기적으로 현재 상태를 확인하고 Desired State로 변화하는 방식으로 동작

- ASG의 기능인 Auto Scaling Policy 적용 가능

- Policy로 메모리 지표를 사용하고 싶다면 각 노드에 CloudWatch Agent를 배포해야함

실습 환경 인프라 구축

클라우드 포메이션으로 기본 환경 구성

https://ap-northeast-2.console.aws.amazon.com/cloudformation/home?region=ap-northeast-2#/stacks/create?stackName=myeks&templateURL=https:%2F%2Fs3.ap-northeast-2.amazonaws.com%2Fcloudformation.cloudneta.net%2FK8S%2Feks-oneclick4.yaml

결과

배스쳔 1, 노드 그룹 3, 서브넷 3, vpc 1,

실습 환경 기본 설정

배스쳔 SSH 접속 후, default 네임 스페이스 지정

kubectl ns default

external dns 지정

MyDomain=peachengineer.click
echo "export MyDomain=peachengineer.click" >> /etc/profile

MyDnzHostedZoneId=$(aws route53 list-hosted-zones-by-name --dns-name "${MyDomain}." --query "HostedZones[0].Id" --output text)
echo $MyDomain, $MyDnzHostedZoneId
curl -s -O https://raw.githubusercontent.com/gasida/PKOS/main/aews/externaldns.yaml
MyDomain=$MyDomain MyDnzHostedZoneId=$MyDnzHostedZoneId envsubst < externaldns.yaml | kubectl apply -f -

kube-ops-view 설치

helm repo add geek-cookbook https://geek-cookbook.github.io/charts/
helm install kube-ops-view geek-cookbook/kube-ops-view --version 1.2.2 --set env.TZ="Asia/Seoul" --namespace kube-system
kubectl patch svc -n kube-system kube-ops-view -p '{"spec":{"type":"LoadBalancer"}}'
kubectl annotate service kube-ops-view -n kube-system "external-dns.alpha.kubernetes.io/hostname=kubeopsview.$MyDomain"
echo -e "Kube Ops View URL = http://kubeopsview.$MyDomain:8080/#scale=1.5"

lb controller 설치

helm repo add eks https://aws.github.io/eks-charts
helm repo update
helm install aws-load-balancer-controller eks/aws-load-balancer-controller -n kube-system --set clusterName=$CLUSTER_NAME \
  --set serviceAccount.create=false --set serviceAccount.name=aws-load-balancer-controller

gp3 스토리지 클래스 생성

kubectl get sc
kubectl apply -f https://raw.githubusercontent.com/gasida/PKOS/main/aews/gp3-sc.yaml
kubectl get sc

노드 보안그룹 확인

NGSGID=$(aws ec2 describe-security-groups --filters Name=group-name,Values=*ng1* --query "SecurityGroups[*].[GroupId]" --output text)
aws ec2 authorize-security-group-ingress --group-id $NGSGID --protocol '-1' --cidr 192.168.1.100/32

프로메테우스 & 그라파나 설치

# 사용 리전의 인증서 ARN 확인
# arn:aws:acm:ap-northeast-2:349873998748:certificate/a0989470-d470-4a8a-be9d-3e3b6fad8be2
# CERT_ARN=arn:aws:acm:ap-northeast-2:349873998748:certificate/a0989470-d470-4a8a-be9d-3e3b6fad8be2
CERT_ARN=`aws acm list-certificates --query 'CertificateSummaryList[].CertificateArn[]' --output text`
echo $CERT_ARN

# repo 추가
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

# 파라미터 파일 생성 : PV/PVC(AWS EBS) 삭제에 불편하니, 4주차 실습과 다르게 PV/PVC 미사용
cat <<EOT > monitor-values.yaml
prometheus:
  prometheusSpec:
    podMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
    retention: 5d
    retentionSize: "10GiB"

  verticalPodAutoscaler:
    enabled: true

  ingress:
    enabled: true
    ingressClassName: alb
    hosts: 
      - prometheus.$MyDomain
    paths: 
      - /*
    annotations:
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/target-type: ip
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
      alb.ingress.kubernetes.io/certificate-arn: $CERT_ARN
      alb.ingress.kubernetes.io/success-codes: 200-399
      alb.ingress.kubernetes.io/load-balancer-name: myeks-ingress-alb
      alb.ingress.kubernetes.io/group.name: study
      alb.ingress.kubernetes.io/ssl-redirect: '443'

grafana:
  defaultDashboardsTimezone: Asia/Seoul
  adminPassword: prom-operator
  defaultDashboardsEnabled: false

  ingress:
    enabled: true
    ingressClassName: alb
    hosts: 
      - grafana.$MyDomain
    paths: 
      - /*
    annotations:
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/target-type: ip
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
      alb.ingress.kubernetes.io/certificate-arn: $CERT_ARN
      alb.ingress.kubernetes.io/success-codes: 200-399
      alb.ingress.kubernetes.io/load-balancer-name: myeks-ingress-alb
      alb.ingress.kubernetes.io/group.name: study
      alb.ingress.kubernetes.io/ssl-redirect: '443'

kube-state-metrics:
  rbac:
    extraRules:
      - apiGroups: ["autoscaling.k8s.io"]
        resources: ["verticalpodautoscalers"]
        verbs: ["list", "watch"]
  prometheus:
    monitor:
      enabled: true
  customResourceState:
    enabled: true
    config:
      kind: CustomResourceStateMetrics
      spec:
        resources:
          - groupVersionKind:
              group: autoscaling.k8s.io
              kind: "VerticalPodAutoscaler"
              version: "v1"
            labelsFromPath:
              verticalpodautoscaler: [metadata, name]
              namespace: [metadata, namespace]
              target_api_version: [apiVersion]
              target_kind: [spec, targetRef, kind]
              target_name: [spec, targetRef, name]
            metrics:
              - name: "vpa_containerrecommendations_target"
                help: "VPA container recommendations for memory."
                each:
                  type: Gauge
                  gauge:
                    path: [status, recommendation, containerRecommendations]
                    valueFrom: [target, memory]
                    labelsFromPath:
                      container: [containerName]
                commonLabels:
                  resource: "memory"
                  unit: "byte"
              - name: "vpa_containerrecommendations_target"
                help: "VPA container recommendations for cpu."
                each:
                  type: Gauge
                  gauge:
                    path: [status, recommendation, containerRecommendations]
                    valueFrom: [target, cpu]
                    labelsFromPath:
                      container: [containerName]
                commonLabels:
                  resource: "cpu"
                  unit: "core"
  selfMonitor:
    enabled: true

alertmanager:
  enabled: false
EOT
cat monitor-values.yaml | yh

# 배포
kubectl create ns monitoring
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack --version 57.2.0 \
--set prometheus.prometheusSpec.scrapeInterval='15s' --set prometheus.prometheusSpec.evaluationInterval='15s' \
-f monitor-values.yaml --namespace monitoring

# Metrics-server 배포
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# 프로메테우스 ingress 도메인으로 웹 접속
echo -e "Prometheus Web URL = https://prometheus.$MyDomain"

# 그라파나 웹 접속 : 기본 계정 - admin / prom-operator
echo -e "Grafana Web URL = https://grafana.$MyDomain"

3주차에서 설정한 방법대로 15757 넘버로 대시보드 구성

eks node viewer 설치 : 노드 할당 가능 용량, 요청된 리퀘스트 리소스 표시( 실제 파드 리소스 사용량 x )

# go 설치
wget https://go.dev/dl/go1.22.1.linux-amd64.tar.gz
tar -C /usr/local -xzf go1.22.1.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin
go version
go version go1.22.1 linux/amd64

# EKS Node Viewer 설치 : 약 2분 이상 소요
go install github.com/awslabs/eks-node-viewer/cmd/eks-node-viewer@latest

# [신규 터미널] EKS Node Viewer 접속
cd ~/go/bin && ./eks-node-viewer
혹은
cd ~/go/bin && ./eks-node-viewer --resources cpu,memory

명령 샘플
# Standard usage
./eks-node-viewer

# Display both CPU and Memory Usage
./eks-node-viewer --resources cpu,memory

# Karenter nodes only
./eks-node-viewer --node-selector "karpenter.sh/provisioner-name"

# Display extra labels, i.e. AZ
./eks-node-viewer --extra-labels topology.kubernetes.io/zone

# Specify a particular AWS profile and region
AWS_PROFILE=myprofile AWS_REGION=us-west-2

기본 옵션
# select only Karpenter managed nodes
node-selector=karpenter.sh/provisioner-name

# display both CPU and memory
resources=cpu,memory

node viewer

HPA

이론

- Horizontal Pod Autoscaler

- 쿠버네티스에서 애플리케이션의 수평적인 스케일링을 자동화하는 기능

- 일반적으로 웹 서버나 API 서버와 같이 여러 개의 인스턴스가 필요한 애플리케이션을 운영할 때, 사용자 트래픽이나 작업 부하에 따라 인스턴스 수를 동적으로 조절하는 것이 유용, HPA는 이러한 작업을 자동화하여 사용자가 직접 인스턴스 수를 변경하지 않고도 쿠버네티스 클러스터에서 애플리케이션의 인스턴스 수를 조정할 수 있게함.

출처 : https://kubernetes.io/ko/docs/tasks/run-application/horizontal-pod-autoscale/

- 기능

지표(Metrics) 기반 오토스케일링: HPA는 지정된 지표(예: CPU 사용률, 메모리 사용률 등)를 기반으로 애플리케이션의 현재 부하 상태를 모니터링
목표 지표 설정: 사용자는 원하는 지표 값을 설정하여 해당 값에 맞춰 애플리케이션 인스턴스의 수를 조절
자동 스케일링: HPA는 모니터링된 지표에 따라 애플리케이션 인스턴스의 수를 자동으로 조정. EX) CPU 사용률이 설정된 임계값을 넘으면 인스턴스 수를 증가시키고, 반대로 너무 낮으면 인스턴스 수를 감소
사용자 정의 가능성: HPA는 사용자가 원하는 대로 구성. 스케일링 동작을 조정 or 다양한 지표를 사용하여 스케일링을 수행

실습

대시보드에 JSON 으로 import(json 은 미리 제공된 파)

php-apache 서버 구동

# Run and expose php-apache server
curl -s -O https://raw.githubusercontent.com/kubernetes/website/main/content/en/examples/application/php-apache.yaml
cat php-apache.yaml | yh
kubectl apply -f php-apache.yaml

# 확인
kubectl exec -it deploy/php-apache -- cat /var/www/html/index.php
...

# 모니터링 : 터미널2개 사용
watch -d 'kubectl get hpa,pod;echo;kubectl top pod;echo;kubectl top node'
kubectl exec -it deploy/php-apache -- top

# 접속
PODIP=$(kubectl get pod -l run=php-apache -o jsonpath={.items[0].status.podIP})
curl -s $PODIP; echo

접속 확인

HPA 생성

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
kubectl describe hpa

확인

kubectl get hpa php-apache -o yaml | kubectl neat | yh

반복 접속 1 (파드1 IP로 접속) >> 증가 확인 후 중지

while true;do curl -s $PODIP; sleep 0.5; done

모니터링 확인

그라파나에서 레플리카가 늘어나는 것 시각화로 확인 가능

KEDA

이론

- Kubernetes Event-Driven Autoscaling

- 이벤트 기반 자동 스케일링을 제공

- 주로 서버리스 및 이벤트 기반 애플리케이션을 위한 스케일링을 위해 개발

- HPA 와의 차이점

스케일링 기준:
- HPA: 주로 리소스 사용량(CPU, 메모리 등)과 같은 지표에 기반
- KEDA: 이벤트 소스에 기반. 예를 들어, Kafka나 Azure Queue Storage와 같은 메시지 큐나 이벤트 스트림에 대한 이벤트를 감지하여 애플리케이션을 스케일링
적용 범위:
- HPA: 일반적으로 서버나 웹 애플리케이션과 같은 Stateful 서비스에 적용됩니다.
- KEDA: 주로 서버리스 및 이벤트 기반 아키텍처(예: 이벤트 소싱, 이벤트 처리)에 적용
스케일링 제어 방식:
- HPA: 설정된 임계값에 따라 스케일링을 수행하며, 주로 수평적인 스케일링
- KEDA: 이벤트를 처리하는 데 필요한 인스턴스 수를 동적으로 조절하여 수직 및 수평적인 스케일링을 수행. 예를 들어, 메시지 큐에 메시지가 쌓일수록 인스턴스 수를 확장 가능

실습

그라파나 대시보드 설정(미리 제공받은 JSON 템플릿)

{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "target": {
          "limit": 100,
          "matchAny": false,
          "tags": [],
          "type": "dashboard"
        },
        "type": "dashboard"
      }
    ]
  },
  "description": "Visualize metrics provided by KEDA",
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": 1653,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "id": 8,
      "panels": [],
      "title": "Metric Server",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "The total number of errors encountered for all scalers.",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 25,
            "gradientMode": "opacity",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": true,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "Errors/sec"
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "http-demo"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "scaledObject"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "keda-system/keda-operator-metrics-apiserver"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 9,
        "w": 8,
        "x": 0,
        "y": 1
      },
      "id": 4,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "expr": "sum by(job) (rate(keda_scaler_errors{}[5m]))",
          "legendFormat": "{{ job }}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Scaler Total Errors",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "The number of errors that have occurred for each scaler.",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 25,
            "gradientMode": "opacity",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": true,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "Errors/sec"
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "http-demo"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "scaler"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "prometheusScaler"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 9,
        "w": 8,
        "x": 8,
        "y": 1
      },
      "id": 3,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "expr": "sum by(scaler) (rate(keda_scaler_errors{exported_namespace=~\"$namespace\", scaledObject=~\"$scaledObject\", scaler=~\"$scaler\"}[5m]))",
          "legendFormat": "{{ scaler }}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Scaler Errors",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "The number of errors that have occurred for each scaled object.",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 25,
            "gradientMode": "opacity",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": true,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "Errors/sec"
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "http-demo"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 9,
        "w": 8,
        "x": 16,
        "y": 1
      },
      "id": 2,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "expr": "sum by(scaledObject) (rate(keda_scaled_object_errors{exported_namespace=~\"$namespace\", scaledObject=~\"$scaledObject\"}[5m]))",
          "legendFormat": "{{ scaledObject }}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Scaled Object Errors",
      "type": "timeseries"
    },
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 10
      },
      "id": 10,
      "panels": [],
      "title": "Scale Target",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "The current value for each scaler’s metric that would be used by the HPA in computing the target average.",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 25,
            "gradientMode": "opacity",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": true,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "none"
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "http-demo"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "blue",
                  "mode": "fixed"
                }
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 9,
        "w": 24,
        "x": 0,
        "y": 11
      },
      "id": 5,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "expr": "sum by(metric) (keda_scaler_metrics_value{exported_namespace=~\"$namespace\", metric=~\"$metric\", scaledObject=\"$scaledObject\"})",
          "legendFormat": "{{ metric }}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Scaler Metric Value",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "shows current replicas against max ones based on time difference",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 21,
            "gradientMode": "opacity",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineStyle": {
              "fill": "solid"
            },
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "auto",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "short"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 24,
        "x": 0,
        "y": 20
      },
      "id": 13,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "kube_horizontalpodautoscaler_status_current_replicas{namespace=\"$namespace\",horizontalpodautoscaler=\"keda-hpa-$scaledObject\"}",
          "format": "time_series",
          "instant": false,
          "interval": "",
          "legendFormat": "current_replicas",
          "range": true,
          "refId": "A"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "kube_horizontalpodautoscaler_spec_max_replicas{namespace=\"$namespace\",horizontalpodautoscaler=\"keda-hpa-$scaledObject\"}",
          "format": "time_series",
          "hide": false,
          "instant": false,
          "legendFormat": "max_replicas",
          "range": true,
          "refId": "B"
        }
      ],
      "title": "Current/max replicas (time based)",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "shows current replicas against max ones based on time difference",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "continuous-GrYlRd"
          },
          "custom": {
            "fillOpacity": 70,
            "lineWidth": 0,
            "spanNulls": false
          },
          "mappings": [
            {
              "options": {
                "0": {
                  "color": "green",
                  "index": 0,
                  "text": "No scaling"
                }
              },
              "type": "value"
            },
            {
              "options": {
                "from": -200,
                "result": {
                  "color": "light-red",
                  "index": 1,
                  "text": "Scaling down"
                },
                "to": 0
              },
              "type": "range"
            },
            {
              "options": {
                "from": 0,
                "result": {
                  "color": "semi-dark-red",
                  "index": 2,
                  "text": "Scaling up"
                },
                "to": 200
              },
              "type": "range"
            }
          ],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "none"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 24,
        "x": 0,
        "y": 28
      },
      "id": 16,
      "options": {
        "alignValue": "left",
        "legend": {
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": false,
          "width": 0
        },
        "mergeValues": true,
        "rowHeight": 1,
        "showValue": "never",
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "delta(kube_horizontalpodautoscaler_status_current_replicas{namespace=\"$namespace\",horizontalpodautoscaler=\"keda-hpa-$scaledObject\"}[1m])",
          "format": "time_series",
          "instant": false,
          "interval": "",
          "legendFormat": ".",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Changes in replicas",
      "type": "state-timeline"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "shows current replicas against max ones",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "min": 0,
          "thresholds": {
            "mode": "percentage",
            "steps": [
              {
                "color": "green"
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "short"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 36
      },
      "id": 15,
      "options": {
        "orientation": "auto",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "/^current_replicas$/",
          "values": false
        },
        "showThresholdLabels": false,
        "showThresholdMarkers": true
      },
      "pluginVersion": "9.5.2",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "kube_horizontalpodautoscaler_status_current_replicas{namespace=\"$namespace\",horizontalpodautoscaler=\"keda-hpa-$scaledObject\"}",
          "instant": true,
          "legendFormat": "current_replicas",
          "range": false,
          "refId": "A"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "kube_horizontalpodautoscaler_spec_max_replicas{namespace=\"$namespace\",horizontalpodautoscaler=\"keda-hpa-$scaledObject\"}",
          "hide": false,
          "instant": true,
          "legendFormat": "max_replicas",
          "range": false,
          "refId": "B"
        }
      ],
      "title": "Current/max replicas",
      "type": "gauge"
    }
  ],
  "refresh": "1m",
  "schemaVersion": 38,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": [
      {
        "current": {
          "selected": false,
          "text": "Prometheus",
          "value": "Prometheus"
        },
        "hide": 0,
        "includeAll": false,
        "multi": false,
        "name": "datasource",
        "options": [],
        "query": "prometheus",
        "queryValue": "",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "type": "datasource"
      },
      {
        "current": {
          "selected": false,
          "text": "bhe-test",
          "value": "bhe-test"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${datasource}"
        },
        "definition": "label_values(keda_scaler_active,exported_namespace)",
        "hide": 0,
        "includeAll": false,
        "multi": false,
        "name": "namespace",
        "options": [],
        "query": {
          "query": "label_values(keda_scaler_active,exported_namespace)",
          "refId": "PrometheusVariableQueryEditor-VariableQuery"
        },
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 1,
        "type": "query"
      },
      {
        "current": {
          "selected": false,
          "text": "All",
          "value": "$__all"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${datasource}"
        },
        "definition": "label_values(keda_scaler_active{exported_namespace=\"$namespace\"},scaledObject)",
        "hide": 0,
        "includeAll": true,
        "multi": true,
        "name": "scaledObject",
        "options": [],
        "query": {
          "query": "label_values(keda_scaler_active{exported_namespace=\"$namespace\"},scaledObject)",
          "refId": "PrometheusVariableQueryEditor-VariableQuery"
        },
        "refresh": 2,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "type": "query"
      },
      {
        "current": {
          "selected": false,
          "text": "cronScaler",
          "value": "cronScaler"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${datasource}"
        },
        "definition": "label_values(keda_scaler_active{exported_namespace=\"$namespace\"},scaler)",
        "hide": 0,
        "includeAll": false,
        "multi": false,
        "name": "scaler",
        "options": [],
        "query": {
          "query": "label_values(keda_scaler_active{exported_namespace=\"$namespace\"},scaler)",
          "refId": "PrometheusVariableQueryEditor-VariableQuery"
        },
        "refresh": 2,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "type": "query"
      },
      {
        "current": {
          "selected": false,
          "text": "s0-cron-Etc-UTC-40xxxx-55xxxx",
          "value": "s0-cron-Etc-UTC-40xxxx-55xxxx"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${datasource}"
        },
        "definition": "label_values(keda_scaler_active{exported_namespace=\"$namespace\"},metric)",
        "hide": 0,
        "includeAll": false,
        "multi": false,
        "name": "metric",
        "options": [],
        "query": {
          "query": "label_values(keda_scaler_active{exported_namespace=\"$namespace\"},metric)",
          "refId": "PrometheusVariableQueryEditor-VariableQuery"
        },
        "refresh": 2,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "type": "query"
      }
    ]
  },
  "time": {
    "from": "now-24h",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "",
  "title": "KEDA",
  "uid": "asdasd8rvmMxdVk",
  "version": 8,
  "weekStart": ""
}

KEDA 설치

cat <<EOT > keda-values.yaml
metricsServer:
  useHostNetwork: true

prometheus:
  metricServer:
    enabled: true
    port: 9022
    portName: metrics
    path: /metrics
    serviceMonitor:
      # Enables ServiceMonitor creation for the Prometheus Operator
      enabled: true
    podMonitor:
      # Enables PodMonitor creation for the Prometheus Operator
      enabled: true
  operator:
    enabled: true
    port: 8080
    serviceMonitor:
      # Enables ServiceMonitor creation for the Prometheus Operator
      enabled: true
    podMonitor:
      # Enables PodMonitor creation for the Prometheus Operator
      enabled: true

  webhooks:
    enabled: true
    port: 8080
    serviceMonitor:
      # Enables ServiceMonitor creation for the Prometheus webhooks
      enabled: true
EOT

kubectl create namespace keda
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --version 2.13.0 --namespace keda -f keda-values.yaml

설치 확인

kubectl get all -n keda
kubectl get validatingwebhookconfigurations keda-admission
kubectl get validatingwebhookconfigurations keda-admission | kubectl neat | yh
kubectl get crd | grep keda

네임스페이스에 디플로이먼트 생성

kubectl apply -f php-apache.yaml -n keda
kubectl get pod -n keda

크론잡 생성과 그라파나 대시보드 추가, 모니터링 확인

cat <<EOT > keda-cron.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: php-apache-cron-scaled
spec:
  minReplicaCount: 0
  maxReplicaCount: 2
  pollingInterval: 30
  cooldownPeriod: 300
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  triggers:
  - type: cron
    metadata:
      timezone: Asia/Seoul
      start: 00,15,30,45 * * * *
      end: 05,20,35,50 * * * *
      desiredReplicas: "1"
EOT
kubectl apply -f keda-cron.yaml -n keda

watch -d 'kubectl get ScaledObject,hpa,pod -n keda'
kubectl get ScaledObject -w

kubectl get ScaledObject,hpa,pod -n keda
kubectl get hpa -o jsonpath={.items[0].spec} -n keda | jq

VPA

이론

- 자료 : https://malwareanalysis.tistory.com/603

EKS 스터디 - 5주차 1편 - VPA

VPA란? VPA(Vertical Pod Autoscaler)는 pod resources.request을 최대한 최적값으로 수정합니다. 수정된 request값이 기존 값보다 위 또는 아래 범위에 속하므로 Vertical라고 표현합니다. pod마다 resource.request를 최

malwareanalysis.tistory.com

- pod resources.request을 최대한 최적값으로 수정, HPA와 같이 사용 불가능, 수정 시 파드 재실행

- 수직적 파드 오토스케일링 == 파드 개수는 그대로, 리소스 크기를 확장/축소 가능

출처 : https://velog.io/@rockwellvinca/EKS-HPAHorizontal-Pod-Autoscaler%EC%99%80-VPAVertical-Pod-Autoscaler

- 메트릭 서버 필요, 메트릭 서버는 파드로 부터 메트릭 수집

- 동작방식

메트릭 수집: VPA는 파드에 할당된 CPU 및 메모리와 같은 리소스 사용량을 주기적으로 모니터링 -> 메트릭 서버를 통해 파드의 리소스 사용량 데이터를 수집
분석 및 예측: 수집된 데이터를 분석하여 애플리케이션이 필요로 하는 최적의 리소스량을 파악 -> 파드 권장사용량 정의 -> VPA 에게 전달
리소스 할당 조정: VPA Updater 가 VPA가 전달받은 파드 권장 사용량 정보를 현재 파드와 비교하여 리소스를 조정
파드 재생성 프로세스: 수직 스케일업이 필요하다면 파드 종료 후 디플로이로 재생성
VPA Admission Controller 가 파드에게 파드 권장 사용량 정보 전달

실습

14588번으로 그라파나 대시보드 import

VPA 설치 및 실행 후 확인

# 코드 다운로드
git clone https://github.com/kubernetes/autoscaler.git
cd ~/autoscaler/vertical-pod-autoscaler/
tree hack

# openssl 버전 확인
openssl version
OpenSSL 1.0.2k-fips  26 Jan 2017

# openssl 1.1.1 이상 버전 확인
yum install openssl11 -y
openssl11 version
OpenSSL 1.1.1g FIPS  21 Apr 2020

# 스크립트파일내에 openssl11 수정
sed -i 's/openssl/openssl11/g' /root/go/bin/autoscaler/vertical-pod-autoscaler/pkg/admission-controller/gencerts.sh

# Deploy the Vertical Pod Autoscaler to your cluster with the following command.
watch -d kubectl get pod -n kube-system
cat hack/vpa-up.sh
./hack/vpa-up.sh
kubectl get crd | grep autoscaling
kubectl get mutatingwebhookconfigurations vpa-webhook-config
kubectl get mutatingwebhookconfigurations vpa-webhook-config -o json | jq

공식 예제 : pod가 실행되면 약 2~3분 뒤에 pod resource.reqeust가 VPA에 의해 수정

# 모니터링
watch -d "kubectl top pod;echo "----------------------";kubectl describe pod | grep Requests: -A2"

# 공식 예제 배포
cd ~/autoscaler/vertical-pod-autoscaler/
cat examples/hamster.yaml | yh
kubectl apply -f examples/hamster.yaml && kubectl get vpa -w

# 파드 리소스 Requestes 확인
kubectl describe pod | grep Requests: -A2

# VPA에 의해 기존 파드 삭제되고 신규 파드가 생성됨
kubectl get events --sort-by=".metadata.creationTimestamp" | grep VPA
2m16s       Normal    EvictedByVPA             pod/hamster-5bccbb88c6-s6jkp         Pod was evicted by VPA Updater to apply resource recommendation.
76s         Normal    EvictedByVPA             pod/hamster-5bccbb88c6-jc6gq         Pod was evicted by VPA Updater to apply resource recommendation.

그라파나에서도 확인

CA

이론

- Cluster Autoscale

- Cluster Autoscaler(CA)는 pending 상태인 파드가 존재할 경우, 워커 노드를 스케일 아웃

- 특정 시간을 간격으로 사용률을 확인하여 스케일 인/아웃을 수행

- AWS에서는 Auto Scaling Group(ASG)을 사용하여 Cluster Autoscaler를 적용

- 단점 : 하나의 자원에 대해 두군데 (AWS ASG vs AWS EKS)에서 각자의 방식으로 관리 ⇒ 관리 정보가 서로 동기화되지 않아 다양한 문제 발생

실습

aws ec2 describe-instances  --filters Name=tag:Name,Values=$CLUSTER_NAME-ng1-Node --query "Reservations[*].Instances[*].Tags[*]" --output yaml | yh

콘솔에서 태그 달린 것들 확인

해당 태그는 CA 관리하에 있다는 것을 의미

ASG 설정

# 현재 autoscaling(ASG) 정보 확인
# aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='클러스터이름']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" --output table
aws autoscaling describe-auto-scaling-groups \
    --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" \
    --output table


# MaxSize 6개로 수정
export ASG_NAME=$(aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].AutoScalingGroupName" --output text)
aws autoscaling update-auto-scaling-group --auto-scaling-group-name ${ASG_NAME} --min-size 3 --desired-capacity 3 --max-size 6

# 확인
aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" --output table


# 배포 : Deploy the Cluster Autoscaler (CA)
curl -s -O https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
sed -i "s/<YOUR CLUSTER NAME>/$CLUSTER_NAME/g" cluster-autoscaler-autodiscover.yaml
kubectl apply -f cluster-autoscaler-autodiscover.yaml

# 확인
kubectl get pod -n kube-system | grep cluster-autoscaler
kubectl describe deployments.apps -n kube-system cluster-autoscaler
kubectl describe deployments.apps -n kube-system cluster-autoscaler | grep node-group-auto-discovery
      --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/myeks

6개로 바뀐 것 확인

배포 후 확인

CA를 활용한 스케일링 : 부하테스트

cat <<EoF> nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-to-scaleout
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        service: nginx
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx-to-scaleout
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 500m
            memory: 512Mi
EoF

kubectl apply -f nginx.yaml
kubectl get deployment/nginx-to-scaleout

레플리카 셋 스케일 아웃 확인

kubectl scale --replicas=15 deployment/nginx-to-scaleout && date

# 확인
kubectl get pods -l app=nginx -o wide --watch
kubectl -n kube-system logs -f deployment/cluster-autoscaler

노드 자동 증가 확인

kubectl get nodes
aws autoscaling describe-auto-scaling-groups \
    --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" \
    --output table
    
./eks-node-viewer --resources cpu,memory
혹은
./eks-node-viewer

CPA

이론

https://malwareanalysis.tistory.com/604#:~:text=11%3A58-,CPA%EB%9E%80%3F,%EB%B6%80%ED%95%98%EB%A5%BC%20%EC%A4%84%EC%9D%BC%20%EC%88%98%20%EC%9E%88%EC%8A%B5%EB%8B%88%EB%8B%A4.

EKS 스터디 - 5주차 2편 - CPA

CPA란? CPA(cluster-proportional-autoscaler)는 노드 개수에 비례(proportional)하여 pod 개수 관리합니다. 예를 들어 노드가 추가될 때마다 coredns pod개수 증가시켜, coredns부하를 줄일 수 있습니다. CPA를 사용하

malwareanalysis.tistory.com

- 노드 수 증가에 비례하여 성능 처리가 필요한 애플리케이션(컨테이너/파드)를 수평으로 자동 확장

- 클러스터의 예약 가능한 노드 및 코어 수를 감시하고 복제본 수의 크기를 조정

- 클러스터 크기에 따라 자동 크기 조정이 필요한 애플리케이션과 클러스터의 노드/포드 수에 따라 확장되는 기타 서비스에 적합

- EX) 노드가 추가될 때마다 coredns pod개수 증가시켜, coredns부하를 줄임

출처: https://www.eksworkshop.com/docs/autoscaling/workloads/cluster-proportional-autoscaler/

- CPA에는 API 서버에 연결하고 클러스터의 노드 및 코어 수를 폴링하는 포드 내부에서 실행되는 Golang API 클라이언트가 있음

- 스케일링 매개변수와 데이터 포인트는 ConfigMap을 통해 자동 스케일러에 제공되며 폴링 간격마다 매개변수 테이블을 새로 고쳐 원하는 최신 스케일링 매개변수로 최신 상태를 유지

- CPA는 Metrics API에 의존하지 않으며 Metrics Server가 필요X

실습

cpa 규칙 설정 helm 차트 릴리즈

helm repo add cluster-proportional-autoscaler https://kubernetes-sigs.github.io/cluster-proportional-autoscaler

# CPA규칙을 설정하고 helm차트를 릴리즈 필요
helm upgrade --install cluster-proportional-autoscaler cluster-proportional-autoscaler/cluster-proportional-autoscaler

nginx 디플로이먼트 배포

# nginx 디플로이먼트 배포
cat <<EOT > cpa-nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        resources:
          limits:
            cpu: "100m"
            memory: "64Mi"
          requests:
            cpu: "100m"
            memory: "64Mi"
        ports:
        - containerPort: 80
EOT
kubectl apply -f cpa-nginx.yaml

CPA 규칙 설정

cat <<EOF > cpa-values.yaml
config:
  ladder:
    nodesToReplicas:
      - [1, 1]
      - [2, 2]
      - [3, 3]
      - [4, 3]
      - [5, 5]
options:
  namespace: default
  target: "deployment/nginx-deployment"
EOF
kubectl describe cm cluster-proportional-autoscaler

노드 5개로 증가시킨 후 모니터링

watch -d kubectl get pod

# helm 업그레이드
helm upgrade --install cluster-proportional-autoscaler -f cpa-values.yaml cluster-proportional-autoscaler/cluster-proportional-autoscaler

# 노드 5개로 증가
export ASG_NAME=$(aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].AutoScalingGroupName" --output text)
aws autoscaling update-auto-scaling-group --auto-scaling-group-name ${ASG_NAME} --min-size 5 --desired-capacity 5 --max-size 5
aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" --output table

이번엔 4개로 축소~

aws autoscaling update-auto-scaling-group --auto-scaling-group-name ${ASG_NAME} --min-size 4 --desired-capacity 4 --max-size 4
aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" --output table

지금 인스턴스가 5개지만 4개로 조정 될 것이라는(desired) 것 확인 가능~

helm 으로 배포한 것 uninstall 하는 명령문

helm uninstall cluster-proportional-autoscaler && kubectl delete -f cpa-nginx.yaml

Karpenter

이론

- 2020년 11월에 출시된 오픈소스로 Kubernetes 예약할 수 없는 파드를 감지하고, 새로운 노드를 자동 확장/병합/제거가 가능한 노드 스케일러 이며 구성항목으로는 “Provisioner”와 “AWSNodeTemplate” 를 통해 구성

- Cluster Autoscaler (CA) 와 비슷한 역할을 수행하지만, AWS 리소스에 의존성이 없어 JIT(Just In-Time) 배포가 가능

- 동작원리

출처 : https://devblog.kakaostyle.com/ko/2022-10-13-1-karpenter-on-eks/

노드의 자원이 부족해지면, Pod 는 적절한 Node 를 배정받지 못하고 pending 상태가 됨
Karpenter 는 지속해서 unscheduled Pod 를 관찰, 발견 시
해당 파드의 스펙, 요구사항을 평가하여 새로운 Node 추가를 결정하고 직접 프로비저닝 & 배포
사용하지 않는 노드의 경우 제거
추가된 Node가 Ready 상태가 되면 Karpenter 는 kube-scheduler 를 대신하여 pod 의 Node binding 요청도 수행

- 놀라운 점

pod에 적합한 인스턴스 중 가장 저렴한 인스턴스로 가성비 결정!
여러 노드에 파드가 하나씩 들어 있어 공간이 남는다면 알아서 하나로 몰아서 재정리
작은 거 여러개 vs 큰 거 한개 중 저렴한 것으로 결정

실습

충돌을 피하기 위해 신규 eks 환경에서 실습( 이유는 모르겠으나 기존 계정에서 안돼서 계정부터...새로...새롭게... )

cloudformation로 myeks2 배포

https://s3.ap-northeast-2.amazonaws.com/cloudformation.cloudneta.net/K8S/karpenter-preconfig.yaml

사전 확인 & eks-ops-view 설치

# IP 주소 확인 : 172.30.0.0/16 VPC 대역에서 172.30.1.0/24 대역을 사용 중
ip -br -c addr

# EKS Node Viewer 설치 : 현재 ec2 spec에서는 설치에 다소 시간이 소요됨 = 2분 이상
wget https://go.dev/dl/go1.22.1.linux-amd64.tar.gz
tar -C /usr/local -xzf go1.22.1.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin
go install github.com/awslabs/eks-node-viewer/cmd/eks-node-viewer@latest

# [터미널1] bin 확인
cd ~/go/bin && ./eks-node-viewer -h

# EKS 배포 완료 후 실행 하자
cd ~/go/bin && ./eks-node-viewer --resources cpu,memory

EKS 배포

환경 변수 설정

export KARPENTER_NAMESPACE="kube-system"
export K8S_VERSION="1.29"
export KARPENTER_VERSION="0.35.2"
export TEMPOUT=$(mktemp)
export ARM_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2-arm64/recommended/image_id --query Parameter.Value --output text)"
export AMD_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2/recommended/image_id --query Parameter.Value --output text)"
export GPU_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2-gpu/recommended/image_id --query Parameter.Value --output text)"
export AWS_PARTITION="aws"
export CLUSTER_NAME="${USER}-karpenter-demo"
echo "export CLUSTER_NAME=$CLUSTER_NAME" >> /etc/profile

echo $KARPENTER_VERSION $CLUSTER_NAME $AWS_DEFAULT_REGION $AWS_ACCOUNT_ID $TEMPOUT $ARM_AMI_ID $AMD_AMI_ID $GPU_AMI_ID

IAM Policy, Role 생성(w. cloudformation)

curl -fsSL https://raw.githubusercontent.com/aws/karpenter-provider-aws/v"${KARPENTER_VERSION}"/website/content/en/preview/getting-started/getting-started-with-karpenter/cloudformation.yaml  > "${TEMPOUT}" \
&& aws cloudformation deploy \
  --stack-name "Karpenter-${CLUSTER_NAME}" \
  --template-file "${TEMPOUT}" \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameter-overrides "ClusterName=${CLUSTER_NAME}"

EKS 클러스터 배포

eksctl create cluster -f - <<EOF
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${CLUSTER_NAME}
  region: ${AWS_DEFAULT_REGION}
  version: "${K8S_VERSION}"
  tags:
    karpenter.sh/discovery: ${CLUSTER_NAME}

iam:
  withOIDC: true
  serviceAccounts:
  - metadata:
      name: karpenter
      namespace: "${KARPENTER_NAMESPACE}"
    roleName: ${CLUSTER_NAME}-karpenter
    attachPolicyARNs:
    - arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}
    roleOnly: true

iamIdentityMappings:
- arn: "arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}"
  username: system:node:{{EC2PrivateDNSName}}
  groups:
  - system:bootstrappers
  - system:nodes

managedNodeGroups:
- instanceType: m5.large
  amiFamily: AmazonLinux2
  name: ${CLUSTER_NAME}-ng
  desiredCapacity: 2
  minSize: 1
  maxSize: 10
  iam:
    withAddonPolicies:
      externalDNS: true
EOF

# eks 배포 확인
eksctl get cluster
eksctl get nodegroup --cluster $CLUSTER_NAME
eksctl get iamidentitymapping --cluster $CLUSTER_NAME
eksctl get iamserviceaccount --cluster $CLUSTER_NAME
eksctl get addon --cluster $CLUSTER_NAME

카펜터 설치

helm install karpenter oci://public.ecr.aws/karpenter/karpenter --version "${KARPENTER_VERSION}" --namespace "${KARPENTER_NAMESPACE}" --create-namespace \
  --set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=${KARPENTER_IAM_ROLE_ARN}" \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait

node pool 생성

cat <<EOF | envsubst | kubectl apply -f -
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h # 30 * 24h = 720h
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2 # Amazon Linux 2
  role: "KarpenterNodeRole-${CLUSTER_NAME}" # replace with your cluster name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}" # replace with your cluster name
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}" # replace with your cluster name
  amiSelectorTerms:
    - id: "${ARM_AMI_ID}"
    - id: "${AMD_AMI_ID}"
#   - id: "${GPU_AMI_ID}" # <- GPU Optimized AMD AMI 
#   - name: "amazon-eks-node-${K8S_VERSION}-*" # <- automatically upgrade when a new AL2 EKS Optimized AMI is released. This is unsafe for production workloads. Validate AMIs in lower environments before deploying them to production.
EOF

scale up deployment

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1
EOF

얘들을 5개로~

kubectl scale deployment inflate --replicas 5

cpu 개수 5개 맞춰라~

cpu 개수 참고

https://aws.amazon.com/ko/ec2/physicalcores/

Amazon EC2 및 RDS DB 인스턴스 유형별 물리적 코어

참고: 이 페이지는 정보 전달의 목적으로만 제공되며 소프트웨어 라이센싱에 대한 법률 조언을 제공하기 위한 목적을 갖고 있지 않습니다. 소프트웨어 라이선싱 방법에 대한 자세한 내용은 해

aws.amazon.com

m5.large 2개 에서 cpu 2개 -> r5d.2xlarge 에서 cpu 4개~ 3개짜리가 없어서 덤인가..

Scale down deployment

# Now, delete the deployment. After a short amount of time, Karpenter should terminate the empty nodes due to consolidation.
kubectl delete deployment inflate && date
kubectl logs -f -n "${KARPENTER_NAMESPACE}" -l app.kubernetes.io/name=karpenter -c controller

disruption

이론

- 최적화 전략

- Expiration, Drift, Consolidation 세 종류

- 종류

Expiration 만료 : 기본 720시간(30일) 후 인스턴스를 자동으로 만료하여 강제로 노드를 최신 상태로 유지
Drift 드리프트 : 구성 변경 사항(NodePool, EC2NodeClass)를 감지하여 필요한 변경 사항을 적용
Consolidation 통합 : 비용 효율적인 컴퓨팅 최적화

실습

# 기존 nodepool 삭제
kubectl delete nodepool,ec2nodeclass default

# v0.34.0 부터 featureGates 에 spotToSpotConsolidation 활성화로 사용 가능
helm upgrade karpenter -n kube-system oci://public.ecr.aws/karpenter/karpenter --reuse-values --set settings.featureGates.spotToSpotConsolidation=true

# Create a Karpenter NodePool and EC2NodeClass
cat <<EOF > nodepool.yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        intent: apps
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c","m","r"]
        - key: karpenter.k8s.aws/instance-size
          operator: NotIn
          values: ["nano","micro","small","medium"]
        - key: karpenter.k8s.aws/instance-hypervisor
          operator: In
          values: ["nitro"]
  limits:
    cpu: 100
    memory: 100Gi
  disruption:
    consolidationPolicy: WhenUnderutilized
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: Bottlerocket
  subnetSelectorTerms:          
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}" # replace with your cluster name
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}" # replace with your cluster name
  role: "KarpenterNodeRole-${CLUSTER_NAME}" # replace with your cluster name
  tags:
    Name: karpenter.sh/nodepool/default
    IntentLabel: "apps"
EOF
kubectl apply -f nodepool.yaml

# Deploy a sample workload
cat <<EOF > inflate.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 5
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      nodeSelector:
        intent: apps
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
          resources:
            requests:
              cpu: 1
              memory: 1.5Gi
EOF
kubectl apply -f inflate.yaml

#
watch -d "kubectl get nodes -L karpenter.sh/nodepool -L node.kubernetes.io/instance-type -L topology.kubernetes.io/zone -L karpenter.sh/capacity-type"
kubectl get nodes -L karpenter.sh/nodepool -L node.kubernetes.io/instance-type -L topology.kubernetes.io/zone -L karpenter.sh/capacity-type

# Scale in a sample workload to observe consolidation
# To invoke a Karpenter consolidation event scale, inflate the deployment to 1. Run the following command:
kubectl scale --replicas=1 deployment/inflate
kubectl -n kube-system logs -l app.kubernetes.io/name=karpenter --all-containers=true -f --tail=20
kubectl get nodes -L karpenter.sh/nodepool -L node.kubernetes.io/instance-type -L topology.kubernetes.io/zone -L karpenter.sh/capacity-type
kubectl get node --label-columns=eks.amazonaws.com/capacityType,karpenter.sh/capacity-type
kubectl get node --label-columns=node.kubernetes.io/instance-type,topology.kubernetes.io/zone

# Use kubectl get nodeclaims to list all objects of type NodeClaim and then describe the NodeClaim Kubernetes resource
# using kubectl get nodeclaim/<claim-name> -o yaml. 
# In the NodeClaim .spec.requirements, you can also see the 15 instance types passed to the Amazon EC2 Fleet API:
kubectl get nodeclaims 
kubectl get nodeclaims -o yaml | kubectl neat | yh

# 삭제
kubectl delete deployment inflate
kubectl delete nodepool,ec2nodeclass default

참고자료

https://boostbrothers.github.io/2023-11-01-infra-karpenter-migration/#:~:text=Karpenter%EB%9E%80%202020%EB%85%84%2011,%ED%86%B5%ED%95%B4%20%EA%B5%AC%EC%84%B1%ED%95%A0%20%EC%88%98%20%EC%9E%88%EC%8A%B5%EB%8B%88%EB%8B%A4.

AWS Node Auto Scaler Karpenter 도입기 | 비브로스 기술 블로그

안녕하세요, 비브로스 백엔팀에서 인프라를 맡고 있는 박진홍입니다.

boostbrothers.github.io

https://www.eksworkshop.com/docs/autoscaling/workloads/cluster-proportional-autoscaler/

Cluster Proportional Autoscaler | EKS Workshop

Scale workloads proportional to the size of your Amazon Elastic Kubernetes Service cluster with Cluster Proportional Autoscaler.

www.eksworkshop.com

'Infra & DevOps > k8s(EKS)' 카테고리의 다른 글

[EKS] CI/CD (0)	2024.04.19
[EKS] Security (0)	2024.04.11
[EKS] Observability (0)	2024.03.31
[EKS] Storage (0)	2024.03.22
[EKS] network (0)	2024.03.17

현재글[EKS] Autoscaling

Do it, it will be better

[EKS] Autoscaling

Amazon EKS Workshop Study 2기

5주차

Autoscaling

k8s 의 오토스케일링

EKS 의 오토스케일링

실습 환경 인프라 구축

실습 환경 기본 설정

HPA

KEDA

VPA

CA

CPA

Karpenter

disruption

'Infra & DevOps > k8s(EKS)' 카테고리의 다른 글

'Infra & DevOps/k8s(EKS)'의 다른글

티스토리툴바

[EKS] Autoscaling

Amazon EKS Workshop Study 2기

5주차

Autoscaling

k8s 의 오토스케일링

EKS 의 오토스케일링

실습 환경 인프라 구축

실습 환경 기본 설정

HPA

KEDA

VPA

CA

CPA

Karpenter

disruption

'Infra & DevOps > k8s(EKS)' 카테고리의 다른 글

'Infra & DevOps/k8s(EKS)'의 다른글

관련글

티스토리툴바