[Kubernetes Advanced Network Security] AWS EKS : VPC CNI

QwakCheol | 2024. 10. 31. 18:38

These are my notes from week 9 of the KANS study run by Gasida of CloudNet.

[ Lab Environment Setup ]

Automated deployment with CloudFormation: 1 VPC (3 public subnets, 3 private subnets), an EKS cluster (control plane), a managed node group (3 EC2 instances), and add-ons (kube-proxy, coredns, aws vpc cni)

Basic checks after connecting

Check node information and connect over SSH

Add a rule to the node security group so that eksctl-host can reach the nodes (and pods)

From eksctl-host, ping a node IP or a coredns pod IP

1. Introducing AWS VPC CNI

k8s CNI: the Container Network Interface sets up the Kubernetes network environment

▶ AWS VPC CNI: assigns IP addresses to pods; because the pod IP range is the same as the (worker) node IP range, pods and nodes can communicate directly

Amazon EKS implements cluster networking through the Amazon VPC Container Network Interface plugin, also known as VPC CNI. The CNI Plugin allows Kubernetes Pods to have the same IP address as they do on the VPC network. More specifically, all containers inside the Pod share a network namespace, and they can communicate with each other using local ports.
Amazon VPC CNI has two components :
1. CNI Binary, which will setup Pod Network to enable Pod-to-Pod communication. The CNI binary runs on a node root file system and is invoked by the kubelet when a new Pod gets added to, or an existing Pod removed from the node.
2. ipamd, a long-running node-local IP address Management (IPAM) daemon, which is responsible for:
- managing ENIs on a node, and
- maintaining a warm pool of available IP addresses or prefixes
When an instance is created, EC2 creates and attaches a primary ENI associated with a primary subnet. The primary subnet may be public or private. The Pods that run in hostNetwork mode use the primary IP address assigned to the node's primary ENI and share the same network namespace as the host.
Amazon EKS supports native VPC networking with the Amazon VPC Container Network Interface (CNI) plugin for Kubernetes.
Integration with the VPC: VPC Flow Logs, VPC routing policies, and security groups can all be used
This plugin assigns an IP address from your VPC to each pod.
Pods can use IPs that are pre-allocated on the VPC ENIs (= the Local-IPAM warm IP pool) <- this is what makes pod startup fast
L-IPAM
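: L-IPAM runs on each node to maintain a warm pool of available secondary IP addresses. Whenever kubelet asks to add a pod, L-IPAM immediately takes one available secondary IP from the warm pool and assigns it to the pod
: It learns the available ENIs and secondary IP addresses from the instance metadata, and every time the DaemonSet restarts it rebuilds the warm pool by fetching the currently running pods (pod name, namespace, IP address) through kubelet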
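To make the warm-pool idea concrete, here is a toy Python model (my own simplification, not the ipamd code): IPs are pre-loaded, handed out instantly when a pod is added, returned when it is deleted, and after a restart the pool is rebuilt as "ENI IPs minus IPs still held by running pods". The class and method names are hypothetical; the real ipamd also calls the EC2 API and tracks more state than this.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WarmPool:
    """Toy model of L-IPAM: secondary IPs pre-attached to ENIs, handed out on demand."""
    free: deque = field(default_factory=deque)
    assigned: dict = field(default_factory=dict)  # "namespace/name" -> IP

    def add_eni_ips(self, ips):
        # The real ipamd learns these from instance metadata / EC2; here they are passed in.
        self.free.extend(ips)

    def assign(self, pod_key):
        # kubelet -> CNI -> ipamd "add pod": hand out a warm IP without any EC2 call.
        if not self.free:
            raise RuntimeError("warm pool empty: ipamd would have to attach more IPs or an ENI")
        ip = self.free.popleft()
        self.assigned[pod_key] = ip
        return ip

    def release(self, pod_key):
        # Pod deleted: its IP goes back into the pool for reuse.
        self.free.append(self.assigned.pop(pod_key))

    @classmethod
    def rebuild(cls, eni_ips, running_pods):
        # After aws-node restarts: ENI IPs minus the IPs of pods that are still running.
        used = set(running_pods.values())
        return cls(free=deque(ip for ip in eni_ips if ip not in used),
                   assigned=dict(running_pods))
```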

[Differences between k8s Calico CNI and AWS VPC CNI]

AWS VPC CNI puts nodes and pods in the same network range to optimize network communication (performance, latency)

For pod-to-pod traffic, a typical k8s CNI (e.g., Calico) uses overlay networking (VXLAN, IP-IP, etc.), whereas AWS VPC CNI communicates directly within the same range
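A small Python model of that difference, for illustration only: with an overlay the VPC fabric sees node IPs and the pod MTU shrinks by the encapsulation overhead, while with VPC CNI the packet carries the pod IPs unchanged. The overhead values are the standard IPv4 header sizes (IP-in-IP 20 bytes, VXLAN 50 bytes); the IPs are made up, and 9001 is the jumbo-frame MTU EC2 supports inside a VPC. Note that Calico can also run without an overlay; this only contrasts the overlay case.

```python
# Bytes an encapsulation adds inside the underlay's IP MTU (IPv4, no options).
OVERHEAD = {
    "native": 0,                # AWS VPC CNI: pod IPs are VPC IPs, no encapsulation
    "ipip": 20,                 # extra outer IPv4 header
    "vxlan": 20 + 8 + 8 + 14,   # outer IPv4 + UDP + VXLAN header + inner Ethernet = 50
}


def outer_addresses(src_pod, dst_pod, src_node, dst_node, mode):
    """(source, destination) that the VPC fabric sees for a pod-to-pod packet."""
    if mode == "native":
        return src_pod, dst_pod   # routed directly by the VPC
    return src_node, dst_node     # overlay: pod IPs are hidden inside the tunnel


def pod_mtu(underlay_mtu, mode):
    """Largest pod packet that fits without fragmentation."""
    return underlay_mtu - OVERHEAD[mode]
```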

▶ Maximum number of pods that can be created on a worker node

1. Secondary IPv4 addresses: determined by combining the instance type's maximum number of ENIs and the number of IPs each ENI can hold

2. IPv4 Prefix Delegation: delegates IPv4 /28 prefixes instead of single IPs; the limit is determined by the number of assignable IPs and the maximum recommended for the instance type

3. AWS VPC CNI Custom Networking: separates the node and pod ranges by giving pods a separate subnet
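The arithmetic behind the three modes can be sketched in Python. Assumptions: the secondary-IP formula is the one used later in this post; prefix delegation fills each secondary slot with a /28 (16 IPs) and is capped at AWS's recommended 110 pods (fewer than 30 vCPUs) or 250; custom networking takes the primary ENI away from pods. This mirrors AWS's guidance as I understand it, not an official calculator.

```python
def max_pods(enis, ips_per_eni, mode="secondary", vcpus=2):
    """Sketch of the max-pods arithmetic for the three VPC CNI modes.

    enis / ips_per_eni: EC2 limits of the instance type (t3.medium: 3 ENIs, 6 IPv4 each).
    The +2 covers host-network pods (aws-node, kube-proxy) that need no pod IP.
    """
    if mode == "secondary":
        # Each ENI loses one slot to its own primary IP.
        return enis * (ips_per_eni - 1) + 2
    if mode == "prefix":
        # Each secondary slot holds a /28 prefix (16 IPs); capped at the recommended maximum.
        return min(enis * (ips_per_eni - 1) * 16 + 2, 110 if vcpus < 30 else 250)
    if mode == "custom":
        # Custom networking: the primary ENI is not used for pods.
        return (enis - 1) * (ips_per_eni - 1) + 2
    raise ValueError(f"unknown mode: {mode}")


for mode in ("secondary", "prefix", "custom"):
    print(mode, max_pods(3, 6, mode))
```

For t3.medium this gives 17, 110, and 12 respectively.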

▶ Checking basic network information

Check the network information on the nodes

Check the network information: eniY is the host end of a veth pair whose other end is in the pod's network namespace

2. Checking basic network information on the node

▶ Basic network configuration of worker node 1 (worker node 2 is similar, so it is omitted)

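Network namespaces are split into the host (root) namespace and one namespace per pod
Certain pods (kube-proxy, aws-node) use the host (root) IP as-is => the pod's hostNetwork option
For t3.medium, each ENI can hold up to 6 IPs
Each of the 2 ENIs, ENI0 and ENI1, can hold 5 secondary private IPs in addition to its own primary IP
The coredns pod is connected through a veth pair: the eniY@ifN interface on the host side and eth0 inside the pod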

▶ Checking that pods use the secondary IPv4 addresses

▶ Creating test pods

When a pod is created, an eniY@ifN interface is added on the worker node and a matching entry is added to the routing table
Checking the test pod's eniY information on the worker node EC2
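The routing-table entry added per pod is a /32 host route pointing at the pod's eniY veth, so the kernel's longest-prefix match sends traffic for that pod IP into the veth. The Python sketch below models only that lookup; the routes, pod IPs, and veth names are made up, and the real node also uses policy routing for pods on secondary ENIs, which is not modeled here.

```python
import ipaddress

# Hypothetical main routing table on a worker node: (destination, device).
ROUTES = [
    ("0.0.0.0/0", "eth0"),
    ("192.168.1.0/24", "eth0"),
    ("192.168.1.82/32", "eni1a2b3c4d5e6"),  # test pod 1 (made-up IP and veth name)
    ("192.168.1.97/32", "eni7f8e9d0c1b2"),  # test pod 2
]


def egress_interface(dst):
    """Longest-prefix match over ROUTES, like the kernel's route lookup."""
    addr = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(net), dev) for net, dev in ROUTES
               if addr in ipaddress.ip_network(net)]
    return max(matches, key=lambda m: m[0].prefixlen)[1]
```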

Exec into a test pod and check

3. Pod-to-pod communication across nodes

[Goal]: Inspect tcpdump output during pod-to-pod communication and understand how the traffic flows

Pod-to-pod traffic flow: with AWS VPC CNI, pods communicate directly in a VPC-native way, without any separate overlay technology

Reference: the pod-to-pod communication process

▶ Testing and verifying pod-to-pod communication: it works without any NAT

4. Pod-to-external communication

Pod-to-external traffic flow: an iptables SNAT rule rewrites the source to the node's eth0 IP before the traffic leaves

Depending on the VPC CNI's external source network address translation (SNAT) setting, pods reach the outside (internet) either with SNAT or without it

▶ Testing and verifying pod-to-external communication

Open a shell in a pod and ping an external host, then check tcpdump and the iptables rules on the worker node

: When a pod talks to the outside, it is SNATed by the 'AWS-SNAT-CHAIN-0' rule before leaving
: The IP at the end of that rule is the IP address of eth0 (the first ENI)

Checking the packet counters shows that traffic matches AWS-SNAT-CHAIN-0: when the destination is outside 192.168.0.0/16, the source is rewritten to 192.168.1.251 on the way out
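The decision the SNAT chain makes can be written as a small Python function. Assumptions: 192.168.0.0/16 and 192.168.1.251 are the values from this lab, and `external_snat=True` stands in for setting AWS_VPC_K8S_CNI_EXTERNALSNAT=true on aws-node, in which case the node does no SNAT and translation is left to something outside the node (e.g., a NAT gateway). It ignores extra knobs such as excluded CIDRs.

```python
import ipaddress

VPC_CIDR = ipaddress.ip_network("192.168.0.0/16")  # lab VPC range
PRIMARY_ENI_IP = "192.168.1.251"                     # eth0 (first ENI) IP in this lab


def source_after_nat(pod_ip, dst_ip, external_snat=False):
    """Source IP a packet from a pod carries when it leaves the node."""
    if external_snat or ipaddress.ip_address(dst_ip) in VPC_CIDR:
        return pod_ip         # intra-VPC traffic (or external SNAT mode) keeps the pod IP
    return PRIMARY_ENI_IP     # everything else is SNATed to the primary ENI address
```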

5. Limits on the number of pods per node

▶ Installing kube-ops-view

Secondary IPv4 addresses (the default): the limit is determined by combining the instance type's maximum number of ENIs and the number of IPs each ENI can hold

▶ Pod limits by worker node instance type

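The number of pods that can be placed on a node is determined by the instance type's maximum number of ENIs and maximum number of assignable IPs
Note that the aws-node and kube-proxy pods use the host's IP, so they do not consume secondary IPs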

▶ Checking worker node instance information (t3.medium)

# Example calculation of available pods: the +2 accounts for aws-node and kube-proxy, which use host networking and need no secondary IP

((MaxENI * (IPv4addr-1)) + 2)

For t3.medium: (3 * (6 - 1)) + 2 = 17 >> excluding the 2 aws-node and kube-proxy pods, 15 remain for other pods

# Check the worker node details: Allocatable shows pods: 17

kubectl describe node | grep Allocatable: -A6

Allocatable:
  cpu:                1930m
  ephemeral-storage:  27905944324
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             3388360Ki
  pods:               17
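To check every node at once, one option is to parse the `kubectl describe node` text and compare it with the formula. This is a quick Python sketch that assumes the usual describe layout (Capacity first, then Allocatable, one block per node); the sample string is the output shown here.

```python
import re

SAMPLE = """Allocatable:
  cpu:                1930m
  ephemeral-storage:  27905944324
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             3388360Ki
  pods:               17"""


def allocatable_pods(describe_text):
    """Return the 'pods:' value that follows each 'Allocatable:' header (one per node)."""
    values = []
    for block in describe_text.split("Allocatable:")[1:]:
        match = re.search(r"^\s*pods:\s*(\d+)", block, re.MULTILINE)
        if match:
            values.append(int(match.group(1)))
    return values


expected = 3 * (6 - 1) + 2   # t3.medium: (MaxENI * (IPv4addr - 1)) + 2
print(allocatable_pods(SAMPLE), expected)
```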

▶ Creating the maximum number of pods and checking

Scaling to 50 replicas fails: the pods beyond the per-node limit cannot be scheduled

Workarounds: Prefix Delegation, WARM & MIN IP/Prefix Targets, Custom Networking
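WARM_IP_TARGET (free IPs to keep ready) and MINIMUM_IP_TARGET (total IPs to keep attached) are aws-node settings; the Python sketch below shows roughly how such targets turn into "how many more IPs to request". It is a simplification under my own assumptions: the real ipamd also weighs WARM_ENI_TARGET, per-ENI limits, and releasing excess IPs.

```python
def ips_to_allocate(total_ips, used_ips, warm_ip_target, minimum_ip_target=0):
    """Extra IPs needed so that at least warm_ip_target are free
    and at least minimum_ip_target are attached in total."""
    free = total_ips - used_ips
    short_of_warm = warm_ip_target - free
    short_of_minimum = minimum_ip_target - total_ips
    return max(short_of_warm, short_of_minimum, 0)
```

A small WARM_IP_TARGET keeps fewer idle IPs per node (saving subnet space), at the cost of more EC2 API calls when pods start in bursts.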

[ EKS Workshop ]

Prefix Delegation : https://www.eksworkshop.com/docs/networking/vpc-cni/prefix/
Custom Networking : https://www.eksworkshop.com/docs/networking/vpc-cni/custom-networking/
Security Groups for Pods : https://www.eksworkshop.com/docs/networking/vpc-cni/security-groups-for-pods/
Network Policies : https://www.eksworkshop.com/docs/networking/vpc-cni/network-policies/
Amazon VPC Lattice : https://www.eksworkshop.com/docs/networking/vpc-lattice/

6. Service & AWS LoadBalancer Controller

Service types

- ClusterIP type

- NodePort type

- LoadBalancer type (default mode): NLB instance type

- Service (LoadBalancer Controller): AWS Load Balancer Controller + NLB IP mode with AWS VPC CNI

- Summary of all NLB modes

1. Instance type

1) externalTrafficPolicy: Cluster => traffic is load balanced twice and SNATed, so the client IP is not visible <- this is how the LoadBalancer type (default mode) behaves

2) externalTrafficPolicy: Local => traffic is load balanced once and the client IP is preserved; the worker node's iptables rules are used

Traffic flow

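Summary: when an external client connects to the 'load balancer', the traffic is distributed to a node, and the node's iptables rules deliver it to the destination pod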

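Nodes are not exposed externally; only the load balancer is. External clients connect only to the load balancer and learn nothing about the internal nodes
The load balancer distributes traffic to the nodes that host pods, and the iptables rules forward only to pods on the local node (externalTrafficPolicy: Local)
DNAT happens twice: first when traffic leaves the load balancer toward a node, second when the node's iptables rule forwards it to the pod IP
Preserving the external client IP: AWS NLB preserves the client IP when the target is an instance, and the iptables rules also preserve it thanks to externalTrafficPolicy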

Load balancing optimization: when a node has no pod, the load balancer's health check against that node fails, so no external traffic is sent to it
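The Cluster vs Local difference in instance mode can be summarized as a Python sketch. This is my own model of the behavior described above, with hypothetical node names: with Local, nodes without a local pod fail the NLB health check and receive nothing, and the client IP survives; with Cluster, every node is a target and kube-proxy may hop to another node with SNAT.

```python
def nlb_targets(pods_per_node, policy):
    """Return (nodes receiving NLB traffic, load-balancing stages, client IP preserved).

    pods_per_node: {"node-a": 2, "node-b": 0, ...}
    policy: externalTrafficPolicy, "Cluster" or "Local"
    """
    if policy == "Cluster":
        # All nodes pass the health check; iptables may then pick a pod on another node,
        # SNATing so the reply comes back through the same node.
        return sorted(pods_per_node), 2, False
    if policy == "Local":
        # Nodes without a local pod fail the health check and get no traffic.
        healthy = sorted(node for node, count in pods_per_node.items() if count > 0)
        return healthy, 1, True
    raise ValueError(f"unknown policy: {policy}")
```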

2. IP type => requires the AWS LoadBalancer Controller pod and its (IAM) policy to be set up!

1) Proxy Protocol v2 disabled => traffic goes from the NLB straight to the pod, but the client IP is SNATed to the NLB, so it is not visible

2) Proxy Protocol v2 enabled => traffic goes from the NLB straight to the pod and the client IP is visible (-> but the application must be configured to understand PPv2)
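What "the application must understand PPv2" means: with PPv2 on (via the `service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"` annotation, as I understand the controller), every TCP connection starts with a binary header carrying the original client address. The Python sketch below builds and parses that header for the TCP/IPv4 case following the PROXY protocol v2 spec; it skips any TLVs (which NLB may append) by honoring the length field, and the addresses are made up.

```python
import ipaddress
import struct

PP2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"


def build_ppv2_ipv4(src, dst, sport, dport):
    """PROXY v2 header for TCP over IPv4, as a load balancer would prepend it."""
    addrs = struct.pack("!4s4sHH", ipaddress.IPv4Address(src).packed,
                        ipaddress.IPv4Address(dst).packed, sport, dport)
    # 0x21 = version 2 + PROXY command, 0x11 = AF_INET + STREAM (TCP)
    return PP2_SIGNATURE + struct.pack("!BBH", 0x21, 0x11, len(addrs)) + addrs


def parse_ppv2(data):
    """Return (client_ip, client_port, payload); handles TCP/IPv4 only."""
    if not data.startswith(PP2_SIGNATURE):
        raise ValueError("no PROXY v2 signature")
    ver_cmd, fam, length = struct.unpack("!BBH", data[12:16])
    if ver_cmd != 0x21 or fam != 0x11:
        raise ValueError("only PROXY over TCP/IPv4 is handled in this sketch")
    src, _dst, sport, _dport = struct.unpack("!4s4sHH", data[16:28])
    # Skip the whole address block, including any TLVs, using the length field.
    return str(ipaddress.IPv4Address(src)), sport, data[16 + length:]
```

An application that does not parse this header sees these bytes as garbage at the start of the stream, which is why enabling PPv2 on the NLB alone breaks plain HTTP servers.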

▶ Deploying the AWS LoadBalancer Controller

▶ Service/pod deployment test with NLB

Check the AWS NLB target group: look at the registered IPs
Behavior when scaling the pods 2 -> 1 -> 3: targets are auto-discovered

▶ Pod readiness gate: a feature that lets traffic reach a pod only once its ALB/NLB target (ip mode) is healthy according to the ALB/NLB health check

Try switching the NLB target to instance mode

▶ NLB IP target & enabling Proxy Protocol v2

▶ Using Proxy Protocol as one way to see the client's source IP address inside an Istio internal network

Proxy Protocol on AWS NLB and Istio ingress gateway
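[Toss case] 'External partner network zone - L7 appliance - Toss internal Istio Gateway'
- External partners and the Toss Istio Gateway use mTLS, so the L7 appliance in the middle cannot see the HTTPS-encrypted content (it acts only as L4)
- As a result, the L7 appliance cannot put the partner's client IP into XFF; acting as a plain L4 device, it SNATs the source IP to its own IP when forwarding to the Istio GW
- In the end, the Toss Istio Gateway only ever sees the L7 appliance's IP as the client source IP
- To solve this, Proxy Protocol is enabled on the L7 appliance (acting as L4 in this setup) when forwarding to the Istio GW
- The Istio Gateway appears to rewrite headers, adding the client source IP obtained via Proxy Protocol to the XFF header
- Applications on the internal network can then read the client source IP correctly from the XFF header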

7. Ingress

Introducing Ingress: exposes services inside the cluster (ClusterIP, NodePort, LoadBalancer) externally over HTTP/HTTPS, acting as a web proxy
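How an Ingress picks a backend for a request path can be sketched in Python using the Kubernetes Ingress path rules: `Prefix` matches element by element on "/", the longest matching path wins, and `Exact` wins a tie. The service names are hypothetical, and an ALB built by the controller expresses the same intent as listener rules, which may differ in edge cases.

```python
def _segments(path):
    # "/api/v1/" -> ["api", "v1"]; empty segments (and trailing slashes) are ignored
    return [s for s in path.split("/") if s]


def prefix_match(rule_path, request_path):
    # Element-wise prefix: /api matches /api and /api/v1, but not /apiv1
    rule = _segments(rule_path)
    return _segments(request_path)[:len(rule)] == rule


def route(rules, request_path):
    """rules: (path, pathType, service) tuples for one host.
    The longest matching path wins; on a tie, Exact beats Prefix."""
    candidates = [
        (len(path), path_type == "Exact", service)
        for path, path_type, service in rules
        if (path_type == "Exact" and path == request_path)
        or (path_type == "Prefix" and prefix_match(path, request_path))
    ]
    return max(candidates)[2] if candidates else None
```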

AWS LoadBalancer Controller + Ingress(ALB) IP 모드 동작 with AWS VPC CNI

▶ Service/pod deployment test with Ingress (ALB)

Check the targets registered in the ALB target group: the ALB forwards directly to pod IPs

Scale to 3 pods

▶ Exposing Kubernetes Applications, Part 1 : Service and Ingress Resources

1. Exposing a Service : In-tree Service Controller

2. Ingress Implementations : External Load Balancer

3. Ingress Implementations : Internal Reverse Proxy

4. Kubernetes Gateway API

8. ExternalDNS

Overview: when a domain is set on a K8s Service/Ingress, A records (plus TXT records) are automatically created/deleted in AWS (Route 53), Azure (DNS), or GCP (Cloud DNS)
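The TXT records are how ExternalDNS knows which records it owns. The Python sketch below plans changes in a simplified way, assuming the `sync` policy and a hypothetical `--txt-owner-id` of "my-cluster": hostnames come from Service/Ingress annotations such as `external-dns.alpha.kubernetes.io/hostname`, and only records with a matching owner TXT are ever updated or deleted, so hand-made records are left alone.

```python
OWNER_ID = "my-cluster"  # hypothetical --txt-owner-id value


def plan(desired, current_a, current_txt):
    """Simplified ExternalDNS reconcile step.

    desired:     {hostname: target} from Service/Ingress annotations
    current_a:   {hostname: target} A/alias records already in the zone
    current_txt: {hostname: owner}  parsed from "heritage=external-dns,external-dns/owner=..."
    """
    owned = {h for h, owner in current_txt.items() if owner == OWNER_ID}
    create = {h: t for h, t in desired.items() if h not in current_a}
    update = {h: t for h, t in desired.items()
              if h in owned and h in current_a and current_a[h] != t}
    delete = sorted(h for h in owned if h not in desired)
    return create, update, delete
```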

Three ways to give the ExternalDNS controller permissions: Node IAM Role, Static credentials, IRSA

▶ Check AWS Route 53 information & set variables: you must own a public domain

▶ Installing ExternalDNS

▶ Service (NLB) + domain integration (ExternalDNS)

9. CoreDNS

Reference: https://aws.amazon.com/ko/blogs/containers/recent-changes-to-the-coredns-add-on/

Kubernetes DNS query flow
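Part of that flow happens before CoreDNS is even asked: the pod's resolver expands short names with the search list in /etc/resolv.conf. The Python sketch assumes the usual pod settings (`search <ns>.svc.cluster.local svc.cluster.local cluster.local`, `options ndots:5`) and ignores any extra search domains an EKS pod may inherit from the node.

```python
def query_order(name, namespace="default", cluster_domain="cluster.local", ndots=5):
    """FQDNs a pod's resolver tries, in order, for a given name."""
    if name.endswith("."):
        return [name]  # already absolute: no search expansion
    search = [f"{namespace}.svc.{cluster_domain}", f"svc.{cluster_domain}", cluster_domain]
    expanded = [f"{name}.{suffix}." for suffix in search]
    if name.count(".") >= ndots:
        return [name + "."] + expanded  # enough dots: try the name as-is first
    return expanded + [name + "."]      # otherwise walk the search list first
```

This is why an external name like aws.amazon.com generates several NXDOMAIN lookups inside the cluster before the real query, and why a trailing dot avoids them.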

10. Topology Aware Routing

▶ Deploying a Deployment and Service for testing

▶ Checking load balancing when the test pod (netshoot-pod) accesses the ClusterIP: random-probability load balancing regardless of AZ (zone)

Checking the iptables rules: the ClusterIP goes KUBE-SVC-Y -> KUBE-SEP-Z... (3 entries) => random-probability load balancing across the 3 pods
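Why "random probability" still ends up even: kube-proxy walks the KUBE-SEP jumps top to bottom, and rule i out of n uses `-m statistic --mode random --probability 1/(n-i)`, with the last rule matching unconditionally. The Python sketch below uses exact fractions to show that this gives every endpoint the same 1/n share; it models only the probabilities, not iptables itself.

```python
from fractions import Fraction


def sep_rule_probabilities(n):
    """Probability on each KUBE-SEP jump in a KUBE-SVC chain with n endpoints."""
    return [Fraction(1, n - i) for i in range(n)]  # last one is 1/1: always matches


def effective_share(n):
    """Chance that a new connection ends at endpoint i after walking the chain."""
    shares, remaining = [], Fraction(1)
    for p in sep_rule_probabilities(n):
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares
```

With 3 endpoints the rules read 1/3, 1/2, 1, which is the 0.33333… / 0.5 pattern visible in `iptables -t nat -S`.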

Checking the SVC chain: 3 SEP (endpoint) pods => random-probability load balancing across the 3 pods

▶ After enabling Topology Mode (formerly Topology Aware Hints), checking load balancing when netshoot-pod accesses the ClusterIP: connections go only to destination pods in the same AZ (zone)

Hints describe which zone an endpoint should serve traffic for. kube-proxy then routes traffic from a zone to endpoints based on the hints that have been applied
- When topology aware routing is enabled and implemented on a Kubernetes Service, the EndpointSlice controller will proportionally allocate endpoints to the different zones that your cluster is spread across. For each of those endpoints, the EndpointSlice controller will also set a hint for the zone. Hints describe which zone an endpoint should serve traffic for. kube-proxy will then route traffic from a zone to an endpoint based on the hints that get applied.

Enabling Topology Aware Routing: add an annotation to the Service

Checking the iptables rules: the ClusterIP goes KUBE-SVC-Y -> KUBE-SEP-Z.. (1 entry: only the pod deployed in the same AZ as this node is listed) => same-AZ connections only
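The annotation is `service.kubernetes.io/topology-mode: Auto` on Kubernetes 1.27+ (older clusters used `service.kubernetes.io/topology-aware-hints: auto`). The filtering kube-proxy then applies can be sketched in Python as below; the fallback rules (ignore hints if any endpoint lacks one, use all endpoints if none is hinted for this zone) are my reading of kube-proxy's behavior, and the IPs and zone names are made up.

```python
def endpoints_for_zone(endpoints, node_zone):
    """endpoints: list of (pod_ip, hint_zones), hint_zones is None when no hint was set.
    Returns the endpoint IPs kube-proxy would program on a node in node_zone."""
    if any(hints is None for _, hints in endpoints):
        return [ip for ip, _ in endpoints]       # incomplete hints: ignore them all
    local = [ip for ip, hints in endpoints if node_zone in hints]
    return local or [ip for ip, _ in endpoints]  # nothing for this zone: fall back to all
```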

Additional test: reduce the pods to 1 so there is no destination pod in the same AZ

11. Using AWS Load Balancer Controller for blue/green deployment, canary deployment and A/B testing

▶ How the ALB works

Weighted target groups
- To help AWS customers adopt blue/green and canary deployments and A/B testing strategies, AWS announced weighted target groups for Application Load Balancer in November 2019.

▶ Deploy the sample application version 1 and version 2

The sample application used here is hello-kubernetes. Deploy two versions of the applications with custom messages and set the service type to ClusterIP:

▶ Deploy ingress and test the blue/green deployment

▶ Blue/green deployment

To perform the blue/green deployment, update the ingress annotation to move all weight to version 2.

▶ Deploy Ingress and test the canary deployment

Instead of moving all traffic to version 2, we can shift the traffic slowly towards version 2 by increasing the weight on version 2 step by step. This allows version 2 to be verified against a small portion of the production traffic before moving more traffic over. The following example shows that 10 percent of the traffic is shifted to version 2, while 90 percent of the traffic remains with version 1.
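The weights live in an `alb.ingress.kubernetes.io/actions.<action-name>` annotation whose value is JSON. The Python sketch below generates that JSON and computes the expected traffic split; the service names follow the hello-kubernetes example but are assumptions, and the Ingress backend is expected to reference the action name with port name `use-annotation`.

```python
import json


def forward_action(weights, port="80"):
    """JSON for alb.ingress.kubernetes.io/actions.<action-name>: forward to
    several ClusterIP services with the given relative weights."""
    return json.dumps({
        "type": "forward",
        "forwardConfig": {
            "targetGroups": [
                {"serviceName": name, "servicePort": port, "weight": weight}
                for name, weight in weights.items()
            ]
        },
    })


def traffic_share(action_json):
    """Expected fraction of requests per service under the ALB's weighted split."""
    groups = json.loads(action_json)["forwardConfig"]["targetGroups"]
    total = sum(g["weight"] for g in groups)
    return {g["serviceName"]: g["weight"] / total for g in groups}


canary = forward_action({"hello-kubernetes-v1": 90, "hello-kubernetes-v2": 10})
print(traffic_share(canary))  # {'hello-kubernetes-v1': 0.9, 'hello-kubernetes-v2': 0.1}
```

The blue/green cutover is the same call with weights 0 and 100.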

▶ Argo Rollouts

When performing a canary deployment in a production environment, typically the traffic is shifted with small increments. Usually it is done with some level of automation behind it. Various performance monitoring systems can also be integrated into this process, making sure that every step of the way there are no errors, or the errors are below an acceptable threshold. This is where progressive delivery mechanisms such as Argo Rollouts are very beneficial.
Argo Rollouts offers first class support for using the annotation-based traffic shaping abilities of AWS Load Balancer Controller to gradually shift traffic to the new version during an update. Additionally, Argo Rollouts can query and interpret metrics from various providers to verify key KPIs and drive automated promotion or rollback during an update. More information is available at Argo Rollouts integration with Application Load Balancer.

▶ Deploy ingress and test the A/B testing

Ingress annotation alb.ingress.kubernetes.io/conditions.${conditions-name} provides a method for specifying routing conditions in addition to original host/path condition on ingress spec. The additional routing conditions can be based on http-header, http-request-method, query-string and source-ip. This provides developers multiple advanced routing options for their A/B testing implementation, without the need for setting up and managing a separate routing system, such as service mesh.
AWS Load Balancer Controller configures the listener rules as per the annotation to direct a portion of incoming traffic to a specific backend. In the following example, all requests are directed to version 1 by default. The following ingress resource directs the traffic to version 2 when the request contains a custom HTTP header: HeaderName=HeaderValue1.
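A Python sketch of that A/B rule: one function produces the JSON value for `alb.ingress.kubernetes.io/conditions.<action-name>` with an `http-header` condition, and another mimics the listener rule's decision. The backend labels "v1"/"v2" are placeholders, only the `http-header` field is handled, and header-name matching here is case-sensitive, which is a simplification of how ALB compares headers.

```python
import json


def header_condition(name, values):
    """JSON for alb.ingress.kubernetes.io/conditions.<action-name>: match an HTTP header."""
    return json.dumps([{
        "field": "http-header",
        "httpHeaderConfig": {"httpHeaderName": name, "values": values},
    }])


def choose_backend(request_headers, conditions_json, matched="v2", default="v1"):
    """Route to `matched` when every http-header condition holds, else to `default`."""
    for cond in json.loads(conditions_json):
        if cond["field"] == "http-header":
            cfg = cond["httpHeaderConfig"]
            if request_headers.get(cfg["httpHeaderName"]) not in cfg["values"]:
                return default
    return matched
```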

12. Network Policies with VPC CNI

AWS EKS fully supports the upstream Kubernetes Network Policy API, ensuring compatibility and adherence to Kubernetes standards.

How it works: packet filtering with eBPF, built from a Network Policy Controller, a Node Agent, and an eBPF SDK

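Prerequisites: EKS 1.25 or later, AWS VPC CNI 1.14 or later, an EKS-optimized AMI (AL2, Bottlerocket, Ubuntu) with OS kernel 5.10 or later
Network Policy Controller: installed automatically on EKS v1.25 and later; watches the policies and instructs the Node Agent to create and update eBPF programs
Node Agent: bundled with AWS VPC CNI and installed alongside the ipamd plugin (aws-node DaemonSet); manages the eBPF programs
eBPF SDK: AWS VPC CNI includes an SDK for interacting with eBPF programs on the node, enabling runtime inspection, tracing, and analysis of eBPF execution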

▶ Preparation and basic information check

Deny all traffic

Allow ingress from the same namespace + client1

Allow ingress from the another-ns namespace
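The three test cases follow the standard NetworkPolicy ingress semantics, which the eBPF enforcement is meant to implement: a pod selected by any ingress policy is isolated, and traffic is allowed if any selecting policy admits the source. The Python sketch below is a heavily simplified model (label selectors as exact-match dicts, no ports, no ipBlock); pod names and labels are hypothetical, while `kubernetes.io/metadata.name` is the label Kubernetes sets on every namespace.

```python
def ingress_allowed(dst_pod, src_pod, policies):
    """Pods: {"ns": str, "labels": dict, "ns_labels": dict (labels of the pod's namespace)}.
    Policy: {"ns": str, "podSelector": dict, "from": [peer, ...]}, where a peer is
    {"podSelector": {...}} (same namespace) or {"namespaceSelector": {...}}."""
    def selects(selector, labels):
        return all(labels.get(k) == v for k, v in selector.items())  # {} selects all

    applicable = [p for p in policies
                  if p["ns"] == dst_pod["ns"] and selects(p["podSelector"], dst_pod["labels"])]
    if not applicable:
        return True  # no policy selects the pod: not isolated
    for policy in applicable:
        for peer in policy["from"]:
            same_ns_pod = ("podSelector" in peer and src_pod["ns"] == policy["ns"]
                           and selects(peer["podSelector"], src_pod["labels"]))
            ns_match = ("namespaceSelector" in peer
                        and selects(peer["namespaceSelector"], src_pod["ns_labels"]))
            if same_ns_pod or ns_match:
                return True
    return False  # isolated and no rule admits the source
```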

13. IPv6 with EKS

The Journey to IPv6 on Amazon EKS: Foundation (Part 1) - Link

The Journey to IPv6 on Amazon EKS: Foundation (Part 2) - Link

The Journey to IPv6 on Amazon EKS: Foundation (Part 3) - Link

14. AWS VPC CNI + Cilium CNI : Hybrid mode

Approach: combine the strengths of each CNI: AWS VPC CNI (IPAM, routing, etc.) and Cilium (LB, Network Policy, Encryption, Visibility)

In this hybrid mode, the AWS VPC CNI plugin is responsible for setting up the virtual network devices as well as for IP address management (IPAM) via ENIs.
After the initial networking is setup for a given pod, the Cilium CNI plugin is called to attach eBPF programs to the network devices set up by the AWS VPC CNI plugin in order to enforce network policies, perform load-balancing and provide encryption.
Limitation: Layer 7 Policy
However, to use Cilium's full feature set, it is recommended to remove AWS VPC CNI and move fully to Cilium
