
Access AWS Services Through a Kubernetes Dual-Stack Cluster

29 Nov 2021 10:00am, by
Saurabh Modi
Saurabh Modi is an IT professional with over a decade of experience, ranging from business intelligence, statistical analysis and application development to production support and Kubernetes cloud infrastructure. He has worked with companies ranging from consultancies to large fintech and corporate firms, using unique and creative solutions to solve problems.

This article series focuses on connecting a Kubernetes dual IPv4/IPv6 stack cluster with Amazon Web Services' service APIs, using the AWS cloud controller manager (I'll refer to the AWS-Cloud-Controller-Manager as "AWS-ccm" and to Kubernetes as K8s throughout the article). AWS-ccm "allows a Kubernetes cluster to provision, monitor and remove AWS resources necessary for operation of the cluster," according to the AWS documentation.

First, I'll discuss the manifest in the AWS cloud-provider repository and a hack I used to make it work; in the second part of this tutorial (which will run next week), I'll go through how to run AWS-ccm as a systemd service.

I will not go into the details of how to set up a Kubernetes cluster on AWS, or pricing for AWS and Kubernetes, but will instead focus specifically on aws-cloud-controller-manager in detail. There are plenty of very good articles, blogs and projects available online for setting up a Kubernetes cluster on the AWS cloud.

I tested on an AWS Ubuntu 20.04 instance image with Kubernetes versions 1.21.2 and 1.22.1. For this post, I set up the cluster following guidance inspired by Kelsey Hightower's Kubernetes The Hard Way.

DISCLAIMER: AWS cloud-controller-manager is under heavy development and still in ALPHA; you should not use it in production until it becomes mature and stable.

Prerequisite

Before you begin, you need to have a dual-stack cluster up and running — meaning the AWS VPC is configured to receive both IPv4 and IPv6 traffic, and the K8s dual-stack feature is enabled.

Note: If you have an IPv4-only cluster (a typical K8s cluster), you still have to go through the same drill we are about to go through, but you don't need to enable the dual-stack feature.

About Cloud Controller Manager

The cloud-controller-manager is a Kubernetes control plane component that embeds cloud-specific control logic. The cloud controller manager lets you link your cluster into your cloud provider’s API and separates out the components that interact with that cloud platform from components that only interact with your cluster.

The K8s in-tree cloud provider code has mostly stopped accepting new features, and the in-tree plugins will be removed in a future release of Kubernetes.

There is very good information available on the Kubernetes GitHub about what you should NOT do beginning with K8s v1.20.

Before 1.20, the AWS cloud provider was part of core Kubernetes, and we enabled the --cloud-provider=aws flag in the API server, controller-manager and kubelet service files to connect to AWS resources, but not anymore.

You should not specify the --cloud-provider flag in kube-apiserver and kube-controller-manager. This ensures that they do not run any cloud-specific loops that would be run by the cloud controller manager. In the future, this flag will be deprecated and removed.

The kubelet must run with --cloud-provider=external. This is to ensure that the kubelet is aware that it must be initialized by the cloud controller manager before it is scheduled for any work.
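In a setup like this one, where the kubelet runs as a systemd service, that means adding the flag to the unit file. A minimal sketch (the binary path, config paths and other flags are placeholders; adjust them to your cluster layout):

```ini
# /etc/systemd/system/kubelet.service (fragment; paths are placeholders)
[Service]
ExecStart=/usr/local/bin/kubelet \
  --config=/var/lib/kubelet/kubelet-config.yaml \
  --kubeconfig=/var/lib/kubelet/kubeconfig \
  --cloud-provider=external \
  --v=2
Restart=on-failure
RestartSec=5
```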

Where do I find AWS-ccm? 

The kubernetes/cloud-provider-aws repository on GitHub is the current home for AWS-ccm, and future development for the AWS cloud provider should continue there.

Let’s Get the Ball Rolling

First, I created IAM policies for the EC2 instances so that AWS-ccm can communicate with AWS' APIs.
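The article does not reproduce the policy document itself. As a rough, non-authoritative sketch, an EC2-side policy for the controller might look like the JSON below (the action list is my assumption based on the calls a cloud controller typically makes; consult the cloud-provider-aws documentation for the official policy):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeRouteTables",
        "ec2:CreateRoute",
        "ec2:DeleteRoute",
        "ec2:ModifyInstanceAttribute",
        "elasticloadbalancing:*"
      ],
      "Resource": "*"
    }
  ]
}
```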

How did I test the manifest? On the control-plane node, once it was up and running, I set up the kubelet and kube-proxy, brought the node up and untainted it.

We are bending a few rules here.

kubectl taint nodes <node-name> <taint-key>-

(The trailing hyphen after the taint key removes the taint from the node.)

Then I removed the tolerations from the AWS-ccm DaemonSet and labeled the node so that the pod is only deployed on a control-plane node.


kubectl label nodes ip-172-31-67-211.ec2.internal master=aws-ccm

Then we changed the node selector key-value in the manifest from the original control-plane selector to the new label applied above.
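Reconstructed as a manifest fragment, the label and selector changes look roughly like this (the master: aws-ccm pair matches the kubectl label command above; the original selector key is an assumption based on the upstream manifest):

```yaml
# aws-cloud-controller-manager DaemonSet, pod template fragment
spec:
  template:
    spec:
      # before: node-role.kubernetes.io/control-plane: ""
      nodeSelector:
        master: aws-ccm
      # tolerations: removed, per the hack described above
```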

Note: If you are using kubeadm or any other project that deploys via a manifest, you should not remove the taint on the node or the node selector part of the manifest; make changes according to your project's requirements.

After that, I deployed the Cilium CNI-plugin and CoreDNS.

Then I deployed the RBAC and AWS-ccm manifests from the AWS cloud-provider repository's manifest folder. There are a bunch of files in the manifest folder; apart from kustomization.yaml, I used all of them to deploy.

It did not work, and it errored out with lots of information.

Now we have a puzzle on our hands. What do we do?

The solution lies in the logs.

That's where I got the first crack at solving it. I started to mine the logs, testing each flag that errored out, one by one, to learn what it is used for.

Note: I’ve consolidated the errors into one log file to make it easy to read because each time you get an error it generates a big log file.

kubectl logs aws-cloud-controller-manager-bdnj5 -n kube-system

Kubernetes has undergone many changes as it has evolved. Newer versions often require something called the aggregation layer.

The kube-apiserver must enable an aggregation layer.

The aggregation layer configuration enables the Kubernetes API server to be extended with additional APIs that are not part of the Kubernetes API core.

This is because in the newest Kubernetes versions, AWS-ccm is not part of the core API; different cloud providers maintain their own respective repositories.

In some projects like kubeadm, it is already wired up for you, but in this setup, you will need to add it.

You will need to make the following changes to your kube-apiserver service file, adding these flags if you have not done so already:

  --requestheader-client-ca-file=front-proxy-ca.pem \
  --requestheader-allowed-names=front-proxy-client \
  --requestheader-extra-headers-prefix=X-Remote-Extra- \
  --requestheader-group-headers=X-Remote-Group \
  --requestheader-username-headers=X-Remote-User \
  --proxy-client-cert-file=proxy-client.pem \
  --proxy-client-key-file=proxy-client-key.pem \

If you are not running kube-proxy on a host running the API server, then you must make sure that the system is enabled with the following kube-apiserver flag:

--enable-aggregator-routing=true

Let's create the required certificates for the above flags, creating a new CA and certificate for the front proxy. Don't reuse the one we used for the API server.

A bunch of certificates will be created.
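The screenshots of the certificate generation are not reproduced here. As an equivalent sketch using plain openssl (the file names match the kube-apiserver flags above; the original setup may have used cfssl instead):

```shell
# A separate CA just for the front proxy; do not reuse the API server CA.
openssl genrsa -out front-proxy-ca-key.pem 2048
openssl req -x509 -new -key front-proxy-ca-key.pem -sha256 -days 365 \
  -subj "/CN=front-proxy-ca" -out front-proxy-ca.pem

# Client certificate the kube-apiserver presents to extension API servers.
# Its CN must match a name in --requestheader-allowed-names.
openssl genrsa -out proxy-client-key.pem 2048
openssl req -new -key proxy-client-key.pem \
  -subj "/CN=front-proxy-client" -out proxy-client.csr
openssl x509 -req -in proxy-client.csr -CA front-proxy-ca.pem \
  -CAkey front-proxy-ca-key.pem -CAcreateserial -sha256 -days 365 \
  -out proxy-client.pem

# Sanity check: the client certificate chains to the front-proxy CA.
openssl verify -CAfile front-proxy-ca.pem proxy-client.pem
```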

Don't forget to reload the kube-apiserver service file.

sudo systemctl daemon-reload
sudo systemctl restart kube-apiserver

Now we are getting ready to explore the AWS-ccm.  

What do we do now?

1) We need to create a secret for --client-ca-file and one for --requestheader-client-ca-file.

Here the ca.pem certificate is the same one used in the kube-apiserver, kube-controller-manager and kubelet.

kubectl create secret generic -n kube-system mysecret \
    --from-file=ca.pem

kubectl create secret generic -n kube-system proxy-secret \
    --from-file=front-proxy-ca.pem

2) Then create a ConfigMap or a hostPath volume for cloud-config.conf, whichever works for you.

There are two cloud-config files: one for K8s AWS-ccm 1.21.0 version and another one for K8s 1.22.0 version.

For the 1.22.0 version use this file.

And for the AWS-ccm 1.21.0 version use this file.

You'll need to adjust your cloud-config file according to your AWS-ccm version. If you use the 1.22.0 cloud-config file with the 1.21.0 version, you will get an error.

I had tested the manifest using both the host path file and config map.

If you want to use a config map, then let’s create one.

kubectl create configmap aws-config --from-file=cloud-config.yaml=cloud-config.conf -n kube-system
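For reference, here is a sketch of what cloud-config.conf can contain, in the INI-style (gcfg) format the AWS provider reads. The keys come from the globals discussed in this article (Zone, RoleARN, NodeIPFamilies); the values are placeholders, NodeIPFamilies is only understood by the v1.22 code, and the exact syntax may differ between versions:

```ini
[Global]
Zone=us-east-1a
RoleARN=arn:aws:iam::111122223333:role/aws-ccm-role
# repeat the key once per IP family; order sets the preference
NodeIPFamilies=ipv6
NodeIPFamilies=ipv4
```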

The next step is Role-Based Access Control and service account deployment.

Processes in containers inside pods use service account tokens to communicate with the Kubernetes API server. They are authenticated as a particular service account (for example, default), but to extend their capabilities to access additional cluster API resources, they need Role-Based Access Control (RBAC).

A service account is something of a linchpin in AWS-ccm. Make sure proper roles and permissions are assigned to it to perform a particular task.

We modified the AWS-ccm manifest so that our service account can access additional resources like secrets, nodes, services and nodes/status. We will need to add more resources and API groups to the current clusterrole.yaml file to avoid errors such as:

1) Serviceaccounts "aws-cloud-provider" is forbidden: User "system:cloud-controller-manager" cannot get resource "serviceaccounts" in API group "" in the namespace "kube-system"

2) Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:kube-system:cloud-controller-manager" cannot list resource "secrets" in API group "" in the namespace "kube-system"
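To clear errors like these, the cluster role needs rules covering the extra resources. Here is a fragment of the kind of rules we appended (a sketch derived from the two errors above; the verbs and resources should be adjusted to whatever your logs actually complain about):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:cloud-controller-manager
rules:
- apiGroups: [""]
  resources: ["serviceaccounts", "secrets"]
  verbs: ["get", "list", "watch", "create"]
- apiGroups: [""]
  resources: ["nodes", "nodes/status", "services", "services/status"]
  verbs: ["get", "list", "watch", "update", "patch"]
```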

Let's discuss the flags we have passed via args.

  • Cloud-provider — In our case it is --cloud-provider=aws. Before K8s 1.20 we used to enable this flag in kube-apiserver, kube-controller-manager and kubelet, but now it is part of AWS-ccm.
  • Cloud-config — Specifies the cloud configuration file, in which we provide our global variables: cluster Zone, RoleARN and NodeIPFamilies. We need to create a role and attach an IAM policy to it for the AWS-cloud-controller-manager to be able to communicate with AWS APIs.
  • Client-ca-file — Used to authorize control plane components and end users.
  • Use-service-account-credentials — We set this to true because we created a service account, cloud-controller-manager, and RBAC rules for this service account to access cluster API resources.
  • Allocate-node-cidrs — If you enabled this flag in kube-controller-manager to set podCIDR on nodes, then you should set it to true; otherwise, leave it off.
  • Configure-cloud-routes — We don't need the AWS cloud provider to create routes, as we are using a container networking interface plugin (Cilium in my case); enable it only if required by your CNI.
  • Requestheader-client-ca-file — Used to authorize aggregation API server requests.
  • Requestheader-allowed-names — In our case, the common name CN="front-proxy-client" in the requestheader-client-ca-file certificate is one of the names in the list provided by the --requestheader-allowed-names flag. If the name is allowed, the request is approved; if it is not, the request is rejected.

To summarize --requestheader-client-ca-file and --requestheader-allowed-names: the Kubernetes API server uses the files indicated by --proxy-client-*-file to authenticate to AWS-ccm, so that the request is considered valid by AWS-ccm. Remember the aggregation layer?

The following conditions must be met:

1) The connection must be made using a client certificate that is signed by the CA  whose certificate is in --requestheader-client-ca-file.

2) The connection must be made using a client certificate whose common name (CN) is one of those listed in --requestheader-allowed-names.

The Kubernetes API server will create a ConfigMap in the kube-system namespace called extension-apiserver-authentication, in which it places the CA certificate and the allowed CNs. These in turn can be retrieved by extension API servers to validate requests.

kubectl get configmap extension-apiserver-authentication  -n kube-system -o yaml

Images used for testing:

gcr.io/k8s-staging-provider-aws/cloud-controller-manager:v1.20.0-alpha.0

Somehow, the AWS-ccm image I tried to run for K8s 1.21 gave me an image pull error:

gcr.io/k8s-staging-provider-aws/cloud-controller-manager:v1.21.0-alpha.0

There is an issue open for this, which suggested using the following image:

gcr.io/k8s-staging-provider-aws/cloud-controller-manager:v20210510-v1.21.0-alpha.0

And for K8s 1.22 version:

us.gcr.io/k8s-artifacts-prod/provider-aws/cloud-controller-manager:v1.22.0-alpha.0

Let's first discuss a couple of scenarios from the test run of AWS-ccm on dual-stack on K8s 1.21, and then we will move on to K8s 1.22 to see what has changed in AWS-ccm between these two versions.

Note: Only one IP address (either IPv4 or IPv6) is allowed in the --node-ip flag in the kubelet systemd service file.

Scenario 1

If you specify --node-ip=:: in the kubelet systemd service file, then it will work fine and PreferDualStack will obey the ordered list, but the host will pick the IPv4 address, the node's internal IP will be an IPv4 address, and the pods will have IPv6 addresses.

This is the Kubernetes Up and Running (kuard) service file I've used.
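Since the linked file isn't reproduced here, a dual-stack Service for kuard with PreferDualStack and IPv6 listed first might look like this sketch (the app: kuard label and port 8080 are assumptions about the deployment):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kuard
spec:
  ipFamilyPolicy: PreferDualStack
  ipFamilies:
  - IPv6
  - IPv4
  selector:
    app: kuard
  ports:
  - port: 8080
    targetPort: 8080
```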

And for the kuard app, you can refer to this link.

For dual-stack validation, I’ve used this K8s document.

"kubectl get pods" command line

Scenario 2

If you specify --node-ip=<ipv6-address> in the kubelet.service file as the node's IPv6 address, then it will error out, as the node controller is not aware of the IPv6 address; it expects an IPv4 address or a listen-all address (0.0.0.0 or ::).

And if we run the below command to get information about the node:

kubectl get nodes ip-172-31-79-7.ec2.internal -o go-template --template='{{range .status.addresses}}{{printf "%s: %s \n" .type .address}}{{end}}'

We don't see any other AWS info, like InternalDNS, ExternalDNS or ExternalIP, so it looks like we are not able to fetch the AWS node information.

Scenario 3

When you deploy the AWS-ccm manifest but don't use the --node-ip flag at all in the kubelet service file, it defaults to IPv4 and will work, but the pods will have IPv4 addresses even if you have PreferDualStack in your service file with IPv6 as the preferred IP family in the list.

So where are we heading with all these tests?

That’s where I decided to contact the AWS-cloud-provider maintainer and contributors to discuss AWS-ccm dual-stack.

What I understood from the discussion is that, when I tested it, no released version of AWS-ccm up to v1.21.0-alpha.0 supported IPv6. But in the new AWS-ccm v1.22.0-alpha.0 release, IPv6 support will be available. (Thanks to Ciprian Hacman, one of the contributors to the AWS-ccm project working on its IPv6 features, I was able to test the AWS-ccm v1.22.0-alpha.0 image from his internal repo.)

And voila!

If I use --node-ip=<node-ipv6-address> in the kubelet service file and run again, the error is gone, the node syncs with the node controller, and we have two internal IP addresses: IPv4 and IPv6.

If we run the node addresses command from earlier, we can see that it can now fetch the AWS resources. This was not the case when we tested with AWS-ccm v1.21.0-alpha.0.

Scenario 2, which we discussed earlier, has now been taken care of.

Now let's talk about AWS-ccm v1.22.0-alpha.0 for K8s 1.22: the changes we discussed have been incorporated and are now working.

The only change we need to make for the AWS-ccm v1.22.0-alpha.0 version is in our cloud-config.conf file: we need to add two new global variables.

I’ve already provided the cloud-config files for different versions while discussing configmap and hostpath file.

This concludes part one, in which we discussed how to deploy AWS-ccm using the manifest setup. Next time, we will discuss how to deploy AWS-ccm using a systemd service.

My gratitude to AWS-ccm maintainer Nick Turner and contributors Yang Yang and Ciprian Hacman for their time; I really appreciate their help and the information they provided. Thanks also to Duffie Cooley (@mauilion) for his valuable input and time while drafting this article.

The New Stack is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Hightower.

Feature image by Neri Vill from Pixabay.