At AnnounceKit we use Kubernetes for our cluster management. We first deployed on GCP using its fully managed Kubernetes service, and later switched to EKS, the managed Kubernetes service on AWS.
Given that we are now running on AWS, and that AWS has been heavily pushing its ARM-based server instances, we decided to give them a try and see how things work. (Our recent switch to M1 Macs also played a small role in this push.) Here’s how we approached it:
Docker Images
The obvious issue with this switch is that we need to build our Docker images for the ARM64 architecture. We have been using GitHub Actions to build our x86 images, so naturally that is where we looked for a solution.
On the GitHub Actions front, one option is a self-hosted action runner. Self-hosted runners support ARM architectures and might be the best solution if you only want to produce ARM64 builds.
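If you go that route, the workflow simply targets the runner by its labels. Here is a minimal sketch, assuming a self-hosted ARM64 runner is already registered (GitHub applies the self-hosted and ARM64 labels to such runners automatically; the job name and image tag are ours for illustration):

jobs:
  build-arm:
    # Runs only on a self-hosted runner carrying the ARM64 label
    runs-on: [self-hosted, linux, ARM64]
    steps:
      - uses: actions/checkout@v2
      - name: Build a native ARM64 image
        run: docker build -t announcekit:arm64 .   # hypothetical tag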
We wanted to create multi-arch images at first, to make the switch easier to handle and to be able to roll things back fast. So we needed to use the new buildx pipeline, which can leverage QEMU for ARM emulation. It is as simple as adding docker/setup-qemu-action and docker/setup-buildx-action, then specifying platforms on the build-push action.
Here’s our current GitHub Actions snippet for building both x86 and ARM images:
jobs:
  build:
    runs-on: ubuntu-latest   # a standard x86 runner is fine; QEMU handles the ARM emulation
    steps:
      - uses: actions/checkout@v2
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
      - name: Build and push
        uses: docker/build-push-action@v2
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          tags: |
            ${{ secrets.AWS_REPO }}:${{ env.GITHUB_SHA }}
            ${{ secrets.AWS_REPO }}:latest
          cache-from: type=registry,ref=${{ secrets.AWS_REPO }}:latest
          cache-to: type=inline
It is this simple. Now, whenever we build, each pushed tag is a multi-arch manifest with layers for both platforms; you can run the same image on an x86 machine or an ARM machine. So, we are getting close.
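If you want to verify that a tag really carries both platforms, you can inspect its manifest; the image reference below is a placeholder for your own repository:

# Lists the manifest entries; expect both linux/amd64 and linux/arm64
docker buildx imagetools inspect your-repo/announcekit:latest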
Kubernetes
As mentioned before, we use Kubernetes for cluster management, and until recently we had an EKS cluster with a single managed node group of good old x86 instances. EKS clusters (at least recent versions) are completely ready to be used with both ARM and x86 instances: you can simply add a Graviton-based node group and the nodes will start chugging along.
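With eksctl, for instance, adding such a group could look roughly like this. This is a sketch, and the cluster name, node group name, and instance type are placeholders rather than our actual setup:

# graviton-nodegroup.yaml -- create with: eksctl create nodegroup -f graviton-nodegroup.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster            # placeholder cluster name
  region: us-east-1
managedNodeGroups:
  - name: graviton-nodes      # placeholder node group name
    instanceType: m6g.large   # an ARM (Graviton2) instance type
    minSize: 1
    maxSize: 3
    desiredCapacity: 2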
That is what we would have done, except we have a couple of Kubernetes deployments that we cannot build for ARM: we may not have control over the base image or the binary blobs going into the build. In any case, we needed some auxiliary x86 capacity for our incompatible pods. This means we will be running a hybrid cluster, and that means we need to somehow schedule pods onto the correct architectures.
There are multiple ways to do this: node affinities, taints and tolerations, node selectors, and probably some other methods. We will go with the simplest, a nodeSelector on the pod spec.
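For comparison, the same constraint expressed as a required node affinity would look roughly like this (a sketch only; the nodeSelector version we actually use is shown further below):

# Node-affinity form of the architecture constraint (goes under the pod spec)
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/arch
              operator: In
              values:
                - arm64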
Each node in your cluster has a set of predefined labels provided by the platform. You can also add custom labels of your own, but the information we need is already there. Just by running kubectl describe nodes, we can see the built-in labels applied to our nodes:
Labels:  Name=announcekit-******************
         beta.kubernetes.io/arch=amd64
         beta.kubernetes.io/instance-type=******************
         beta.kubernetes.io/os=linux
         eks.amazonaws.com/capacityType=ON_DEMAND
         eks.amazonaws.com/nodegroup=******************
         eks.amazonaws.com/nodegroup-image=******************
         failure-domain.beta.kubernetes.io/region=us-east-1
         failure-domain.beta.kubernetes.io/zone=us-east-1a
         kubernetes.io/arch=amd64
         kubernetes.io/hostname=******************
         kubernetes.io/os=linux
         node.kubernetes.io/instance-type=******************
         topology.kubernetes.io/region=us-east-1
         topology.kubernetes.io/zone=us-east-1a
Using these labels, you can schedule your pods onto a specific OS, a specific region or zone, and, to our luck, a specific architecture. kubernetes.io/arch=amd64 tells us that this node is running on amd64; the Graviton instances will show up as arm64 here.
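A quick way to see the architecture of every node at a glance is to print that label as an extra column:

kubectl get nodes -L kubernetes.io/arch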
It is now as easy as targeting the specific label in your pod specs, as in:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: announcekit
spec:
  replicas: 1
  selector:
    matchLabels:
      app: announcekit
  template:
    metadata:
      labels:
        app: announcekit
    spec:
      nodeSelector:
        kubernetes.io/arch: amd64
      containers:
        - image: cilium/echoserver
          imagePullPolicy: Always
          name: announcekit-cname
          ports:
            - containerPort: 80
      restartPolicy: Always
The important part here is the nodeSelector: it states that this deployment’s pods must run on the amd64 architecture. If there are no available nodes with the given label, the pods will fail to schedule and remain Pending.
From here on, it is as simple as pinning specific pods to specific architectures. And since we now build multi-architecture images, switching a workload over is as simple as flipping the selector configuration.
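In other words, once a deployment’s image is multi-arch, moving it onto Graviton is a one-line change in the pod spec:

nodeSelector:
  kubernetes.io/arch: arm64   # flipped from amd64; the multi-arch image runs unchanged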