Low Maintenance Kubernetes with EKS Auto Mode
Kubernetes is now a standard technology for high-availability clusters. This article explains an approach for setting up Kubernetes clusters on Amazon EKS with Infrastructure as Code. The EKS clusters use Auto Mode, which automates the scaling and updating of nodes, manages several components in the cluster, and simplifies cluster upgrades. The configuration is managed with Terraform and Flux.
More About This Project #
The code for this project is published on both GitHub and GitLab.
The project uses a specific set of tools and patterns to set up and maintain your clusters. Each of these has been chosen because it is well-known and well-supported. The main technologies are Terraform (TF) and Flux. The project also includes tasks for the Task runner. The tasks for TF are provided by my template for a TF project. Like this article, these tasks are opinionated and designed to minimise maintenance.
I refer to Terraform and OpenTofu as TF. The two tools work identically for the purposes of this article.
To make it a working example, the project deploys a Web application to each cluster. The podinfo application produces a Web interface and a REST API.
Design Decisions #
- Use one repository for all of the code
- Choose well-known and well-supported tools
- Use AWS services wherever possible
- Support separate development and production clusters
- Use an Infrastructure as Code tool to manage AWS resources for the cluster itself
- Delegate control of AWS resources for the applications on the cluster to automation that also runs on the cluster
- Use a GitOps tool to manage application configuration. GitOps means that the configuration is synchronized with source control.
- Use a configuration that can be quickly deployed with minimal changes. The code can be customised to add features or enhance security.
Out of Scope #
This article does not cover how to set up container registries or maintain container images. These will be specific to the applications that you run on your cluster.
This article also does not cover how to set up the requirements to run TF. You should always use remote state storage with TF, but you should decide how to host the remote state. The example code uses S3 for remote state.
I recommend that you store TF remote state outside of the cloud accounts that you use for working systems. When you use S3 for TF remote state, use a separate AWS account.
Requirements #
Required Tools on Your Computer #
This project uses several command-line tools. You can install all of these tools on Linux or macOS with Homebrew.
The required command-line tools are:
- AWS CLI - brew install awscli
- Flux CLI - brew install flux
- Git - brew install git
- kubectl - brew install kubernetes-cli
- Task - brew install go-task
- Terraform - Use these installation instructions
Flux can use Helm to manage packages on your clusters, but you do not need to install the Helm command-line tool.
Version Control and Continuous Integration #
To automate operations, you need a Git repository that is available to your development workstation, to the resources in your AWS accounts, and to your continuous integration system.
Flux updates the configuration of the cluster from this Git repository. This means that you do not need continuous integration to deploy changes. However, you should use continuous integration to test configurations before they are merged to the main branch of the repository and applied to the production cluster by Flux.
This example uses GitLab as the provider for Git hosting. GitLab also provides continuous integration services. You can use GitHub or other services for hosting and continuous integration instead of GitLab.
AWS Requirements #
You will require at least one AWS account to host an EKS cluster and other resources. I recommend that you store user accounts, backups and TF remote state in separate AWS accounts from the clusters.
You will need two IAM roles to deploy an EKS cluster with TF:
- An IAM role for Terraform
- An IAM role for human administrators
The example code defines a dev and a prod configuration, so that you can have separate development and production clusters. These environments can be in the same AWS account or in separate accounts.
AWS Requirements for Each EKS Cluster #
EKS clusters have various network requirements. To avoid issues, each EKS cluster should have:
- A VPC
- Three subnets attached to the VPC, one per availability zone
- A DNS zone in Amazon Route 53
Each subnet should be a /24 or larger CIDR block. By default, every pod on a Kubernetes cluster uses an IP address from the VPC. This means that every node will consume up to four IP addresses for its Elastic Network Interfaces, plus one IP address per pod that it hosts. For example, a /24 subnet provides 251 usable addresses, because AWS reserves five addresses in every subnet, so a handful of busy nodes can use a significant share of a small subnet.
Each subnet that will be used for load balancers must have tags that authorize the Kubernetes controller for AWS Load Balancers to use it. Subnets for public-facing Application Load Balancers must have the tag kubernetes.io/role/elb with a value of 1.
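For example, this is a minimal sketch of applying the tag to existing subnets with TF; the subnet IDs are placeholders for your own:

```hcl
# A minimal sketch, assuming you already know the IDs of your public subnets.
resource "aws_ec2_tag" "public_elb" {
  for_each    = toset(["subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333"])
  resource_id = each.value
  key         = "kubernetes.io/role/elb"
  value       = "1"
}
```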
I recommend that you define a separate Route 53 zone for each cluster. Create these as child zones for a DNS domain that you own. This enables you to configure the ExternalDNS controller on a cluster to manage DNS records for applications on that cluster without enabling it to manage records on the parent DNS zone.
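As a sketch, assuming a parent zone named example.com that already exists in Route 53, the child zone and its delegation could look like this:

```hcl
# A sketch: example.com and dev-cluster.example.com are placeholder zone names.
data "aws_route53_zone" "parent" {
  name = "example.com"
}

resource "aws_route53_zone" "cluster" {
  name = "dev-cluster.example.com"
}

# Delegate the child zone by publishing its name servers in the parent zone.
resource "aws_route53_record" "cluster_ns" {
  zone_id = data.aws_route53_zone.parent.zone_id
  name    = aws_route53_zone.cluster.name
  type    = "NS"
  ttl     = 300
  records = aws_route53_zone.cluster.name_servers
}
```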
1: Prepare Your Repository #
Clone or fork the example project to your own Git repository. To use the provided Flux configuration, use GitLab as the Git hosting provider.
Create a dev branch on the repository. The Flux configuration on development clusters will synchronize from this dev branch. The Flux configuration on production clusters will synchronize from the main branch.
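For example, assuming that you have already forked the project (the repository URL below is a placeholder):

```shell
# Clone your fork and create the dev branch that development clusters follow.
git clone git@gitlab.com:your-group/your-fork.git
cd your-fork
git switch -c dev
git push --set-upstream origin dev
```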
2: Customise Configuration #
Next, change the configuration for your own infrastructure.
The relevant directories for configuration are:
- flux/apps/dev/ - Flux configuration for development clusters
- flux/apps/prod/ - Flux configuration for production clusters
- tf/contexts/dev/ - TF configuration for development clusters
- tf/contexts/prod/ - TF configuration for production clusters
Change each value that is marked as Required. In addition, specify the settings for the TF backend in the tf/contexts/context.json file for dev and prod.
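The exact schema of context.json is defined by the TF project template, so check its documentation. As a purely illustrative sketch, the backend settings that you supply are the usual S3 backend values; the key names below are placeholders, not the template's actual schema:

```json
{
  "backend_s3": {
    "bucket": "your-org-tf-state",
    "region": "eu-west-1",
    "dynamodb_table": "your-org-tf-locks"
  }
}
```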
The IAM principal that creates an EKS cluster is automatically granted administrative access (system:masters) in that cluster. In the example code, this principal is the IAM role that TF uses. The TF code also grants administrator access on the cluster to the IAM role for human system administrators.
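The example project configures this through the EKS module. Expressed directly as AWS provider resources, the equivalent is roughly the following sketch, where the cluster and role names are assumptions:

```hcl
# A sketch using EKS access entries; variable names are placeholders.
resource "aws_eks_access_entry" "human_ops" {
  cluster_name  = var.cluster_name
  principal_arn = var.human_ops_role_arn
}

resource "aws_eks_access_policy_association" "human_ops_admin" {
  cluster_name  = var.cluster_name
  principal_arn = var.human_ops_role_arn
  policy_arn    = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"

  access_scope {
    type = "cluster"
  }
}
```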
3: Set Credentials #
This process needs access to both AWS and your Git hosting provider. Set an access token for GitLab in the GITLAB_TOKEN environment variable before you run the TF tasks.
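For example (the token value below is a placeholder; a project or personal access token with the api scope is a reasonable assumption for creating deploy keys):

```shell
# Placeholder value: use your own GitLab access token.
export GITLAB_TOKEN="glpat-xxxxxxxxxxxxxxxxxxxx"
```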
This example configures Flux to use a GitLab deploy key. This means that the Kubernetes cluster must have SSH access to the GitLab repository for the project.
If you are running the TF deployment from your own system, first ensure that you have AWS credentials in your shell session:
eval $(aws configure export-credentials --format env --profile your-aws-profile)
4: Deploy the Infrastructure with TF #
Run the tasks to initialise, plan and apply the TF code for each module. For example:
TFT_STACK=amc-gitlab TFT_CONTEXT=dev task tft:init
TFT_STACK=amc-gitlab TFT_CONTEXT=dev task tft:plan
TFT_STACK=amc-gitlab TFT_CONTEXT=dev task tft:apply
Apply the modules in this order:
- amc-gitlab - Creates a deploy key on GitLab for Flux
- amc - Deploys a Kubernetes cluster on Amazon EKS
- amc-flux - Adds Flux to a Kubernetes Cluster
The apply to create a cluster on EKS will take several minutes to complete.
5: Register Your Cluster with Kubernetes Tools #
Use the AWS command-line tool to register the new cluster with your kubectl configuration.
If you are running the TF deployment from your own system, first ensure that you have AWS credentials in your shell session:
eval $(aws configure export-credentials --format env --profile your-aws-profile)
Run this command to add the cluster to your kubectl configuration:
aws eks update-kubeconfig --name $EKS_CLUSTER_NAME
To set this cluster as the default context for your Kubernetes tools, run this command:
kubectl config use-context $EKS_CLUSTER_ARN
6: Test Your Cluster #
To test the connection to the API endpoint for the cluster, first assume the IAM role for operators. Run this command to get the credentials:
aws sts assume-role --role-arn $HUMAN_OPS_ROLE_ARN --role-session-name human-ops-session
Set these values as environment variables:
- AccessKeyId -> AWS_ACCESS_KEY_ID
- SecretAccessKey -> AWS_SECRET_ACCESS_KEY
- SessionToken -> AWS_SESSION_TOKEN
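One way to script this, assuming that the jq tool is installed:

```shell
# Request temporary credentials and export them for the commands that follow.
creds=$(aws sts assume-role \
  --role-arn "$HUMAN_OPS_ROLE_ARN" \
  --role-session-name human-ops-session \
  --query Credentials --output json)
export AWS_ACCESS_KEY_ID=$(echo "$creds" | jq -r .AccessKeyId)
export AWS_SECRET_ACCESS_KEY=$(echo "$creds" | jq -r .SecretAccessKey)
export AWS_SESSION_TOKEN=$(echo "$creds" | jq -r .SessionToken)
```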
Next, run this command to get a response from the cluster:
kubectl version
The command should return output like this:
Client Version: v1.32.3
Kustomize Version: v5.5.0
Server Version: v1.32.3-eks-bcf3d70
Once you can successfully connect to a cluster, use the flux command-line tool, through the provided task, to check the status of the Flux deployment:
task flux:status
7: Going Further #
The code in the example project is a minimal configuration for an EKS Auto Mode cluster, along with a simple example Web application that is managed by Flux and Helm. You can use EKS add-ons or Flux to deploy additional applications and services on the clusters. Flux also provides a range of management capabilities, including automated updates of container images and notifications.
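For example, a further Helm-managed application can be added with a HelmRepository and HelmRelease pair, following the same pattern that the example project uses for podinfo. This is a sketch with illustrative names and versions, not the project's manifests:

```yaml
# A sketch of a Helm-managed application in Flux; names and versions are illustrative.
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: podinfo
  namespace: flux-system
spec:
  interval: 1h
  url: https://stefanprodan.github.io/podinfo
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: podinfo
  namespace: default
spec:
  interval: 30m
  chart:
    spec:
      chart: podinfo
      version: "6.x"
      sourceRef:
        kind: HelmRepository
        name: podinfo
        namespace: flux-system
  values:
    replicaCount: 2
```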
The initial configuration is designed to work with minimal tuning. To harden the systems:
- Replace the generated IAM policies that are provided with custom policies.
- Disable public access to the cluster endpoint.
- Deploy the EKS clusters to private subnets and deploy the load balancers to public subnets.
The current version of this project does not include continuous integration with GitLab. If you decide to use GitLab to manage changes, consider installing the GitLab cluster agent.
Extra: How the TF Code Works #
The tasks for TF are provided by my template for a TF project.
I have made several decisions in the example TF code for this project:
- The example code uses the EKS module from the terraform-aws-modules project. This module enables you to deploy an EKS cluster by setting a relatively small number of values.
- We use the default_tags setting in the TF provider for AWS to apply tags to all AWS resources. This ensures that resources have a consistent set of tags with minimal code (see the sketch after this list).
- To ensure that resource identifiers are unique, the TF code always constructs resource names in locals. The code for resources then uses these locals.
- The code supports TF test, the built-in testing framework for TF. You may decide to use other testing frameworks.
- The constructed names of AWS resources include a variant, which is set as a tfvar. The variant is either the name of the current TF workspace, or a random identifier for TF test runs.
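The following is a simplified sketch of the tagging and naming patterns described above; it is not the project code, and the variable and tag names are assumptions:

```hcl
# Apply a consistent set of tags to every resource created through this provider.
provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "terraform"
    }
  }
}

locals {
  # Construct resource names in one place so that each copy stays unique per variant.
  cluster_name = "eks-${var.environment}-${var.variant}"
}

# Resources then reference the locals instead of repeating the naming logic:
#   name = local.cluster_name
```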
Resources #
Amazon EKS #
- Official Amazon EKS Documentation
- EKS Workshop - Official AWS training for EKS
- Amazon EKS Auto Mode Workshop
- Amazon EKS Blueprints for Terraform
- Amazon EKS Auto Mode ENABLED - Build your super-powered cluster - A walk-through of EKS Auto Mode with TF