No Kubernetes Needed: Amazon ECS Anywhere
The hybrid and the multicloud market is heating up. Hyperscalers such as Microsoft and Google are extending their infrastructure services to on-prem data centers and other public clouds. Platform vendors such as Red Hat and VMware have container management services that can run in diverse environments including data centers, private cloud, and public cloud.
What’s common in multicloud platforms such as Google Anthos, Microsoft Azure Arc, Red Hat OpenShift and Advanced Cluster Manager, and VMware Tanzu? The answer is obvious — it’s Kubernetes. Almost all the hybrid and multicloud platforms are based on Kubernetes.
Amazon Web Services, one of the first and the leading public cloud providers, has taken a different approach to deliver hybrid and multicloud platforms. Along with Kubernetes, AWS has also extended Elastic Container Service (ECS) as a vehicle to deliver hybrid cloud capabilities to customers.
Quick Recap of Amazon ECS
Amazon ECS was announced in 2014, before Kubernetes reached its 1.0 release in 2015, as a managed container orchestration platform for AWS customers. It was modeled after Docker Compose, the tool that lets multiple containers run as a single workload.
In ECS, containers are wrapped in a task definition that is registered with the control plane, which schedules the task on one of the hosts in the cluster. The host, typically an Amazon EC2 instance, runs an agent that communicates with the control plane. A task can be scaled to multiple copies by wrapping it in a service, which ensures that the specified number of tasks is always running.
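To make the task and service vocabulary concrete, here is a minimal Python sketch of the two documents involved. The field names follow the ECS task definition and service request formats; the family name, image, cluster name, and sizes are hypothetical placeholders, and a real task definition would usually carry more settings.

```python
import json


def build_task_definition(family, image, cpu=256, memory=512):
    """Return a minimal ECS task definition wrapping a single container."""
    return {
        "family": family,
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "cpu": cpu,         # CPU units (1024 units = one vCPU)
                "memory": memory,   # hard memory limit in MiB
                "essential": True,  # the task stops if this container stops
            }
        ],
    }


def build_service(name, cluster, task_family, desired_count):
    """Return a service request that keeps `desired_count` tasks running."""
    return {
        "serviceName": name,
        "cluster": cluster,
        "taskDefinition": task_family,
        "desiredCount": desired_count,
    }


# Hypothetical values, for illustration only.
task_def = build_task_definition("web", "nginx:latest")
service = build_service("web-svc", "demo-cluster", task_def["family"], 3)
print(json.dumps(task_def, indent=2))
```

The service refers to the task definition by its family name, which is what lets the control plane start replacement tasks from the same template.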
For a Kubernetes user, the architecture of ECS looks familiar. An ECS task is similar to a Kubernetes Pod while an ECS service is comparable to a Kubernetes deployment. The ECS agent running on an EC2 host mimics what a kubelet does in a Kubernetes node.
Since ECS is an AWS-managed service, it’s tightly integrated with various AWS services such as CloudWatch, ALB, VPC, and others.
ECS works well with Fargate, the serverless container platform from AWS. With Fargate, customers don’t deal with EC2 instances. Behind the scenes, Fargate orchestrates the compute layer responsible for running the containerized workloads of ECS.
For a detailed discussion of ECS and Fargate, refer to my article, “AWS Fargate Through the Lens of Kubernetes.”
When launching an Amazon ECS cluster, customers can choose between an EC2-based cluster and a Fargate cluster, which gives them a choice between more control and less infrastructure to manage.
Untethering Container Workloads from AWS
At re:Invent 2020, Amazon announced ECS Anywhere, a service that extends ECS to on-prem and other cloud environments. This gives ECS an additional execution environment beyond EC2 and Fargate. AWS calls the hosts running outside of its cloud external instances, and they form the third execution environment for ECS.
The idea of ECS Anywhere is simple — run the same ECS agent designed for EC2 instances in external hosts such as bare metal servers, VMs, and even instances running in other public cloud environments.
Since the external host needs a security context to talk to AWS, ECS Anywhere needs another binary called the AWS Systems Manager Agent (SSM Agent). The SSM Agent, which is a part of AWS Systems Manager, can be installed and configured on an EC2 instance, an on-premises server, or a VM so that the host can be managed from AWS.
The combination of the SSM Agent and the ECS agent expands ECS’ capabilities to run tasks on external hosts.
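The registration flow can be sketched in Python as two steps: request an activation from Systems Manager, then run the ECS Anywhere install script on the external host with the returned credentials. This is a sketch under assumptions: `ssm_client` is expected to behave like `boto3.client("ssm")`, the script path is an assumed download location, the role and cluster names are hypothetical, and the script flags should be checked against the current AWS documentation.

```python
import shlex

# Assumed location where the ECS Anywhere install script was downloaded.
INSTALL_SCRIPT = "/tmp/ecs-anywhere-install.sh"


def create_activation(ssm_client, iam_role, limit=1):
    """Ask Systems Manager for an activation ID and code for external hosts."""
    response = ssm_client.create_activation(
        IamRole=iam_role,
        RegistrationLimit=limit,  # number of hosts that may use this activation
    )
    return response["ActivationId"], response["ActivationCode"]


def install_command(region, cluster, activation_id, activation_code):
    """Build the shell command to run on the external host."""
    args = [
        "sudo", "bash", INSTALL_SCRIPT,
        "--region", region,
        "--cluster", cluster,
        "--activation-id", activation_id,
        "--activation-code", activation_code,
    ]
    # Quote each argument so the command is safe to paste into a shell.
    return " ".join(shlex.quote(arg) for arg in args)
```

With boto3 installed and credentials configured, a call might look like `create_activation(boto3.client("ssm"), "ecsAnywhereRole")`, where the role name is a placeholder for an IAM role that the SSM Agent is allowed to assume.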
The ECS agent communicates with the Docker Engine to manage the lifecycle of containers. An ECS task definition is translated to respective Docker API calls to launch, stop, and kill the container.
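As a rough illustration of that translation, the Python sketch below maps one container definition to the equivalent `docker run` arguments. It is not the agent's implementation, since the agent talks to the Docker API rather than the CLI, and it covers only a handful of fields. The sample values in the usage note are hypothetical.

```python
def docker_run_args(container):
    """Translate one ECS container definition into `docker run` arguments."""
    args = ["docker", "run", "--detach", "--name", container["name"]]
    if "memory" in container:
        # ECS expresses the hard memory limit in MiB.
        args += ["--memory", f"{container['memory']}m"]
    if "cpu" in container:
        # ECS CPU units correspond to Docker CPU shares.
        args += ["--cpu-shares", str(container["cpu"])]
    for mapping in container.get("portMappings", []):
        args += ["--publish", f"{mapping['hostPort']}:{mapping['containerPort']}"]
    for env in container.get("environment", []):
        args += ["--env", f"{env['name']}={env['value']}"]
    args.append(container["image"])
    return args
```

For example, a container named `web` with image `nginx:latest`, 512 MiB of memory, and a port mapping from 8080 to 80 would yield arguments that include `--memory 512m` and `--publish 8080:80`.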
One key thing to note is that Amazon ECS Anywhere runs the control plane in the cloud. Only the agents run on the external hosts, and they maintain connectivity with the control plane.
Amazon ECS Anywhere vs. Kubernetes
Kubernetes has become the de facto choice for hybrid, multicloud, and edge deployments. Distributions such as K3s made it possible to run Kubernetes in resource-constrained, edge-computing environments. For enterprises, there is a wide range of platforms to choose from: Anthos, Azure Arc, Red Hat OpenShift, VMware Tanzu, and Rancher.
Running Kubernetes at the edge comes with its own challenges.
First, you need at least three nodes to ensure the high availability of the cluster. When dealing with multiple clusters, you also need a cluster manager such as Rancher, Anthos, Azure Arc, Red Hat Advanced Cluster Manager, or VMware Tanzu Mission Control to manage them.
Upgrading clusters running in edge and hybrid environments is a huge challenge. Since each cluster has its own control plane, the control plane needs to be upgraded before the nodes. This process is fragile and error-prone.
Kubernetes is preferred to running standalone Docker because of its tooling and its integration with the mature DevOps ecosystem. But at the edge, the cost of running a full-blown cluster can outweigh the benefits.
Amazon ECS Anywhere offers the best of both worlds — running containerized workloads in the remote environment with centralized control from the cloud.
One of the limitations of Kubernetes in this setting is that each cluster carries its own control plane, which typically runs in the same environment as the worker nodes. Amazon ECS Anywhere is designed to isolate the execution environment from the control plane. If one of the external hosts running an ECS agent loses connectivity with the control plane, its tasks continue to run as if they were launched directly from the Docker CLI. When connectivity is regained, the agent synchronizes with the control plane and reconciles to the desired state.
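The idea of reconciling to a desired state can be illustrated with a small Python sketch. It is a simplified model for intuition only, not how ECS implements reconciliation: it compares a desired task count with the tasks found running after reconnection and returns the actions that close the gap. The function name and the action tuples are hypothetical.

```python
def reconcile(desired_count, running_task_ids):
    """Return the actions needed to move the running set to the desired count."""
    running = sorted(running_task_ids)
    if len(running) < desired_count:
        # Too few tasks: start one new task for each missing copy.
        return [("start", None)] * (desired_count - len(running))
    # Enough or too many tasks: stop the surplus, keeping the first ids in sorted order.
    return [("stop", task_id) for task_id in running[desired_count:]]
```

When the running set already matches the desired count, the function returns an empty list, which corresponds to a host that kept its tasks running through the outage and needs no changes.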
Pushing a Docker container image to a remote environment without any visibility or control is too risky. With ECS Anywhere, you can manage the containerized workloads while gaining complete visibility into the execution environment.
When the external host registered with AWS SSM is promoted to the advanced-instances tier, you can even open a shell session on the remote instance from the AWS Console through Session Manager, much like SSH. This gives administrators a high degree of control over remote hosts.
Of course, if you need to run containerized workloads in an air-gapped and offline environment, Kubernetes is still the best choice. But in semi-connected environments, Amazon ECS Anywhere is a lightweight, elegant, and efficient platform.
A Typical Use Case for Amazon ECS Anywhere
Let’s visualize a scenario where a large retail chain is going through a digital transformation.
Each store of the retail chain runs a set of computer vision AI models performing real-time inference on camera feeds. The inference code is a containerized workload running on one or more hosts. Apart from inference, the workload also collects telemetry from various sensors and sends it to the cloud.
The retailer is running the AI models on an NVIDIA Jetson Xavier NX device to get accelerated inference performance. The IoT workload runs on an Intel NUC machine responsible for ingesting the sensor telemetry and controlling the actuators. For machine-to-machine communication and device management, AWS IoT Greengrass is being considered. The store is expected to be connected to the Internet most of the time.
The models and the inference code are built in the cloud with dedicated CI/CD and MLOps pipelines to generate the container images and the final set of deployment artifacts.
The retailer’s engineers realized that running a Kubernetes cluster on a mix of AMD64 and ARM64 bare-metal devices is risky. Since each store also has reliable connectivity to the cloud, they decided to evaluate Amazon ECS Anywhere.
As a part of the evaluation, the IT team registered the Jetson Xavier NX and Intel NUC hosts with AWS Systems Manager (SSM) to gain remote access. Then they deployed the ECS agent and registered the hosts with a cluster dedicated to multiple stores.
The CI/CD pipeline based on AWS CodeCommit, AWS CodeBuild, and Amazon ECR now generates ECS task definitions that are eventually pushed to each store. Amazon SageMaker Pipelines and SageMaker Neo are used for training the models, optimizing them, and storing them in Amazon S3, from where they are eventually pulled by the workloads running at the edge.
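The last step of such a pipeline could look roughly like the Python sketch below, which registers a task definition for external hosts and creates a service with the `EXTERNAL` launch type. It assumes `ecs_client` behaves like `boto3.client("ecs")`; the function name, the `-svc` naming convention, and the memory size are hypothetical, and a real pipeline would also need IAM roles and matching images for each CPU architecture.

```python
def register_and_deploy(ecs_client, cluster, family, image, count=1):
    """Register a task definition for external hosts and run it as a service."""
    ecs_client.register_task_definition(
        family=family,
        requiresCompatibilities=["EXTERNAL"],  # target ECS Anywhere hosts
        containerDefinitions=[
            {"name": family, "image": image, "memory": 512, "essential": True}
        ],
    )
    return ecs_client.create_service(
        cluster=cluster,
        serviceName=f"{family}-svc",  # hypothetical naming convention
        taskDefinition=family,
        desiredCount=count,
        launchType="EXTERNAL",  # schedule onto registered external instances
    )
```

Passing the client in as a parameter keeps the function easy to exercise with a stub before pointing it at a real cluster.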
The container logs are ingested into CloudWatch, and the team is also planning to use Container Insights to gain additional visibility.
The retail chain is able to accomplish the goal of running containerized workloads at the edge without running a fully-fledged Kubernetes cluster while gaining complete control over the deployments.
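Shipping container logs to CloudWatch is configured per container in the task definition. The Python sketch below adds an `awslogs` log configuration to an existing container definition; the helper name is hypothetical, the log group, region, and prefix in the usage note are placeholders, and the host is assumed to have permission to write to CloudWatch Logs.

```python
def with_cloudwatch_logs(container, log_group, region, stream_prefix):
    """Return a copy of a container definition that sends logs to CloudWatch."""
    updated = dict(container)  # shallow copy so the input is left unchanged
    updated["logConfiguration"] = {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": log_group,
            "awslogs-region": region,
            "awslogs-stream-prefix": stream_prefix,
        },
    }
    return updated
```

A call such as `with_cloudwatch_logs(container, "/stores/inference", "us-east-1", "store-042")` would group the log streams of one store under a common prefix.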
When to Use Amazon ECS Anywhere
Here is a checklist to consider when choosing Amazon ECS Anywhere:
- You have investments in AWS that need to extend to the edge
- The goal is to run containerized workloads at the edge with minimal effort
- Each host running the ECS agent is connected to the internet most of the time
- You don’t need additional capabilities such as service mesh and GitOps
- You want to SSH into the remote host from the AWS Console
In upcoming articles, I will demonstrate how to deploy containerized ML inference code running on NVIDIA Jetson devices managed by Amazon ECS Anywhere. Stay tuned!