Deploying Amazon ECS on CoreOS

Tue Mar 10, 2015 in coreos , docker , aws , ecs using tags

Today, I stumbled on the official CoreOS page on ECS.

I’ve been putting off ECS for a while, it was time to give it a try.

To create the ECS cluster, we will need the aws commandline tool:

which aws || pip install awscli

Make sure you have your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY defined in your shell environment.

Create the ECS cluster:

aws ecs create-cluster --cluster-name Cosmos-Dev
    "cluster": {
        "clusterName": "Cosmos-Dev",
        "status": "ACTIVE",
        "clusterArn": "arn:aws:ecs:us-east-1:123456789012:cluster/My-ECS-Cluster"

Install the global fleet unit for amazon-ecs-agent.service:

cat <<EOF > amazon-ecs-agent.service
Description=Amazon ECS Agent
ExecStartPre=-/usr/bin/docker kill ecs-agent
ExecStartPre=-/usr/bin/docker rm ecs-agent
ExecStartPre=/usr/bin/docker pull amazon/amazon-ecs-agent
ExecStart=/usr/bin/docker run --name ecs-agent \
    --publish= \
    --volume=/var/run/docker.sock:/var/run/docker.sock \
ExecStop=/usr/bin/docker stop ecs-agent
fleetctl start amazon-ecs-agent.service

This registers a ContainerInstance to the My-ECS-Cluster in region us-east-1.

Note: this is using the EC2 instance’s instance profile IAM credentials. You will want to make sure you’ve assigned an instance profile with a Role that has “ecs:*” access. For this, you may want to take a look at the Amazon ECS Quickstart CloudFormation template.

Now from a CoreOS host, we can query locally to enumerate the running ContainerInstances in our fleet:

fleetctl list-machines -fields=ip -no-legend | while read ip ; do \
    echo $ip $(ssh -n $ip curl -s \
    $(ssh -n $ip curl -s http://localhost:51678/v1/metadata | \
      docker run -i realguess/jq jq .ContainerInstanceArn) ; \

Which returns something like: i-12345678 "arn:aws:ecs:us-east-1:123456789012:container-instance/674140ae-1234-4321-1234-4abf7878caba" i-23456789 "arn:aws:ecs:us-east-1:123456789012:container-instance/c3506771-1234-4321-1234-1f1b1783c924" i-34567891 "arn:aws:ecs:us-east-1:123456789012:container-instance/75d30c64-1234-4321-1234-8be8edeec9c6"

And we can query ECS and get the same:

$ aws ecs list-container-instances --cluster My-ECS-Cluster | grep arn | cut -d'"' -f2 | \
  xargs -L1 -I% aws ecs describe-container-instances --cluster My-ECS-Cluster --container-instance % | \
  jq '.containerInstances[] | .ec2InstanceId + " " + .containerInstanceArn'
"i-12345678 arn:aws:ecs:us-east-1:123456789012:container-instance/674140ae-1234-4321-1234-4abf7878caba"
"i-23456789 arn:aws:ecs:us-east-1:123456789012:container-instance/c3506771-1234-4321-1234-1f1b1783c924"
"i-34567891 arn:aws:ecs:us-east-1:123456789012:container-instance/75d30c64-1234-4321-1234-8be8edeec9c6"

This ECS cluster is ready to use.

Unfortunately, there is no scheduler here. ECS is a harness for orchestrating docker containers in a cluster as tasks.

Where these tasks are allocated is left up to the AWS customer.

What we really need is a scheduler.

CoreOS has a form of a scheduler in fleet, but that is for fleet units of systemd services, and is not limited to docker containers as ECS is. Fleet’s scheduler is also currently a bit weak in that it schedules new units to the fleet machine with the fewest number of units.

Kubernetes has a random scheduler, which is better in a couple ways, but does not fairly allocate the system resources.

The best scheduler at present is Mesos, which takes into account resource sizing estimates and current utilization.

Normally, Mesos uses Mesos Slaves to run work. Mesos can also use ECS as a backend instead.

My next steps: Deploy Mesos using the ecs-mesos-scheduler-driver, as summarized by jpetazzo