Creating an ECS Cluster & Roles with CloudFormation

by Asher
August 10, 2020
[Figure: An ECS Cluster with Roles (wearing many hats)]

This is the second article in the series to deploy a full stack on AWS ECS using Fargate:

  • Part 1:
    • A complete VPC with security groups, subnets, NAT gateways and more
  • Part 2 (this article):
    • Deploying an ECS cluster and IAM roles for Fargate services
    • Setting up a CloudFront distribution to serve static files
  • Part 3:
    • Creating a simple Django app with a Celery backend to process asynchronous requests
  • Part 4:
    • Creating an RDS database & Redis instance
    • Registering the Django app in ECR and deploying it to ECS
  • Part 5:
    • Setting up Auto Scaling, HTTPS routing & serving static files from CloudFront



In our journey to deploy a full stack on AWS ECS using Fargate, we set up the foundational networking resources in the previous post. Now that we have our VPC, subnets and security groups, there are just a few more infrastructure-related resources to deploy before we get into the actual code of the application. Once these are in place we can start to deploy more resources on top of them. In this article we'll create the common resources that our application will be deployed into and that it will consume. These resources generally provide reuse beyond a single application; they include:

  • An ECS Cluster
  • ECS task, execution & autoscaling roles
  • Buckets to hold code deployment resources & other files required by the app (e.g. static files)
  • A CloudFront distribution that can serve and cache static files

Before getting into any level of detail, one important topic to touch on is the separation of resources between CloudFormation templates (CFTs). We typically try to scope a CFT so that it contains all the resources required to deploy a service, whether that is done by putting all resources into a single file or by using nested stacks. The exception to this is when there is reusability among resources. Just like any other development, CFTs should follow the DRY (don't repeat yourself) principle. There are a few places where this walkthrough blurs the lines and I'll touch on them as they come up.

The Deployment Architecture

All of the resources being reviewed here can be found in the ecs-cluster-template.yaml on the Tree Schema GitHub page.

In order to begin defining the resources that we want to use we first need to define what architecture we want to have when deploying this application. What technical capabilities do we need to have? What security implications should we consider? Since we're deploying this within AWS we should also think about what tools and components are readily available. Here is an example of the thought process for how we would define our architecture:

[Figure: Capability Definition Thought Process]

This only represents a few examples, but what we're trying to do here is start off with all of the thoughts and ideas that our team has and take them from their most unstructured form (left), enhance them with further context (middle) and propose a set of solutions (right). The solutions don't always end up being what we move forward with, but we find that more often than not, if you're defining the right capabilities that you need to provide and asking the right questions to challenge your own assumptions, then the architecture will reveal itself.

I find it helpful to document the architecture before starting to string together any resources; let's take a look at the actual deployment that we're targeting for this stack before stepping through it:

[Figure: Logical Resource Deployment]

At first this may appear more complicated than it actually is because I've overlaid the logical resources (e.g. application containers) on top of the networking resources (e.g. subnets). This is not necessarily required if you have good standard practices about which resources get deployed into which availability zones, but I've done this because I explicitly want to show the database deployments in relation to the application, which we'll get to shortly.

ECS Cluster Deployment

Let's start off by looking at what is deployed inside the ECS Cluster. An ECS Cluster is deployed into a region, but the cluster itself does not contain any reference to specific subnets or security groups; these are supplied when you run a task from a Task Definition. Each Task Definition is essentially a reference to one or more Docker containers that share a networking space, and a Task Definition can be kept running as a long-running application by using an ECS Service Definition.

We will eventually use our Service Definition to inject the security groups and subnets we want to use into our tasks. An important thing to note here is that our Service Definition will automatically place our containers into either of the availability zones, and we don't need to worry about the details of how that happens (at least not for the scope of these articles). This is just one of the many benefits that this serverless deployment gives us!
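
To make the idea of injecting subnets and security groups a little more concrete, here is a minimal sketch of what a Service Definition could look like. This is not the template from this series (the real one comes in a later article); every logical ID below (DjangoService, ECSCluster, DjangoTaskDefinition, PrivateSubnetA, PrivateSubnetB, AppSecurityGroup) is a hypothetical name that I am assuming is defined elsewhere.

```yaml
Resources:
  DjangoService:                       # hypothetical logical ID
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref ECSCluster         # assumed: the cluster defined in this article
      LaunchType: FARGATE
      DesiredCount: 2
      TaskDefinition: !Ref DjangoTaskDefinition   # assumed: defined in a later template
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: DISABLED
          # The subnets and security groups are attached here, not on the cluster
          Subnets:
            - !Ref PrivateSubnetA      # assumed: subnets from the VPC stack
            - !Ref PrivateSubnetB
          SecurityGroups:
            - !Ref AppSecurityGroup    # assumed: security group from the VPC stack
```

Listing one subnet per availability zone is what lets the service spread tasks across zones without any extra work on our side.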

RDS Postgres & Redis Deployments

To run our application we will use Postgres for our persistent data as well as Redis for passing asynchronous events to Celery, caching Django requests and speeding up our static file deployments on S3. Both databases will be deployed into private subnets. There should almost never be a need for you to deploy a database into a public subnet. The databases are going to be the only two resources that we deploy manually.

CloudFront & S3 Deployments

AWS CloudFront may have the most daunting GUI to interact with if you're not familiar with CDNs, CORS or the headers required to properly serve static files from a different origin. The deployment through CFTs, however, is actually rather trivial and it's one of the easiest ways to quickly improve the customer experience for page load times while offloading processing from your app to a managed AWS service. We will be using CloudFront to serve our static files for three main reasons:

  • Serving files directly from S3 can be costly if each page load has to retrieve a handful of static files
  • CloudFront offers caching across each of its global edge locations, allowing static files to be served to end users much faster
  • We can take a large burden off of Django and Nginx by not requiring them to serve static files

Creating the Resources

We will only be deploying CFT resources within this article. Each of the next few sections breaks down a section of the CloudFormation template and provides the background and context for the given resources.

As a quick disclaimer - in reality, some of the resources that we'll cover probably should be deployed in a separate CFT or in one of the CFTs that we'll use to deploy the application later on. However, I don't want the resources within each template on GitHub to change, so that anyone following along with these articles can always repeat the exact same steps. In addition, the structure of these articles does not match up perfectly with how the resources are put into CFTs, and I'm erring on the side of making the articles more cohesive rather than having perfect logical separation in CFTs.

Template Parameters

I'm a huge fan of template parameters. In this particular template there are only two. The important one to note here is YourUserId. We will use this later to ensure that this user has full access to the S3 bucket where the static files will be saved. There are other ways to achieve this access, so this parameter, and all references to it, can easily be removed.

[Figure: Template Parameters]
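
As a rough sketch, the YourUserId parameter could be declared like this. The description text is my own wording, and I am assuming the value is an IAM user name; the second parameter is left out here because it is not named in this article.

```yaml
Parameters:
  YourUserId:
    Type: String
    # Assumption: this is the IAM user name that should keep full access
    # to the static files bucket
    Description: IAM user that should retain full access to the static files bucket
```

Anywhere else in the template the value can then be read with !Ref YourUserId or inside a !Sub string as ${YourUserId}.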

ECS Cluster Resource

This will be the easiest resource you've ever deployed in AWS via CFT. As mentioned above, the ECS Cluster by itself doesn't do anything. It simply enables you to deploy resources in the future. The benefit of defining the cluster comes in how you manage the cluster after your services are deployed. By defining a cluster you will be able to set up Container Insights, get all running services and tasks within a cluster, take global action across all tasks within a cluster and more. You don't get charged for having a cluster, only for the resources you run within it.

[Figure: ECS Cluster Resource]
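
A minimal sketch of a cluster resource is below. The logical ID ECSCluster and the cluster name are hypothetical, and turning on Container Insights is an assumption on my part based on the benefits listed above; the template on GitHub may differ.

```yaml
Resources:
  ECSCluster:                          # hypothetical logical ID
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: my-ecs-cluster      # hypothetical name
      ClusterSettings:
        # Optional: enables CloudWatch Container Insights for the cluster
        - Name: containerInsights
          Value: enabled
```

All of the properties are optional, so a cluster can be declared with nothing more than its Type.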

ECS Roles

There are several roles that are required in order to deploy ECS tasks:

  • A role to execute your task
  • A role that the task assumes while it is running
  • A role to manage the autoscaling for your application (not required unless you are going to be auto scaling)

By far the most common question that I get when setting up a new ECS service is

What does each of these roles do?

Unfortunately, the thing that I hear second most often is one developer telling another

I don't know what they do but just copy and paste them from the other template.

Let's look at each role for a moment to see what exactly they are used for.

ECS Execution Role

This is the role that the AWS ECS & Fargate services assume on your behalf when managing your service. It is responsible for ensuring that your application is able to execute - that is, it primarily pulls images from ECR and allows logs to be written to CloudWatch. You can see the full set of permissions in the AWS managed policy that we'll be using for this role.

[Figure: ECS Execution Role]

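
Here is a minimal sketch of an execution role, assuming the AWS managed policy AmazonECSTaskExecutionRolePolicy is the one being attached. The logical ID ECSTaskExecutionRole is a hypothetical name.

```yaml
Resources:
  ECSTaskExecutionRole:                # hypothetical logical ID
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          # Lets the ECS tasks service assume this role on our behalf
          - Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        # Managed policy covering ECR image pulls and CloudWatch log writes
        - arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
```
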
ECS Task Role

The task role is the one that your task will assume in order to invoke other AWS services. Need access to S3, DynamoDB, SNS & to invoke Lambdas? You'll want to define a custom policy and create a task role for that. We are defining the task role in this CFT, and not in the CFT that will deploy our application, because we generally reuse roles where the applications should have the same scope and access.

When you define a task role you may want to create your own policy in order to have fine-grained control over what services this role can access; we've done that in the CFT that contains all of these resources. You may instead want to attach existing managed policies to your task role, and this works as well. You can also do a combination of both: attach managed policies to the role and define your own custom policies as needed.

Our application will not need much access to other AWS resources, so we'll just be giving it logs and CloudWatch access.

[Figure: ECS Task Role]

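
A task role with a small custom policy could be sketched like this. The logical ID, the policy name and the exact list of actions are my assumptions for "logs and CloudWatch access"; check the template on GitHub for the real set.

```yaml
Resources:
  ECSTaskRole:                         # hypothetical logical ID
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          # The running task assumes this role through the ECS tasks service
          - Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: task-logs-and-metrics   # hypothetical policy name
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              # Assumed actions: write log events and publish custom metrics
              - Effect: Allow
                Action:
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                  - cloudwatch:PutMetricData
                Resource: "*"
```

Managed policies can be mixed in by adding a ManagedPolicyArns list next to Policies.
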
ECS Autoscaling Role

The Autoscaling Role is assigned to a scalable target. This is another AWS service that we will define in one of the upcoming tutorials, but for now what's important to know is that this service will ensure that we have the correct number of containers running at all times. If our application has high memory usage the scalable target will increase the number of containers based on a predefined rule; similarly, if we have low CPU usage it will reduce the number of containers.

This is another role where we're just going to use the existing AWS managed policy. There isn't a lot to this policy: it just allows the role to update services and to update the metric alarms that will be used to trigger a service update.

[Figure: ECS Auto Scaling Role]
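
As a sketch, an autoscaling role that uses the AWS managed policy might look like the following. The logical ID is hypothetical, and I am assuming AmazonEC2ContainerServiceAutoscaleRole is the managed policy in question since it matches the description above (updating services and metric alarms).

```yaml
Resources:
  ECSAutoScalingRole:                  # hypothetical logical ID
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          # Application Auto Scaling assumes this role to scale the service
          - Effect: Allow
            Principal:
              Service: application-autoscaling.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceAutoscaleRole
```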

S3 Buckets

We're going to deploy S3 buckets that serve different purposes:

  • Code build resources: This will hold all resources related to code builds, such as the CFTs when deployed through SAM or Code Build. This bucket is useful in that you can put all of your deployment artifacts in a central location without having a new bucket created for each stack / deployment.
  • Log output: All output logs will be saved in S3 in a centralized location.
  • Static files: This will hold all of our static files (e.g. images, .js, etc.)

I am forcing the code build bucket into this CFT for the sake of having fewer CFTs to deploy for this tutorial. I'd recommend that you look at separating this out into a separate, potentially more global, stack that works for you.

Looking at the resource definitions for the buckets, the code build bucket is on top and the static files bucket is on the bottom. There is a very important difference between the two: the code build bucket has blocked all public access and has enabled AES-256 encryption on all items by default. We cannot apply this same set of rules to the static files bucket since we intend to make the files available for users to consume through CloudFront, but wherever possible you should always, always, always enable at least this same level of encryption on your buckets and set all of the values under PublicAccessBlockConfiguration to true to prevent public access.

[Figure: S3 Bucket Definitions]
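
The locked-down code build bucket described above can be sketched as follows. The logical ID CodeBuildBucket is a hypothetical name; the encryption and public access settings follow the description in the text.

```yaml
Resources:
  CodeBuildBucket:                     # hypothetical logical ID
    Type: AWS::S3::Bucket
    Properties:
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          # Encrypt every object with AES-256 by default
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
      PublicAccessBlockConfiguration:
        # All four switches set to true blocks every form of public access
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
```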

As discussed above, we're going to make the static files available through CloudFront. Since the static files will be requested from a different host than the pages that use them, we need to enable Cross Origin Resource Sharing (CORS) for the static files bucket. The static files bucket allows the HTTP methods GET and HEAD from all (*) origins, or hosts. This must be enabled so that files requested through CloudFront come back with the CORS headers that browsers expect. In addition, all of the values in PublicAccessBlockConfiguration are set to false because we want to allow external access.
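
A sketch of the static files bucket with the CORS rule and the public access settings described above. The logical ID StaticFilesBucket is hypothetical, and AllowedHeaders is an optional extra that I have added as an assumption.

```yaml
Resources:
  StaticFilesBucket:                   # hypothetical logical ID
    Type: AWS::S3::Bucket
    Properties:
      CorsConfiguration:
        CorsRules:
          # Read-only methods from any origin
          - AllowedMethods: [GET, HEAD]
            AllowedOrigins: ["*"]
            AllowedHeaders: ["*"]      # assumption: not mentioned in the article
      PublicAccessBlockConfiguration:
        BlockPublicAcls: false
        BlockPublicPolicy: false
        IgnorePublicAcls: false
        RestrictPublicBuckets: false
```

If you know the domains that will load your static files, listing them under AllowedOrigins is tighter than the wildcard.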

Setting up CloudFront

I mentioned above that setting up a CloudFront resource via CFT is simple. It only requires two resources to be created:

  • An Origin Access Identity: This is a special CloudFront user that enables secure access to S3 resources
  • The CloudFront distribution with configured caching: Tells CloudFront what S3 bucket to use to access files and how to cache results

Origin Access Identity

This is a generic representation of a special CloudFront user. The definition is simply:

[Figure: CloudFront Origin Access Identity]

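
For reference, an Origin Access Identity needs little more than a comment. The logical ID and the comment text below are hypothetical.

```yaml
Resources:
  CloudFrontOriginAccessIdentity:      # hypothetical logical ID
    Type: AWS::CloudFront::CloudFrontOriginAccessIdentity
    Properties:
      CloudFrontOriginAccessIdentityConfig:
        # Comment is the only property of the config and it is required
        Comment: Access identity for the static files bucket
```
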
The CloudFront Distribution

Alright, now admittedly this does take a moment to sink in but hopefully after we go through the definition it will not appear as complex. We are really only defining two key items in the CloudFront distribution.

  • A source where CloudFront will get static files from when they are not in the CloudFront cache
  • How to cache requests to CloudFront

The full definition is:

[Figure: CloudFront Distribution Resource]
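
A sketch of a distribution along these lines is shown below. Treat it as an illustration and not the exact template from GitHub: the logical IDs (StaticFilesDistribution, StaticFilesBucket, LogBucket, CloudFrontOriginAccessIdentity), the origin ID, the log prefix, the TTL values and the forwarded headers are all assumptions of mine.

```yaml
Resources:
  StaticFilesDistribution:             # hypothetical logical ID
    Type: AWS::CloudFront::Distribution
    Properties:
      DistributionConfig:
        Enabled: true
        PriceClass: PriceClass_100
        Origins:
          - Id: static-files-s3-origin                     # hypothetical origin ID
            DomainName: !GetAtt StaticFilesBucket.DomainName
            S3OriginConfig:
              # The fully qualified identity string has to be built by hand
              OriginAccessIdentity: !Sub "origin-access-identity/cloudfront/${CloudFrontOriginAccessIdentity}"
        Logging:
          IncludeCookies: false
          Bucket: !Sub "${LogBucket}.s3.amazonaws.com"     # assumed log bucket
          Prefix: cloudfront-logs/                         # hypothetical prefix
        ViewerCertificate:
          CloudFrontDefaultCertificate: true
        DefaultCacheBehavior:
          TargetOriginId: static-files-s3-origin
          ViewerProtocolPolicy: allow-all
          AllowedMethods: [GET, HEAD, OPTIONS]
          Compress: true
          MinTTL: 0                    # TTL values are in seconds (assumed values)
          DefaultTTL: 86400
          MaxTTL: 31536000
          ForwardedValues:
            QueryString: false
            Headers:                   # assumed CORS-related headers
              - Origin
              - Access-Control-Request-Method
              - Access-Control-Request-Headers
```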

Breaking this down into manageable chunks, here are the meanings and reasons why we have each value within the DistributionConfig:

  • Origins: This is a list of different origins that may hold static files. We're only using a single S3 bucket, but this could be another server if your static files are not on S3. We are using the following parameters for our S3 origin:
    • Domain name: The S3 bucket where static files should be retrieved from. We are referencing the bucket that we created above so that CloudFormation will automatically fill in this value for us when we deploy these resources.
    • ID: Just a unique ID that references this origin. In the event you have multiple origins you can specify different caching behaviour based on the ID.
    • S3OriginConfig: The value here needs to resolve to a string that represents the fully qualified Origin Access Identity. As far as I know (and I could be wrong), you cannot use !Ref alone to get this fully qualified string, so we need to build the entire thing. We do reference our Origin Access Identity that has already been defined to populate the last value of the string.
  • Enabled: Whether or not this CloudFront distribution should be enabled
  • Logging: Holds access logs when users request files through the CloudFront distribution. This may be useful for debugging or when trying to understand how malicious attacks occur.
    • IncludeCookies: We've set it to false since we do not need to have cookies stored
    • Bucket: Once again we reference a bucket, this time the log bucket where the output is saved, and we build a fully qualified string in order to meet the format requirements for this field.
    • Prefix: This is the directory within the bucket where the logs will be stored.
  • PriceClass: The price class that you choose will impact the cost, but the implications here are more about which CloudFront edge locations will serve your static files. You pay more, you get more edge locations. Easy as that. Price Class 100 is the cheapest but it has really good coverage including the US, Canada, Europe and Israel. The CloudFront pricing page has a more in-depth breakdown.
  • ViewerCertificate: When we reference our CloudFront distribution we will be using the domain name provided to us by CloudFront. The documentation says to just set CloudFrontDefaultCertificate to true in this scenario, but if you're using it as an alias or a CNAME from your domain provider then you can use CloudFront to do the HTTPS verification with your website certificate.
  • DefaultCacheBehavior: This is how we will cache our static files, enabling CloudFront to really speed up the customer experience when the same file is accessed more than once.
    • AllowedMethods: We're only going to allow the basic options to retrieve files. For our use case all PUTs and POSTs will be routed to the Django application.
    • Compress: Whether or not CloudFront should automatically try to compress files. All modern browsers natively support compressed files and this can save a lot of time when transferring files across the wire.
    • DefaultTTL: How long each item should remain in the cache, in seconds
    • ForwardedValues: This is a set of rules that tells CloudFront what values to forward to the origin when resolving the request. If you're serving images or SVGs you will likely need to pass along additional headers so that S3 can verify and resolve your request. Full disclaimer - I recently saw that this field is deprecated by AWS; they now recommend creating Origin Request Policies or Cache Policies instead. After a quick search I didn't see how to create these via CFT, so I'll likely update this article later after a further review.

      Make sure you double check the header values that you enable, especially if you enable POST requests through CloudFront, since CORS can be exploited by malicious actors. CORS is outside the scope of this article but you should thoroughly research this before using any values, even the ones in this tutorial.

    • MinTTL & MaxTTL: The minimum and maximum amount of time items should remain in the cache. Objects can be given specific cache lengths but we're not going to go to that level of depth.
    • TargetOriginId: The ID of the origin defined above
    • ViewerProtocolPolicy: This field essentially determines if requests should be redirected from HTTP to HTTPS, if only HTTPS requests should be allowed or if we're going to allow both. Since we're only allowing GET, HEAD and OPTIONS requests we are fine to allow HTTP and HTTPS access.
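
On the deprecation of ForwardedValues: CloudFormation does provide the AWS::CloudFront::CachePolicy and AWS::CloudFront::OriginRequestPolicy resource types. The sketch below shows roughly how the TTLs and forwarded headers could move into a cache policy. It is not part of the template for this series; the logical ID, policy name, TTL values and header list are all assumptions.

```yaml
Resources:
  StaticFilesCachePolicy:              # hypothetical logical ID
    Type: AWS::CloudFront::CachePolicy
    Properties:
      CachePolicyConfig:
        Name: static-files-cache-policy   # hypothetical name
        MinTTL: 0                      # TTL values are in seconds (assumed values)
        DefaultTTL: 86400
        MaxTTL: 31536000
        ParametersInCacheKeyAndForwardedToOrigin:
          EnableAcceptEncodingGzip: true
          CookiesConfig:
            CookieBehavior: none
          QueryStringsConfig:
            QueryStringBehavior: none
          HeadersConfig:
            HeaderBehavior: whitelist
            Headers:                   # assumed CORS-related headers
              - Origin
              - Access-Control-Request-Method
              - Access-Control-Request-Headers
# In the distribution's DefaultCacheBehavior, ForwardedValues and the TTL
# fields would then be replaced with:
#   CachePolicyId: !Ref StaticFilesCachePolicy
```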

Not so bad, right? It's a little verbose but not nearly as bad as some of the other AWS resources. Now let's take this one step further. One of the main benefits of using CloudFront is that it saves us from making a GET request to S3 every time an object is needed, which can be costly, especially if we are making a whole bunch of requests to S3. What we can do now is apply a policy that blocks all requests to S3 that are not from our CloudFront Identity, our account's root user or the user that we defined in the parameters. Strictly speaking, this isn't absolutely required, but it will prevent users from making these requests directly to our bucket by going around CloudFront.

[Figure: S3 Access Policy]
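
One way such a policy could be written is sketched below, using an Allow for the CloudFront identity and a Deny with NotPrincipal for everyone else. The logical IDs (StaticFilesBucketPolicy, StaticFilesBucket, CloudFrontOriginAccessIdentity) are hypothetical, and I am assuming YourUserId holds an IAM user name. The actual policy in the template may be structured differently.

```yaml
Resources:
  StaticFilesBucketPolicy:             # hypothetical logical ID
    Type: AWS::S3::BucketPolicy
    Properties:
      Bucket: !Ref StaticFilesBucket
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          # Let the CloudFront identity read objects
          - Sid: AllowCloudFrontRead
            Effect: Allow
            Principal:
              AWS: !Sub "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity ${CloudFrontOriginAccessIdentity}"
            Action: s3:GetObject
            Resource: !Sub "${StaticFilesBucket.Arn}/*"
          # Deny everyone who is not one of the three listed principals
          - Sid: DenyEveryoneElse
            Effect: Deny
            NotPrincipal:
              AWS:
                - !Sub "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity ${CloudFrontOriginAccessIdentity}"
                - !Sub "arn:aws:iam::${AWS::AccountId}:root"
                - !Sub "arn:aws:iam::${AWS::AccountId}:user/${YourUserId}"
            Action: s3:*
            Resource:
              - !GetAtt StaticFilesBucket.Arn
              - !Sub "${StaticFilesBucket.Arn}/*"
```

Be careful with a Deny like this: any role that is not listed, including one you use for deployments or for uploading static files, will be locked out of the bucket as well.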

One downside to this is that AWS' Principal and NotPrincipal do not allow groups, which would be an amazingly powerful capability. Therefore we are explicitly granting access to the root user and the user defined in the parameters.

Deploying With SAM

To deploy this, clone the repo and in the root directory run the following snippet in your terminal, making sure to replace your-user-id with your actual AWS user ID:

  sam deploy \
  --template templates/ecs-cluster-template.yaml \
  --stack-name my-cluster-resources \
  --parameter-overrides "ParameterKey=YourUserId,ParameterValue=your-user-id"

Closing thoughts

We now have the boilerplate resources out of the way: the networking, roles, access, buckets, etc. are all in place. While the deployment of these resources by themselves may seem odd, once you deploy your second, third and fourth application to ECS you will hopefully start to glean some reuse from this set of resources and be able to narrow down the scope of your application CFTs to just the resources needed to deploy your containers. In the next article we will create the Django application with an asynchronous Celery task and we'll start to tie together some of the pieces from this article as well as the one before it.
