Gotchas of deploying to ECR from GitHub Actions

Today I got my team’s github repo at work to automatically build a docker image for day-to-day research, and push it to AWS ECR, using a github action.

It took me a couple of hours, and I want to share the things that took the longest. All of the information exists elsewhere, and in great multiplicity, but the short list of what goes wrong might help you cut two hours.

on: manual dispatch

Before making relevant pushes trigger the build, I decided on using manual dispatch.

It then took me over 15 minutes to perform the first build.

Why? Because I missed the fine print that says that the button to run a manual dispatch action only shows if the workflow is in the default branch!

Sure, it’s not really fine print, it’s just normal print. But I still missed it and wasted time looking through issues to find a solution. So:

Tip 1. Use manual dispatch at the beginning for testing, but remember that the workflow file has to be on the default branch before the run button shows up
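For reference, here is a minimal sketch of a manually dispatched workflow. The file name, workflow name, and job name are made up for illustration; the only part that matters is the `workflow_dispatch` trigger, and the file has to be committed to the default branch for the button to appear.

```yaml
# .github/workflows/build-image.yml  (hypothetical file name)
name: Build image   # hypothetical workflow name

on:
  # Manual trigger: GitHub shows the "Run workflow" button for this
  # workflow only once this file exists on the default branch.
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # Placeholder step, just to confirm that the trigger works.
      - run: echo "manually triggered"
```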

Once I got the action actually running, I ran into the next set of problems.

Permissions: Roles & Policies

You have to log in to ECR. This is written everywhere.

You also have to create a role and give it permissions for certain actions in a policy. This is also written enough times, including on the github page of Amazon’s ecr-login action.

But it still took me a long time to understand that I should have two (!) statements, each with its own Sid: one for getting a token, and one for everything needed for pushing the ECR image.

The major reason it took me a long time is that every time I experimented with the policy, my full docker image was built first, and only after 7 minutes was the ECR push rejected. Thus:

Tip 2. Experiment with policies using a one-line Dockerfile until you have it working

Getting the policy working has (most likely) nothing to do with the content of your dockerfile, so you can separate the work on them.
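A rough sketch of what such a throwaway test workflow could look like. Several things here are assumptions and not taken from my actual setup: the role is assumed through GitHub OIDC, its ARN is stored in a secret called `AWS_ROLE_ARN`, and the ECR repository `my-prefix/policy-test` already exists. All three names are hypothetical. The region is the one from the policy in this post.

```yaml
name: ECR policy smoke test   # hypothetical workflow, for testing permissions only

on:
  workflow_dispatch:

permissions:
  id-token: write   # only needed if the role is assumed through GitHub OIDC (an assumption)
  contents: read

jobs:
  push-tiny-image:
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          # AWS_ROLE_ARN is a hypothetical secret name
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-2

      - id: ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build and push a one-line image
        env:
          # my-prefix/policy-test is a hypothetical repository name
          IMAGE: ${{ steps.ecr.outputs.registry }}/my-prefix/policy-test:latest
        run: |
          # The whole Dockerfile is one line, read from stdin,
          # so the build takes seconds instead of minutes.
          echo "FROM alpine:3.19" | docker build -t "$IMAGE" -
          docker push "$IMAGE"
```

With something like this, a rejected push shows up in well under a minute, so each policy experiment is cheap.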

And then there’s:

Tip 3. You need two statements (two Sids) in the policy. Something like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "GetAuthorizationToken",
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Sid": "AllowPush",
            "Effect": "Allow",
            "Action": [
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "ecr:BatchCheckLayerAvailability",
                "ecr:PutImage",
                "ecr:InitiateLayerUpload",
                "ecr:UploadLayerPart",
                "ecr:CompleteLayerUpload"
            ],
            "Resource": "arn:aws:ecr:us-east-2:<<account-id>>:repository/<<repository name>>/*"
        }
    ]
}

Take a close look at the resources allowed. These took me a while to figure out.

First:

Tip 4. You (probably) should have an asterisk in the GetAuthorizationToken resource

I’m not an expert on this, but as far as I know ecr:GetAuthorizationToken does not support resource-level permissions, so “*” is the only resource that works for it. In any case, the error the github runner kept giving me was that I need access to “*”.

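Second, the actions relevant to pushing the image can and should be scoped to a more precise resource path.

But! If you’re pushing multiple images, as I am, into repositories with the same prefix, the resource has to end in /* to cover all of them. For a single repository, the ARN ends at the repository name, with no asterisk. So: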

Tip 5. Make sure the resource has an asterisk if it needs one

That’s it, these were the five points that took me the most time to figure out.

The whole process still took only about 2 hours, and I’m happy with the result.