Hosting a Hugo static website on AWS with Terraform, Cloudfront, S3, and Gitlab CI
I know, I know, this isn’t the most unique use of AWS, but trust me, this has some redeeming qualities.
There are lots of great tutorials on how to host a static website on AWS using S3, but most of them require you to set your S3 bucket to public. It’s easy, it works, and assuming your static website is supposed to be public, its contents are publicly available online anyway.
That was the plan, but then Trivy flagged it for having a publicly accessible bucket. I felt guilty about adding yet another security finding to my `.trivyignore` file – especially because moving to Cloudfront is not an exercise in convenience but in education.
I ended up eating a day of my long weekend trying to get this to work; this blog post, and the hosting of this very website, is the result of sunk cost.
It’s my way of saying: if you’ve stumbled upon this and you’re able to just use a static hosting service, or a publicly available S3 bucket, do that instead.
Another thing: I use OpenTofu instead of Terraform. If you don’t, whenever I use a `tofu` command you can replace it with `terraform` (e.g. `tofu init` becomes `terraform init`).
Getting started
The first step is to create a couple of git repos (or one, if you really want a monorepo): one handles the infrastructure part, the other the Hugo website.
After, it’s worth getting a domain name – I like to use TLD-LIST to find cheap registrars.
Last thing for this section is to create an AWS account.
Terraform
Backend
I have an S3 backend set up for my Terraform state files. There are a few other options, but if you want an S3 backend for yourself, create a bucket with a unique name in your AWS account. For the sake of this post I’ll refer to it as `my-tfstate-bucket`.
We need the following information for our `backend.tf`:

- bucket: the bucket for our state file (e.g. `my-tfstate-bucket`)
- key: the path for our state file (e.g. `infra-static-websites/terraform.tfstate`)
- region: the region of our bucket (e.g. `ap-southeast-2`)
Create a `backend.tf` with the contents:

```hcl
terraform {
  required_version = ">= 1.6"

  backend "s3" {
    bucket  = "my-tfstate-bucket"
    key     = "infra-static-websites/terraform.tfstate"
    region  = "ap-southeast-2"
    encrypt = true
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.18"
    }
  }
}
```
Variables
Replace `example.com` with the stylish domain you got earlier.

It may be a good idea to not define `region` as a variable but to hardcode it in the upcoming `provider.tf` file, because we’re using Cloudfront and ACM, which require the `us-east-1` region.

Create a `variables.tf` file with these defined:
```hcl
variable "region" {
  type        = string
  description = "Default AWS region"
  default     = "us-east-1"
}

variable "domain" {
  type        = string
  description = "Domain for the website."
  default     = "example.com"
}

variable "www_redirect" {
  type        = string
  description = "Name for the cloudfront redirect function."
  default     = "example-com-www-redirect"
}
```
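Optionally (this isn’t in the original setup), a `validation` block on the domain variable can catch a malformed value at plan time instead of letting it fail later at the ACM step. A rough sketch; the regex is illustrative, not exhaustive:

```hcl
variable "domain" {
  type        = string
  description = "Domain for the website."
  default     = "example.com"

  # Hypothetical sanity check: reject values like "https://example.com" or
  # ones with trailing dots, since the rest of the code assumes a bare domain.
  validation {
    condition     = can(regex("^[a-z0-9-]+(\\.[a-z0-9-]+)+$", var.domain))
    error_message = "domain must be a bare domain like example.com."
  }
}
```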
Provider
Default tags can be changed if you want; these are just what Infracost asks for.

Create a `provider.tf` file like so:
```hcl
provider "aws" {
  region = var.region

  default_tags {
    tags = {
      creation    = "terraform"
      repo        = "infra-static-websites"
      Service     = var.domain
      Environment = "Prod"
    }
  }
}
```
Locals
This is the last boring TF file before we do something interesting, I swear!

Create this `locals.tf` file:
```hcl
locals {
  www_domain = "www.${var.domain}"
}
```
Main
Onto `main.tf`. I’ll break this down into smaller pieces.
Route53 Hosted Zone
This is where we’re going to manage our DNS. We just need something simple:
```hcl
resource "aws_route53_zone" "primary" {
  name = var.domain

  lifecycle {
    prevent_destroy = true
  }
}
```
A brief intermission:

It’s worth saving and applying the Terraform code now, so we can point our domain to the Route 53 hosted zone. It’s fine if we don’t right now; it just means the Terraform apply will fail at validating the ACM certificates.

An easy way to do this is to generate CLI credentials (it’s a good idea to remove these afterwards) and add them to our terminal:
```shell
export AWS_ACCESS_KEY_ID=REPLACE_ME
export AWS_SECRET_ACCESS_KEY=REPLACE_ME
export AWS_DEFAULT_REGION=us-east-1
```
Then run `tofu init` followed by `tofu apply`.
S3 Bucket
Not to be confused with our TF state bucket. This is where the content of our website will live.
- S3 Bucket: We want the actual bucket.
- Versioning: (Optional) In case we want to see old versions of website content.
- Server Side Encryption, Ownership Controls, Public Access Block, ACL: (Optional) These should be the defaults, but security tools like to complain if they aren’t explicitly set. It also helps if you’re like me and did a ton of testing on the bucket: it restores it to its default state.
- Website Configuration: Sets up the website.
```hcl
resource "aws_s3_bucket" "website_content" {
  bucket = var.domain

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_s3_bucket_versioning" "website_content" {
  bucket = aws_s3_bucket.website_content.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "website_content" {
  bucket = aws_s3_bucket.website_content.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_ownership_controls" "website_content" {
  bucket = aws_s3_bucket.website_content.id

  rule {
    object_ownership = "BucketOwnerPreferred"
  }
}

resource "aws_s3_bucket_public_access_block" "website_content" {
  bucket = aws_s3_bucket.website_content.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_acl" "website_content_acl" {
  depends_on = [
    aws_s3_bucket_ownership_controls.website_content,
    aws_s3_bucket_public_access_block.website_content,
  ]

  bucket = aws_s3_bucket.website_content.id
  acl    = "private"
}

resource "aws_s3_bucket_website_configuration" "website_content" {
  bucket = aws_s3_bucket.website_content.id

  index_document {
    suffix = "index.html"
  }

  error_document {
    key = "404.html"
  }
}
```
ACM Certificate
We can automate the provisioning and verification of our certificate like so:
```hcl
resource "aws_acm_certificate" "certificate" {
  domain_name               = var.domain
  validation_method         = "DNS"
  subject_alternative_names = [local.www_domain]
}

resource "aws_route53_record" "certificate_validation" {
  for_each = {
    for dvo in aws_acm_certificate.certificate.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
  }

  allow_overwrite = true
  name            = each.value.name
  records         = [each.value.record]
  ttl             = 60
  type            = each.value.type
  zone_id         = aws_route53_zone.primary.zone_id
}

resource "aws_acm_certificate_validation" "certificate" {
  certificate_arn         = aws_acm_certificate.certificate.arn
  validation_record_fqdns = [for record in aws_route53_record.certificate_validation : record.fqdn]
}
```
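One optional tweak (not in the original code): when a certificate is replaced, Terraform can try to destroy the old one while the Cloudfront distribution still references it, which fails. A `create_before_destroy` lifecycle on the certificate sidesteps that ordering problem:

```hcl
resource "aws_acm_certificate" "certificate" {
  domain_name               = var.domain
  validation_method         = "DNS"
  subject_alternative_names = [local.www_domain]

  # Create the replacement certificate first, so the distribution is never
  # left pointing at a deleted one.
  lifecycle {
    create_before_destroy = true
  }
}
```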
Cloudfront Function
There are some annoyances with Cloudfront which can be solved with a Cloudfront Function. Namely:

- You can’t force redirections of URLs starting with `www` to the root domain.
- With the S3 bucket still set to private you can only return assets, not paths.

An explanation of the second point: without the function we’re about to set up, going to `example.com` should work. Going to `example.com/blog/my-first-post.html`, or potentially `example.com/blog/my-first-post/index.html`, should work. But not `example.com/blog/my-first-post` nor `example.com/blog/my-first-post/`, because they don’t actually point to an asset.

This has to match your file structure in S3.

Create a file `cf-function.js`:
```js
function handler(event) {
    var request = event.request;
    var hostHeader = request.headers.host.value;
    var domainRegex = /(?:.*\.)?([a-z0-9\-]+\.[a-z]+)$/i;
    var match = hostHeader.match(domainRegex);

    if (!match || !hostHeader.startsWith('www.')) {
        var uri = request.uri;
        if (uri.endsWith('/')) {
            request.uri += 'index.html';
        } else if (!uri.includes('.')) {
            request.uri += '/index.html';
        }
        return request;
    }

    var rootDomain = match[1];
    return {
        statusCode: 301,
        statusDescription: 'Moved Permanently',
        headers: {
            "location": { "value": "https://" + rootDomain + request.uri },
            "cache-control": { "value": "max-age=3600" }
        }
    };
}
```
Then add this to your `main.tf`:

```hcl
resource "aws_cloudfront_function" "www_redirect" {
  name    = var.www_redirect
  runtime = "cloudfront-js-1.0"
  code    = file("./cf-function.js")
  publish = true
}
```
Cloudfront Distribution
Here’s the big guy. The first thing we want is an Origin Access Control, to later allow access to the S3 bucket. The next is the Cloudfront distribution itself; one thing of note is the use of `bucket_regional_domain_name` instead of `website_endpoint`.

Another thing worth noting: during testing it’s probably a good idea to replace the `default_cache_behavior` with the following, which disables caching by attaching the AWS managed CachingDisabled cache policy (when a cache policy is attached, the TTL settings and `forwarded_values` have to go):
```hcl
default_cache_behavior {
  allowed_methods = ["GET", "HEAD"]
  cached_methods  = ["GET", "HEAD"]
  compress        = true

  function_association {
    event_type   = "viewer-request"
    function_arn = aws_cloudfront_function.www_redirect.arn
  }

  target_origin_id       = var.domain
  viewer_protocol_policy = "redirect-to-https"

  # ID of the AWS managed "CachingDisabled" cache policy
  cache_policy_id = "4135ea2d-6df8-44a3-9df3-4b5a84be39ad"
}
```
Now for `main.tf`:
```hcl
resource "aws_cloudfront_origin_access_control" "distribution" {
  name                              = var.domain
  description                       = "${var.domain} Policy"
  origin_access_control_origin_type = "s3"
  signing_behavior                  = "always"
  signing_protocol                  = "sigv4"
}

resource "aws_cloudfront_distribution" "distribution" {
  aliases             = [var.domain, local.www_domain]
  comment             = var.domain
  default_root_object = "index.html"
  enabled             = true

  default_cache_behavior {
    allowed_methods = ["GET", "HEAD"]
    cached_methods  = ["GET", "HEAD"]
    compress        = true
    default_ttl     = 86400
    min_ttl         = 0
    max_ttl         = 31536000

    function_association {
      event_type   = "viewer-request"
      function_arn = aws_cloudfront_function.www_redirect.arn
    }

    target_origin_id       = var.domain
    viewer_protocol_policy = "redirect-to-https"

    forwarded_values {
      query_string = false

      cookies {
        forward = "none"
      }
    }
  }

  origin {
    domain_name              = aws_s3_bucket.website_content.bucket_regional_domain_name
    origin_access_control_id = aws_cloudfront_origin_access_control.distribution.id
    origin_id                = var.domain
  }

  http_version    = "http2and3"
  is_ipv6_enabled = true

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    acm_certificate_arn      = aws_acm_certificate.certificate.arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.2_2021"
  }
}
```
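Optionally (not part of the original code), a couple of outputs make it easy to grab the values the Hugo config and CI will need later, instead of fishing them out of the console:

```hcl
# Hypothetical convenience outputs. `tofu output cloudfront_distribution_id`
# gives the CLOUDFRONT_ID the Hugo deployment config asks for.
output "cloudfront_distribution_id" {
  value = aws_cloudfront_distribution.distribution.id
}

output "cloudfront_domain_name" {
  value = aws_cloudfront_distribution.distribution.domain_name
}
```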
S3 Bucket Policy
We’re reaching the end of the Terraform chapter. This policy allows our Cloudfront distribution, but no one else, to access our S3 bucket.
```hcl
data "aws_iam_policy_document" "website_content" {
  statement {
    sid     = "AllowCloudFrontServicePrincipalReadOnly"
    effect  = "Allow"
    actions = ["s3:GetObject"]
    resources = [
      aws_s3_bucket.website_content.arn,
      "${aws_s3_bucket.website_content.arn}/*"
    ]

    principals {
      type        = "Service"
      identifiers = ["cloudfront.amazonaws.com"]
    }

    condition {
      test     = "StringEquals"
      variable = "AWS:SourceArn"
      values   = [aws_cloudfront_distribution.distribution.arn]
    }
  }
}

resource "aws_s3_bucket_policy" "website_content" {
  bucket = aws_s3_bucket.website_content.id
  policy = data.aws_iam_policy_document.website_content.json
}
```
Route53 Records
The last part of our Terraform is to create the Route53 DNS entries. You may notice both `www` and root point to the same distribution. That’s where our Cloudfront Function comes in: it’ll redirect traffic for us.
```hcl
resource "aws_route53_record" "root" {
  zone_id = aws_route53_zone.primary.id
  name    = var.domain
  type    = "A"

  alias {
    name                   = aws_cloudfront_distribution.distribution.domain_name
    zone_id                = aws_cloudfront_distribution.distribution.hosted_zone_id
    evaluate_target_health = false
  }
}

resource "aws_route53_record" "www" {
  zone_id = aws_route53_zone.primary.id
  name    = local.www_domain
  type    = "A"

  alias {
    name                   = aws_cloudfront_distribution.distribution.domain_name
    zone_id                = aws_cloudfront_distribution.distribution.hosted_zone_id
    evaluate_target_health = false
  }
}
```
Hugo Static Site
In our Hugo static site repo we need to set up a Hugo site.
Hugo Deploy
Hugo supports deploying to S3 via `hugo deploy`. Its format will vary depending on the type of config file you have set up. For `toml` it looks like the below (replacing `BUCKET_NAME` and `CLOUDFRONT_ID`). You may want to feed `cloudFrontDistributionID` via the CLI; too bad Hugo doesn’t let you do that. But don’t worry, I have a cool hack for you around the corner.
```toml
[deployment]

[[deployment.matchers]]
# Cache static assets for 1 year.
pattern = "^.+\\.(js|css|svg)$"
cacheControl = "max-age=31536000, no-transform, public"
gzip = true

[[deployment.matchers]]
pattern = "^.+\\.(png|jpg|webp|woff2)$"
cacheControl = "max-age=31536000, no-transform, public"
gzip = false

[[deployment.matchers]]
# Set custom content type for /sitemap.xml
pattern = "^sitemap\\.xml$"
contentType = "application/xml"
gzip = true

[[deployment.matchers]]
pattern = "^.+\\.(html|xml|json)$"
gzip = true

[[deployment.targets]]
name = "production"
URL = "s3://BUCKET_NAME?region=us-east-1"
cloudFrontDistributionID = "CLOUDFRONT_ID"
```
Manual Stuff
Let’s interrupt all this automation with some good old manual configuration.
Domain NS Records
Let’s point the domain we bought to the Hosted Zone. If you applied the Hosted Zone earlier you should see it in AWS; if not, go on to set up Gitlab CI, wait for the pipeline to fail, and then set up the NS records like so:

Copy the `NS` records from the Hosted Zone into the DNS settings of the registrar we used to buy our domain.
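If you’d rather not copy these out of the console, an output in the Terraform repo surfaces them on apply (an optional addition, not part of the original code):

```hcl
# Prints the hosted zone's name servers, ready to paste into your registrar.
output "name_servers" {
  value = aws_route53_zone.primary.name_servers
}
```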
Gitlab AWS Access
We’ll use OpenID Connect to grant access to our AWS account for Gitlab.
Go to AWS IAM, Identity providers. Click `Add provider`:

- Provider: OpenID Connect
- Provider URL: https://gitlab.com
- Click `Get thumbprint`
- Audience: https://gitlab.com
- Click `Add provider`
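If you’d prefer to keep this in Terraform rather than clicking through the console, something like the following works (an optional sketch; `REPLACE_ME` stands in for the thumbprint the console’s `Get thumbprint` step produces):

```hcl
resource "aws_iam_openid_connect_provider" "gitlab" {
  url             = "https://gitlab.com"
  client_id_list  = ["https://gitlab.com"]
  thumbprint_list = ["REPLACE_ME"]
}
```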
Now we want to create a couple of IAM Roles.

One with ReadOnly access and a trust relationship like the below, replacing `AWS_ACCOUNT_NUMBER`, `GITLAB_USERNAME`, and `GITLAB_INFRA_REPO` with the relevant values.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::AWS_ACCOUNT_NUMBER:oidc-provider/gitlab.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "gitlab.com:aud": "https://gitlab.com"
        },
        "StringLike": {
          "gitlab.com:sub": "project_path:GITLAB_USERNAME/GITLAB_INFRA_REPO*"
        }
      }
    }
  ]
}
```
Similarly, create an Admin role with this sort of trust relationship:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::AWS_ACCOUNT_NUMBER:oidc-provider/gitlab.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "gitlab.com:aud": "https://gitlab.com",
          "gitlab.com:sub": "project_path:GITLAB_USERNAME/GITLAB_INFRA_REPO:ref_type:branch:ref:main"
        }
      }
    }
  ]
}
```
Create another with access to upload to S3, replacing `GITLAB_WEBSITE_REPO`:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::AWS_ACCOUNT_NUMBER:oidc-provider/gitlab.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "gitlab.com:aud": "https://gitlab.com",
          "gitlab.com:sub": "project_path:GITLAB_USERNAME/GITLAB_WEBSITE_REPO:ref_type:branch:ref:main"
        }
      }
    }
  ]
}
```
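A trust policy only says who can assume the role; each role also needs a permissions policy attached. The post doesn’t spell one out for the website role, but a minimal sketch (assuming `hugo deploy` should be able to sync objects and invalidate the distribution; the bucket ARN is illustrative) looks something like:

```hcl
# Hypothetical minimal permissions for the website-deploy role.
# Scope the resources to your own bucket and distribution.
data "aws_iam_policy_document" "website_deploy" {
  statement {
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::example.com"]
  }

  statement {
    actions   = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
    resources = ["arn:aws:s3:::example.com/*"]
  }

  statement {
    # hugo deploy issues an invalidation when cloudFrontDistributionID is set.
    actions   = ["cloudfront:CreateInvalidation"]
    resources = ["*"]
  }
}
```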
Gitlab Variables
In our Gitlab repos we need to create a few variables.
Infrastructure Repo
- `ADMIN_ROLE_ARN`: ARN for the admin role we made.
- `READONLY_ROLE_ARN`: ARN for the readonly role we made.

It’s a good idea to mask both of these, and to protect the admin role.
Static Website Repo
- `ROLE_ARN`: ARN for the S3 upload role we made.
Gitlab CI
Final stretch, let’s create the CI.
Infrastructure
In our infra repo create the following `.gitlab-ci.yml`:

```yaml
.assume_role: &assume_role
  - >
    STS=($(aws sts assume-role-with-web-identity
    --role-arn ${ROLE_ARN}
    --role-session-name "GitLabRunner-${CI_PROJECT_ID}-${CI_PIPELINE_ID}"
    --web-identity-token ${ID_TOKEN}
    --duration-seconds 3600
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]'
    --output text))
  - export AWS_ACCESS_KEY_ID="${STS[0]}"
  - export AWS_SECRET_ACCESS_KEY="${STS[1]}"
  - export AWS_SESSION_TOKEN="${STS[2]}"

variables:
  CI_VERSION: "1.0.${CI_PIPELINE_IID}"

stages:
  - tf-plan
  - tf-apply

opentofu-plan:
  stage: tf-plan
  image: registry.gitlab.com/wagensveld/ci-images/aws_build_tofu:latest
  id_tokens:
    ID_TOKEN:
      aud: https://gitlab.com
  variables:
    ROLE_ARN: $READONLY_ROLE_ARN
  script:
    - *assume_role
    - tofu init
    - tofu plan
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'

opentofu-apply:
  stage: tf-apply
  image: registry.gitlab.com/wagensveld/ci-images/aws_build_tofu:latest
  id_tokens:
    ID_TOKEN:
      aud: https://gitlab.com
  variables:
    ROLE_ARN: $ADMIN_ROLE_ARN
  script:
    - *assume_role
    - tofu init
    - tofu apply -auto-approve
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
```
Static Website
In our static website repo create the following `.gitlab-ci.yml`:

```yaml
stages:
  - build
  - deploy

variables:
  GIT_SUBMODULE_FORCE_HTTPS: "true"
  GIT_SUBMODULE_STRATEGY: recursive
  GIT_DEPTH: 0
  GIT_STRATEGY: clone

build:
  stage: build
  image:
    name: registry.gitlab.com/wagensveld/ci-images/build_hugo:latest
  artifacts:
    paths:
      - public/
  script:
    - hugo

deploy_s3:
  stage: deploy
  image:
    name: registry.gitlab.com/wagensveld/ci-images/build_hugo:latest
  id_tokens:
    GITLAB_OIDC_TOKEN:
      aud: https://gitlab.com
  dependencies:
    - build
  script:
    - >
      export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s"
      $(aws sts assume-role-with-web-identity
      --role-arn ${ROLE_ARN}
      --role-session-name "GitLabRunner-${CI_PROJECT_ID}-${CI_PIPELINE_ID}"
      --web-identity-token ${GITLAB_OIDC_TOKEN}
      --duration-seconds 3600
      --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]'
      --output text))
    - hugo deploy
  only:
    - main
```
Hacks
There are a couple of hacky workarounds which can improve your deployment.
Hiding the CF Distribution on deploy
You can add `echo "cloudFrontDistributionID = \"$CF_CDN\"" >> config.toml` to the CI, with `CF_CDN` as a variable. Really not ideal, but I’m tired and want to stop writing.
Multiple TF states
If you want to deploy multiple environments, you can remove the `key` from your backend, create `tfvars` files for each of your websites, and do something like this:
```yaml
opentofu-apply:
  stage: tf-apply
  image: registry.gitlab.com/wagensveld/ci-images/aws_build_tofu:latest
  id_tokens:
    ID_TOKEN:
      aud: https://gitlab.com
  variables:
    ROLE_ARN: $ADMIN_ROLE_ARN
  parallel:
    matrix:
      - TF_VARS:
          - example_com
          - example2_com
  script:
    - *assume_role
    - tofu init -backend-config="key=infra-static-websites/${TF_VARS}/terraform.tfstate"
    - tofu apply -auto-approve -var-file="tfvars/${TF_VARS}.tfvars"
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
```
Alright I’m done until some pre-commit tells me I can’t post this blog post.