Hosting a Hugo static website on AWS with Terraform, Cloudfront, S3, and Gitlab CI
I know, I know, this isn’t the most unique use of AWS, but trust me, this has some redeeming qualities.
There are lots of great tutorials on how to host a static website on AWS using S3, but most of them require you to set your S3 bucket to public. It’s easy, it works, and assuming your static website is supposed to be public, its contents are publicly available online anyway.
That was the plan, but then Trivy flagged it for having a publicly accessible bucket. I felt guilty about adding yet another security finding to my `.trivyignore` file – especially because moving to Cloudfront is not an exercise in convenience but in education.
I ended up eating a day of my long weekend trying to get this to work; this blog post, and the hosting of this very website, is the result of sunk cost.
It’s my way of saying: if you’ve stumbled upon this and you’re able to just use a static hosting service, or a publicly available S3 bucket, do that instead.
Another thing: I use OpenTofu instead of Terraform. If you don’t, whenever I use a `tofu` command you can replace it with `terraform` (e.g. `tofu init` becomes `terraform init`).
Getting started
The first step is to create a couple of git repos (or one, if you really want a monorepo): one handles the infrastructure part, the other the Hugo website.
After, it’s worth getting a domain name – I like to use TLD-LIST to find cheap registrars.
Last thing for this section is to create an AWS account.
Terraform
Backend
I have an S3 backend set up for my Terraform state files. There are a few other options, but if you want an S3 backend for yourself, create a bucket with a unique name in your AWS account. For the sake of this post I’ll refer to it as `my-tfstate-bucket`.
We need the following information for our `backend.tf`:

- bucket: the bucket for our state file (e.g. `my-tfstate-bucket`)
- key: the path for our state file (e.g. `infra-static-websites/terraform.tfstate`)
- region: the region of our bucket (e.g. `ap-southeast-2`)
Create a `backend.tf` with the contents:

```hcl
terraform {
  required_version = ">= 1.6"

  backend "s3" {
    bucket  = "my-tfstate-bucket"
    key     = "infra-static-websites/terraform.tfstate"
    region  = "ap-southeast-2"
    encrypt = true
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.18"
    }
  }
}
```
Variables
Replace `example.com` with the stylish domain you got earlier.

It may be a good idea to not define `region` as a variable but to hardcode it in the upcoming `provider.tf` file, because we’re using Cloudfront and ACM, which require the `us-east-1` region.

Create a `variables.tf` file with these defined:
```hcl
variable "region" {
  type        = string
  description = "Default AWS region"
  default     = "us-east-1"
}

variable "domain" {
  type        = string
  description = "Domain for the website."
  default     = "example.com"
}

variable "www_redirect" {
  type        = string
  description = "Name for the cloudfront redirect function."
  default     = "example-com-www-redirect"
}
```
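Optionally (this isn’t in the original setup), a `validation` block on the domain variable can catch a malformed value at plan time instead of letting it fail later at the ACM step. A rough sketch; the regex is illustrative, not exhaustive:

```hcl
variable "domain" {
  type        = string
  description = "Domain for the website."
  default     = "example.com"

  # Hypothetical sanity check: reject values like "https://example.com" or
  # ones with trailing dots, since the rest of the code assumes a bare domain.
  validation {
    condition     = can(regex("^[a-z0-9-]+(\\.[a-z0-9-]+)+$", var.domain))
    error_message = "domain must be a bare domain like example.com."
  }
}
```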
Provider
Default tags can be changed if you want; these are just what Infracost asks for.

Create a `provider.tf` file like so:
```hcl
provider "aws" {
  region = var.region

  default_tags {
    tags = {
      creation    = "terraform"
      repo        = "infra-static-websites"
      Service     = var.domain
      Environment = "Prod"
    }
  }
}
```
Locals
This is the last boring TF file before we do something interesting, I swear!

Create this `locals.tf` file:
```hcl
locals {
  www_domain = "www.${var.domain}"
}
```
Main
Onto `main.tf`. I’ll break this down into smaller pieces.
Route53 Hosted Zone
This is where we’re going to manage our DNS. We just need something simple:
```hcl
resource "aws_route53_zone" "primary" {
  name = var.domain

  lifecycle {
    prevent_destroy = true
  }
}
```
A brief intermission:

It’s worth saving and applying the Terraform code now, so we can point our domain to the Route 53 hosted zone. It’s fine if we don’t right now; it just means the Terraform apply will fail at validating the ACM certificates.

An easy way to do this is to generate CLI credentials (it’s a good idea to remove these afterwards) and add them to our terminal:
```shell
export AWS_ACCESS_KEY_ID=REPLACE_ME
export AWS_SECRET_ACCESS_KEY=REPLACE_ME
export AWS_DEFAULT_REGION=us-east-1
```
Then run `tofu init` followed by `tofu apply`.
S3 Bucket
Not to be confused with our TF state bucket. This is where the content of our website will live.
- S3 Bucket: We want the actual bucket.
- Versioning: (Optional) In case we want to see old versions of website content.
- Server Side Encryption, Ownership Controls, Public Access Block, ACL: (Optional) These should be the defaults, but security tools like to complain if they aren’t explicitly set. It also helps if you’re like me and did a ton of testing on the bucket: it restores it to its default state.
- Website Configuration: Sets up the website.
```hcl
resource "aws_s3_bucket" "website_content" {
  bucket = var.domain

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_s3_bucket_versioning" "website_content" {
  bucket = aws_s3_bucket.website_content.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "website_content" {
  bucket = aws_s3_bucket.website_content.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_ownership_controls" "website_content" {
  bucket = aws_s3_bucket.website_content.id

  rule {
    object_ownership = "BucketOwnerPreferred"
  }
}

resource "aws_s3_bucket_public_access_block" "website_content" {
  bucket = aws_s3_bucket.website_content.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_acl" "website_content_acl" {
  depends_on = [
    aws_s3_bucket_ownership_controls.website_content,
    aws_s3_bucket_public_access_block.website_content,
  ]

  bucket = aws_s3_bucket.website_content.id
  acl    = "private"
}

resource "aws_s3_bucket_website_configuration" "website_content" {
  bucket = aws_s3_bucket.website_content.id

  index_document {
    suffix = "index.html"
  }

  error_document {
    key = "404.html"
  }
}
```
ACM Certificate
We can automate the provisioning and verification of our certificate like so:
```hcl
resource "aws_acm_certificate" "certificate" {
  domain_name               = var.domain
  validation_method         = "DNS"
  subject_alternative_names = [local.www_domain]
}

resource "aws_route53_record" "certificate_validation" {
  for_each = {
    for dvo in aws_acm_certificate.certificate.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
  }

  allow_overwrite = true
  name            = each.value.name
  records         = [each.value.record]
  ttl             = 60
  type            = each.value.type
  zone_id         = aws_route53_zone.primary.zone_id
}

resource "aws_acm_certificate_validation" "certificate" {
  certificate_arn         = aws_acm_certificate.certificate.arn
  validation_record_fqdns = [for record in aws_route53_record.certificate_validation : record.fqdn]
}
```
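One optional tweak (not in the original code): when a certificate is replaced, Terraform can try to destroy the old one while the Cloudfront distribution still references it, which fails. A `create_before_destroy` lifecycle on the certificate sidesteps that ordering problem:

```hcl
resource "aws_acm_certificate" "certificate" {
  domain_name               = var.domain
  validation_method         = "DNS"
  subject_alternative_names = [local.www_domain]

  # Create the replacement certificate first, so the distribution is never
  # left pointing at a deleted one.
  lifecycle {
    create_before_destroy = true
  }
}
```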
Cloudfront Function
There are some annoyances with Cloudfront which can be solved with a Cloudfront Function. Namely:

- You can’t force redirections of URLs starting with `www` to the root domain.
- With the S3 bucket still set to private you can only return assets, not paths.

An explanation of the second point: without the function we’re about to set up, going to `example.com` should work. Going to `example.com/blog/my-first-post.html`, or potentially `example.com/blog/my-first-post/index.html`, should work. But not `example.com/blog/my-first-post` nor `example.com/blog/my-first-post/`, because they don’t actually point to an asset.

This has to match your file structure in S3.

Create a file `cf-function.js`:
```js
function handler(event) {
    var request = event.request;
    var hostHeader = request.headers.host.value;
    var domainRegex = /(?:.*\.)?([a-z0-9\-]+\.[a-z]+)$/i;
    var match = hostHeader.match(domainRegex);

    if (!match || !hostHeader.startsWith('www.')) {
        var uri = request.uri;
        if (uri.endsWith('/')) {
            request.uri += 'index.html';
        } else if (!uri.includes('.')) {
            request.uri += '/index.html';
        }
        return request;
    }

    var rootDomain = match[1];
    return {
        statusCode: 301,
        statusDescription: 'Moved Permanently',
        headers: {
            "location": { "value": "https://" + rootDomain + request.uri },
            "cache-control": { "value": "max-age=3600" }
        }
    };
}
```
Then add this to your `main.tf`:

```hcl
resource "aws_cloudfront_function" "www_redirect" {
  name    = var.www_redirect
  runtime = "cloudfront-js-1.0"
  code    = file("./cf-function.js")
  publish = true
}
```
Cloudfront Distribution
Here’s the big guy. The first thing we want is an Origin Access Control, to later allow access to the S3 bucket. The next is the Cloudfront distribution itself; one thing of note is the use of `bucket_regional_domain_name` instead of `website_endpoint`.

Another thing worth noting: during testing it’s probably a good idea to replace the `default_cache_behavior` with the following, which disables caching by attaching the AWS managed CachingDisabled cache policy (when a cache policy is attached, the TTL settings and `forwarded_values` have to go):
```hcl
default_cache_behavior {
  allowed_methods = ["GET", "HEAD"]
  cached_methods  = ["GET", "HEAD"]
  compress        = true

  function_association {
    event_type   = "viewer-request"
    function_arn = aws_cloudfront_function.www_redirect.arn
  }

  target_origin_id       = var.domain
  viewer_protocol_policy = "redirect-to-https"

  # ID of the AWS managed "CachingDisabled" cache policy
  cache_policy_id = "4135ea2d-6df8-44a3-9df3-4b5a84be39ad"
}
```
Now for `main.tf`:
```hcl
resource "aws_cloudfront_origin_access_control" "distribution" {
  name                              = var.domain
  description                       = "${var.domain} Policy"
  origin_access_control_origin_type = "s3"
  signing_behavior                  = "always"
  signing_protocol                  = "sigv4"
}

resource "aws_cloudfront_distribution" "distribution" {
  aliases             = [var.domain, local.www_domain]
  comment             = var.domain
  default_root_object = "index.html"
  enabled             = true

  default_cache_behavior {
    allowed_methods = ["GET", "HEAD"]
    cached_methods  = ["GET", "HEAD"]
    compress        = true
    default_ttl     = 86400
    min_ttl         = 0
    max_ttl         = 31536000

    function_association {
      event_type   = "viewer-request"
      function_arn = aws_cloudfront_function.www_redirect.arn
    }

    target_origin_id       = var.domain
    viewer_protocol_policy = "redirect-to-https"

    forwarded_values {
      query_string = false

      cookies {
        forward = "none"
      }
    }
  }

  origin {
    domain_name              = aws_s3_bucket.website_content.bucket_regional_domain_name
    origin_access_control_id = aws_cloudfront_origin_access_control.distribution.id
    origin_id                = var.domain
  }

  http_version    = "http2and3"
  is_ipv6_enabled = true

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    acm_certificate_arn      = aws_acm_certificate.certificate.arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.2_2021"
  }
}
```
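Optionally (not part of the original code), a couple of outputs make it easy to grab the values the Hugo config and CI will need later, instead of fishing them out of the console:

```hcl
# Hypothetical convenience outputs. `tofu output cloudfront_distribution_id`
# gives the CLOUDFRONT_ID the Hugo deployment config asks for.
output "cloudfront_distribution_id" {
  value = aws_cloudfront_distribution.distribution.id
}

output "cloudfront_domain_name" {
  value = aws_cloudfront_distribution.distribution.domain_name
}
```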
S3 Bucket Policy
We’re reaching the end of the Terraform chapter. This policy allows our Cloudfront distribution, but no one else, to access our S3 bucket.
```hcl
data "aws_iam_policy_document" "website_content" {
  statement {
    sid     = "AllowCloudFrontServicePrincipalReadOnly"
    effect  = "Allow"
    actions = ["s3:GetObject"]
    resources = [
      aws_s3_bucket.website_content.arn,
      "${aws_s3_bucket.website_content.arn}/*"
    ]

    principals {
      type        = "Service"
      identifiers = ["cloudfront.amazonaws.com"]
    }

    condition {
      test     = "StringEquals"
      variable = "AWS:SourceArn"
      values   = [aws_cloudfront_distribution.distribution.arn]
    }
  }
}

resource "aws_s3_bucket_policy" "website_content" {
  bucket = aws_s3_bucket.website_content.id
  policy = data.aws_iam_policy_document.website_content.json
}
```
Route53 Records
The last part of our Terraform is to create the Route53 DNS entries. You may notice both `www` and root point to the same distribution. That’s where our Cloudfront Function comes in: it’ll redirect traffic for us.
```hcl
resource "aws_route53_record" "root" {
  zone_id = aws_route53_zone.primary.id
  name    = var.domain
  type    = "A"

  alias {
    name                   = aws_cloudfront_distribution.distribution.domain_name
    zone_id                = aws_cloudfront_distribution.distribution.hosted_zone_id
    evaluate_target_health = false
  }
}

resource "aws_route53_record" "www" {
  zone_id = aws_route53_zone.primary.id
  name    = local.www_domain
  type    = "A"

  alias {
    name                   = aws_cloudfront_distribution.distribution.domain_name
    zone_id                = aws_cloudfront_distribution.distribution.hosted_zone_id
    evaluate_target_health = false
  }
}
```
Hugo Static Site
In our Hugo static site repo we need to set up a Hugo site.
Hugo Deploy
Hugo supports deploying to S3 via `hugo deploy`. Its format will vary depending on the type of config file you have set up. For `toml` it looks like the below (replacing `BUCKET_NAME` and `CLOUDFRONT_ID`). You may want to feed `cloudFrontDistributionID` via the CLI; too bad Hugo doesn’t let you do that. But don’t worry, I have a cool hack for you around the corner.
```toml
[deployment]

[[deployment.matchers]]
# Cache static assets for 1 year.
pattern = "^.+\\.(js|css|svg)$"
cacheControl = "max-age=31536000, no-transform, public"
gzip = true

[[deployment.matchers]]
pattern = "^.+\\.(png|jpg|webp|woff2)$"
cacheControl = "max-age=31536000, no-transform, public"
gzip = false

[[deployment.matchers]]
# Set custom content type for /sitemap.xml
pattern = "^sitemap\\.xml$"
contentType = "application/xml"
gzip = true

[[deployment.matchers]]
pattern = "^.+\\.(html|xml|json)$"
gzip = true

[[deployment.targets]]
name = "production"
URL = "s3://BUCKET_NAME?region=us-east-1"
cloudFrontDistributionID = "CLOUDFRONT_ID"
```
Manual Stuff
Let’s interrupt all this automation with some good old manual configuration.
Domain NS Records
Let’s point the domain we bought to the Hosted Zone. If you applied the Hosted Zone earlier you should see it in AWS; if not, go on to set up Gitlab CI, wait for the pipeline to fail, and then set up the NS records like so:

Copy the `NS` records from the Hosted Zone into the DNS settings of the registrar we used to buy our domain.
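If you’d rather not copy these out of the console, an output in the Terraform repo surfaces them on apply (an optional addition, not part of the original code):

```hcl
# Prints the hosted zone's name servers, ready to paste into your registrar.
output "name_servers" {
  value = aws_route53_zone.primary.name_servers
}
```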
Gitlab AWS Access
We’ll use OpenID Connect to grant access to our AWS account for Gitlab.
Go to AWS IAM, Identity providers. Click `Add provider`:

- Provider: OpenID Connect
- Provider URL: https://gitlab.com
- Click `Get thumbprint`
- Audience: https://gitlab.com
- Click `Add provider`
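If you’d prefer to keep this in Terraform rather than clicking through the console, something like the following works (an optional sketch; `REPLACE_ME` stands in for the thumbprint the console’s `Get thumbprint` step produces):

```hcl
resource "aws_iam_openid_connect_provider" "gitlab" {
  url             = "https://gitlab.com"
  client_id_list  = ["https://gitlab.com"]
  thumbprint_list = ["REPLACE_ME"]
}
```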
Now we want to create a couple of IAM Roles.

One with ReadOnly access and a trust relationship like the below, replacing `AWS_ACCOUNT_NUMBER`, `GITLAB_USERNAME`, and `GITLAB_INFRA_REPO` with the relevant values.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::AWS_ACCOUNT_NUMBER:oidc-provider/gitlab.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "gitlab.com:aud": "https://gitlab.com"
        },
        "StringLike": {
          "gitlab.com:sub": "project_path:GITLAB_USERNAME/GITLAB_INFRA_REPO*"
        }
      }
    }
  ]
}
```
Similarly, create an Admin role with this sort of trust relationship:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::AWS_ACCOUNT_NUMBER:oidc-provider/gitlab.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "gitlab.com:aud": "https://gitlab.com",
          "gitlab.com:sub": "project_path:GITLAB_USERNAME/GITLAB_INFRA_REPO:ref_type:branch:ref:main"
        }
      }
    }
  ]
}
```
Create another with access to upload to S3, replacing `GITLAB_WEBSITE_REPO`:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::AWS_ACCOUNT_NUMBER:oidc-provider/gitlab.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "gitlab.com:aud": "https://gitlab.com",
          "gitlab.com:sub": "project_path:GITLAB_USERNAME/GITLAB_WEBSITE_REPO:ref_type:branch:ref:main"
        }
      }
    }
  ]
}
```
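A trust policy only says who can assume the role; each role also needs a permissions policy attached. The post doesn’t spell one out for the website role, but a minimal sketch (assuming `hugo deploy` should be able to sync objects and invalidate the distribution; the bucket ARN is illustrative) looks something like:

```hcl
# Hypothetical minimal permissions for the website-deploy role.
# Scope the resources to your own bucket and distribution.
data "aws_iam_policy_document" "website_deploy" {
  statement {
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::example.com"]
  }

  statement {
    actions   = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
    resources = ["arn:aws:s3:::example.com/*"]
  }

  statement {
    # hugo deploy issues an invalidation when cloudFrontDistributionID is set.
    actions   = ["cloudfront:CreateInvalidation"]
    resources = ["*"]
  }
}
```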
Gitlab Variables
In our Gitlab repos we need to create a few variables.
Infrastructure Repo
- `ADMIN_ROLE_ARN`: ARN for the admin role we made.
- `READONLY_ROLE_ARN`: ARN for the readonly role we made.

It’s a good idea to mask both of these, and to protect the admin role.
Static Website Repo
- `ROLE_ARN`: ARN for the S3 upload role we made.
Gitlab CI
Final stretch, let’s create the CI.
Infrastructure
In our infra repo create the following `.gitlab-ci.yml`:

```yaml
.assume_role: &assume_role
  - >
    STS=($(aws sts assume-role-with-web-identity
    --role-arn ${ROLE_ARN}
    --role-session-name "GitLabRunner-${CI_PROJECT_ID}-${CI_PIPELINE_ID}"
    --web-identity-token ${ID_TOKEN}
    --duration-seconds 3600
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]'
    --output text))
  - export AWS_ACCESS_KEY_ID="${STS[0]}"
  - export AWS_SECRET_ACCESS_KEY="${STS[1]}"
  - export AWS_SESSION_TOKEN="${STS[2]}"

variables:
  CI_VERSION: "1.0.${CI_PIPELINE_IID}"

stages:
  - tf-plan
  - tf-apply

opentofu-plan:
  stage: tf-plan
  image: registry.gitlab.com/wagensveld/ci-images/aws_build_tofu:latest
  id_tokens:
    ID_TOKEN:
      aud: https://gitlab.com
  variables:
    ROLE_ARN: $READONLY_ROLE_ARN
  script:
    - *assume_role
    - tofu init
    - tofu plan
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'

opentofu-apply:
  stage: tf-apply
  image: registry.gitlab.com/wagensveld/ci-images/aws_build_tofu:latest
  id_tokens:
    ID_TOKEN:
      aud: https://gitlab.com
  variables:
    ROLE_ARN: $ADMIN_ROLE_ARN
  script:
    - *assume_role
    - tofu init
    - tofu apply -auto-approve
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
```
Static Website
In our static website repo create the following `.gitlab-ci.yml`:

```yaml
stages:
  - build
  - deploy

variables:
  GIT_SUBMODULE_FORCE_HTTPS: "true"
  GIT_SUBMODULE_STRATEGY: recursive
  GIT_DEPTH: 0
  GIT_STRATEGY: clone

build:
  stage: build
  image:
    name: registry.gitlab.com/wagensveld/ci-images/build_hugo:latest
  artifacts:
    paths:
      - public/
  script:
    - hugo

deploy_s3:
  stage: deploy
  image:
    name: registry.gitlab.com/wagensveld/ci-images/build_hugo:latest
  id_tokens:
    GITLAB_OIDC_TOKEN:
      aud: https://gitlab.com
  dependencies:
    - build
  script:
    - >
      export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s"
      $(aws sts assume-role-with-web-identity
      --role-arn ${ROLE_ARN}
      --role-session-name "GitLabRunner-${CI_PROJECT_ID}-${CI_PIPELINE_ID}"
      --web-identity-token ${GITLAB_OIDC_TOKEN}
      --duration-seconds 3600
      --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]'
      --output text))
    - hugo deploy
  only:
    - main
```
Hacks
There are a couple of hacky workarounds which can improve your deployment.
Hiding the CF Distribution on deploy
You can add `echo "cloudFrontDistributionID = \"$CF_CDN\"" >> config.toml` to the CI, with `CF_CDN` as a variable. Really not ideal, but I’m tired and want to stop writing.
Multiple TF states
If you want to deploy multiple environments, you can remove the `key` from your backend, create `tfvars` files for each of your websites, and do something like this:
```yaml
opentofu-apply:
  stage: tf-apply
  image: registry.gitlab.com/wagensveld/ci-images/aws_build_tofu:latest
  id_tokens:
    ID_TOKEN:
      aud: https://gitlab.com
  variables:
    ROLE_ARN: $ADMIN_ROLE_ARN
  parallel:
    matrix:
      - TF_VARS:
          - example_com
          - example2_com
  script:
    - *assume_role
    - tofu init -backend-config="key=infra-static-websites/${TF_VARS}/terraform.tfstate"
    - tofu apply -auto-approve -var-file="tfvars/${TF_VARS}.tfvars"
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
```
Alright I’m done until some pre-commit tells me I can’t post this blog post.