Hosting a Static Site on Private S3 + CloudFront

TL;DR: Drop your dist/ (or out/, or _site/, or whatever your generator spits out) into a private S3 bucket, put CloudFront in front, use Origin Access Control so nothing else can read the bucket, and lean on a two-pass cache-control strategy so content-hashed JS/CSS is cached immutably while HTML always revalidates. A single deploy.sh does the whole thing.

This is what serves this site right now. The same pattern powers a small stable of other sites I look after — same shape, different bucket names.

Why this shape

The two common ways to serve static HTML on AWS:

Public S3 static-website hosting. Simple. Zero moving parts. But the bucket is world-readable, the website endpoint speaks plain HTTP only (no TLS on your custom domain unless you put something else in front of it), and you get no compression and no edge caching.
Private S3 + CloudFront. One extra service, but you get: TLS certificates managed by ACM, real edge caching across CloudFront's global network of several hundred PoPs (fewer if you choose a cheaper price class), gzip/brotli for free, custom error responses, and a bucket that nobody can read directly.

For anything past a throwaway prototype, the second option wins. It's a few dozen lines of Terraform that you write once.

The architecture

flowchart LR
    U["🌐 Browser"]

    subgraph aws["AWS"]
        direction LR
        CF["CloudFront distribution\n(edge cache, TLS, gzip/brotli)"]
        OAC["Origin Access Control\n(signs SigV4)"]
        S3["Private S3 bucket\n(bucket policy: only this distribution)"]
    end

    U -->|"HTTPS · custom domain"| CF
    CF -->|"origin request"| OAC
    OAC -->|"authenticated GET"| S3

    style U fill:#faf9f6,stroke:#0a0a0a,color:#0a0a0a
    style CF fill:#fef3e0,stroke:#a8532b,color:#1f1c17
    style OAC fill:#fdece0,stroke:#a8532b,color:#1f1c17
    style S3 fill:#efe6d0,stroke:#4a453d,color:#1f1c17
    style aws fill:#f4ede1,stroke:#d7ceb8

Only CloudFront can read the bucket. It signs each origin request with SigV4 as the CloudFront service principal, and the bucket policy's AWS:SourceArn condition pins access to your specific distribution's ARN.
The browser never sees S3 URLs. Everything comes through CloudFront under your custom domain.
No s3-website-*.amazonaws.com endpoint anywhere.

The Terraform, roughly

Skipping the noise around providers and variable declarations, the load-bearing bits:

resource "aws_s3_bucket" "site" {
  bucket = "example.com"
}

resource "aws_s3_bucket_public_access_block" "site" {
  bucket                  = aws_s3_bucket.site.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_cloudfront_origin_access_control" "site" {
  name                              = "example-com-oac"
  origin_access_control_origin_type = "s3"
  signing_behavior                  = "always"
  signing_protocol                  = "sigv4"
}

data "aws_cloudfront_cache_policy" "managed_caching_optimized" {
  name = "Managed-CachingOptimized"
}

resource "aws_cloudfront_distribution" "site" {
  enabled             = true
  is_ipv6_enabled     = true
  aliases             = ["example.com", "www.example.com"]
  default_root_object = "index.html"
  price_class         = "PriceClass_100"

  origin {
    domain_name              = aws_s3_bucket.site.bucket_regional_domain_name
    origin_id                = "s3-origin"
    origin_access_control_id = aws_cloudfront_origin_access_control.site.id
  }

  default_cache_behavior {
    target_origin_id       = "s3-origin"
    viewer_protocol_policy = "redirect-to-https"
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]
    compress               = true
    cache_policy_id        = data.aws_cloudfront_cache_policy.managed_caching_optimized.id

    # Rewrite pretty URLs: /about → /about.html at the edge
    # (a small CloudFront Function, ~30 lines of JS, attached here
    # as a viewer-request function_association)
    # ...
  }

  # Required block, even when nothing is geo-restricted.
  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    # CloudFront only accepts ACM certificates issued in us-east-1.
    acm_certificate_arn      = aws_acm_certificate.site.arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.2_2021"
  }
}

resource "aws_s3_bucket_policy" "site" {
  bucket = aws_s3_bucket.site.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "cloudfront.amazonaws.com" }
      Action    = "s3:GetObject"
      Resource  = "${aws_s3_bucket.site.arn}/*"
      Condition = {
        StringEquals = {
          "AWS:SourceArn" = aws_cloudfront_distribution.site.arn
        }
      }
    }]
  })
}

Two things worth calling out:

OAC (Origin Access Control), not the older OAI (Origin Access Identity). AWS treats OAI as legacy and recommends OAC for new distributions: OAC signs requests with SigV4 and works with SSE-KMS-encrypted buckets, which OAI does not. Stick with OAI only if you have an existing OAI-based setup you don't want to migrate yet.
The URL rewrite for /about → /about.html needs a CloudFront Function (not Lambda@Edge; a Function is much cheaper for this): about 30 lines of JavaScript that run at the edge on every viewer request. Skip it and only exact object keys resolve. /about.html works, but /about and /about/ come back as errors (typically 403, because the bucket policy doesn't grant s3:ListBucket), since default_root_object only applies to the bare root URL.
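The mapping itself is simple enough to model locally, which is handy for checking that every route you link to has a file behind it before you deploy. A minimal bash sketch, assuming the conventions in the comments (your generator may emit /about/index.html instead of /about.html); rewrite_uri and missing_pages are hypothetical helpers, and the code at the edge still has to be a JavaScript CloudFront Function.

```shell
#!/usr/bin/env bash
# Local model of the pretty-URL mapping (assumed conventions; adjust to your
# generator). This is NOT the CloudFront Function, which must be JavaScript.
#   /about   -> /about.html
#   /docs/   -> /docs/index.html
#   /app.js  -> /app.js   (last segment has an extension: pass through)
rewrite_uri() {
  local uri=$1
  local last=${uri##*/}   # last path segment; empty when uri ends in "/"
  if [[ $uri == */ ]]; then
    printf '%sindex.html\n' "$uri"
  elif [[ $last == *.* ]]; then
    printf '%s\n' "$uri"
  else
    printf '%s.html\n' "$uri"
  fi
}

# Print "uri -> key" for every URI whose rewritten key has no file in the build.
missing_pages() {   # usage: missing_pages DIST URI...
  local dist=${1%/} uri key
  shift
  for uri in "$@"; do
    key=$(rewrite_uri "$uri")
    [[ -f $dist$key ]] || printf '%s -> %s\n' "$uri" "$key"
  done
}
```

For example, `missing_pages dist / /about /blog/` prints nothing when every route resolves, and one line per route that would hit a missing key.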

The deploy: a two-pass sync

Your generator's build ends up in some dist/ (or out/, _site/, public/). The deploy script syncs it twice, assets first and entry documents second, with a different Cache-Control header on each, then invalidates CloudFront's cache.

#!/usr/bin/env bash
set -euo pipefail

BUCKET=$(cd infra && terraform output -raw s3_bucket_name)
CF_ID=$(cd infra && terraform output -raw cloudfront_distribution_id)
DIST=${DIST_DIR:-dist}
REGION=${AWS_REGION:-us-east-1}   # must match the bucket's region

# Build step goes here: whatever produces the static folder.
# e.g. `pnpm build`, `hugo`, `zola build`, `bundle exec jekyll build`, ...

# Pass 1: immutable assets (JS chunks, CSS, images, fonts)
# Content-hashed filenames mean an aggressive 1-year cache is safe.
# --delete skips keys excluded by the filters, so HTML is left alone here.
aws s3 sync "$DIST/" "s3://$BUCKET/" \
  --delete \
  --cache-control "public, max-age=31536000, immutable" \
  --exclude "*.html" --exclude "*.xml" --exclude "*.txt" \
  --region "$REGION"

# Pass 2: entry documents (HTML, sitemap, robots)
# Not content-hashed → browsers must revalidate on every request. CloudFront
# may hold a copy, but with max-age=0 it goes stale almost immediately and is
# revalidated against S3. --delete here removes pages dropped from the build.
aws s3 sync "$DIST/" "s3://$BUCKET/" \
  --delete \
  --cache-control "public, max-age=0, must-revalidate" \
  --exclude "*" \
  --include "*.html" --include "*.xml" --include "*.txt" \
  --region "$REGION"

# Flush CloudFront's cached copies so the new HTML is live at every edge now.
# The "/*" wildcard counts as a single invalidation path.
aws cloudfront create-invalidation \
  --distribution-id "$CF_ID" \
  --paths "/*"
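One ordering caveat with the script as written: pass 1's --delete prunes old assets before the new HTML is uploaded, so for a few seconds the live HTML can point at files that are already gone (unless an edge still has them cached). A sketch of one way around that, assuming you're fine with a third sync: upload new assets without deleting anything, then the HTML, then prune. deploy_ordered, ASSET_CC and DOC_CC are hypothetical names, and --region and the invalidation are left out for brevity.

```shell
#!/usr/bin/env bash
# Sketch: same cache headers as the two-pass deploy, but stale assets are
# pruned only after the new HTML is live.
ASSET_CC="public, max-age=31536000, immutable"
DOC_CC="public, max-age=0, must-revalidate"

deploy_ordered() {   # usage: deploy_ordered DIST BUCKET
  local dist=$1 bucket=$2
  # 1. New hashed assets first, no deletes yet: old HTML keeps working.
  aws s3 sync "$dist/" "s3://$bucket/" --cache-control "$ASSET_CC" \
    --exclude "*.html" --exclude "*.xml" --exclude "*.txt"
  # 2. Entry documents; the new HTML only references assets that now exist.
  aws s3 sync "$dist/" "s3://$bucket/" --delete --cache-control "$DOC_CC" \
    --exclude "*" --include "*.html" --include "*.xml" --include "*.txt"
  # 3. Prune assets the new build no longer ships. Step 1 already uploaded
  #    everything, so in practice this only deletes.
  aws s3 sync "$dist/" "s3://$bucket/" --delete --cache-control "$ASSET_CC" \
    --exclude "*.html" --exclude "*.xml" --exclude "*.txt"
}
```

If visitors may still have old pages open in a tab, you could skip step 3 for a deploy or two so lazily loaded chunks from the old build still resolve.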

The two-pass approach is the single most important detail. Get it wrong one way and users see stale HTML after a deploy; get it wrong the other way and browsers re-download unchanged JS and CSS on every visit, which you pay for in requests and egress.

Assets are content-hashed by the framework (main.a29834dd.js, styles.7be3f1c2.css), so changing a file's contents changes its name. The old file can sit in browser caches for up to a year, harmlessly, because nothing references it any more; the new one gets fetched fresh. Immutable is safe.
HTML filenames don't change (index.html, about.html). If a client cached them for a year, it wouldn't see a redesign until the year was up. max-age=0, must-revalidate lets caches keep a copy but forces a revalidation (a cheap 304 when nothing changed) before every reuse.

Framework-specific note: if your generator doesn't content-hash by default (looking at you, older Jekyll setups), do the hashing yourself, or fall back to max-age=3600 on assets — long enough to matter, short enough to survive a bad deploy.
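A cheap guard against the failure that note describes: list the files the two passes would stamp with the one-year immutable header even though their names don't look content-hashed. A sketch in bash; the helper names and the hash heuristic (a dot-delimited run of at least 8 lowercase hex characters, which fits the example names above but not every bundler's scheme) are assumptions.

```shell
#!/usr/bin/env bash
# Which Cache-Control the two-pass deploy gives a path (mirrors its
# --exclude/--include patterns).
cache_control_for() {
  case "$1" in
    *.html|*.xml|*.txt) echo "public, max-age=0, must-revalidate" ;;
    *)                  echo "public, max-age=31536000, immutable" ;;
  esac
}

# Heuristic: a content-hashed name contains a dot-delimited run of 8+
# lowercase hex chars, e.g. main.a29834dd.js. Adjust for your bundler.
looks_hashed() {
  local re='\.[0-9a-f]{8,}\.'
  [[ ${1##*/} =~ $re ]]
}

# List files under the build dir that would get the immutable header even
# though their names don't look content-hashed.
unhashed_immutable() {   # usage: unhashed_immutable DIST
  local dist=${1%/} f
  while IFS= read -r -d '' f; do
    if [[ $(cache_control_for "$f") == *immutable* ]] && ! looks_hashed "$f"; then
      printf '%s\n' "${f#"$dist"/}"
    fi
  done < <(find "$dist" -type f -print0)
}
```

Anything `unhashed_immutable dist` prints (a favicon.ico or site.webmanifest, say) belongs in the second pass or under a shorter max-age.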

Rollback

Not glamorous, but worth wiring in:

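# Before deploying, snapshot what's live.
aws s3 sync "s3://$BUCKET/" ".backup-live-$(date +%F)/" --quiet

# ... deploy ...

# If it went sideways, put the snapshot back. The local copy lost each object's
# Cache-Control, so restore with the same two passes; touch first, or sync
# skips same-size files that the bad deploy already overwrote.
SNAP=.backup-live-2026-07-01   # whichever snapshot you want back
find "$SNAP" -type f -exec touch {} +
aws s3 sync "$SNAP/" "s3://$BUCKET/" --delete --exclude "*.html" --exclude "*.xml" \
  --exclude "*.txt" --cache-control "public, max-age=31536000, immutable"
aws s3 sync "$SNAP/" "s3://$BUCKET/" --delete --exclude "*" --include "*.html" \
  --include "*.xml" --include "*.txt" --cache-control "public, max-age=0, must-revalidate"
aws cloudfront create-invalidation --distribution-id "$CF_ID" --paths "/*"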

S3 versioning is another option, but the local snapshot is simpler and lets you eyeball the state before overwriting.
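If you keep several snapshots around, a couple of small helpers save typing a date into the restore step by hand. A sketch with hypothetical names; it assumes UTC timestamps in the directory names, which keeps lexical order equal to time order and stops two same-day snapshots from landing in one directory.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: UTC-timestamped snapshot names, plus a lookup for the
# newest one. Fixed-width ISO-8601 stamps sort lexically in time order.
snapshot_name() {
  printf '.backup-live-%s\n' "$(date -u +%Y-%m-%dT%H%M%SZ)"
}

# Print the newest .backup-live-* directory directly under $1 (default: .);
# return non-zero if there is none (grep fails on empty input).
latest_snapshot() {
  find "${1:-.}" -maxdepth 1 -type d -name '.backup-live-*' \
    | LC_ALL=C sort | tail -n 1 | grep .
}
```

Snapshot with `aws s3 sync "s3://$BUCKET/" "$(snapshot_name)/" --quiet`, and pick the restore source with `snap=$(latest_snapshot) || exit 1`.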

What I'd do differently on the next site

CI-driven deploys. For anything with more than one contributor, wire the sync into GitHub Actions with an OIDC-federated role. No long-lived keys, no per-machine setup.
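CloudFront Functions for headers. Injecting X-Frame-Options, Content-Security-Policy, etc. at the edge (with a function, or with a CloudFront response headers policy, which needs no code at all) is much cleaner than trying to serve them from S3 object metadata, which can't carry arbitrary response headers anyway.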
Versioned release prefixes. For a bigger site, staging the whole build under a releases/2026-07-01-a1b2c3/ prefix and flipping a CloudFront origin path is a safer atomic swap than sync-in-place.
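A sketch of the naming side of that release-prefix scheme, in the same bash as deploy.sh. It assumes a release ID made of the date plus a six-character commit hash (matching the releases/2026-07-01-a1b2c3/ example) and a hypothetical Terraform variable, called origin_path here, that you would wire into the origin block's origin_path argument. CloudFront expects an origin path to start with a slash and not end with one, which is why it differs from the S3 prefix.

```shell
#!/usr/bin/env bash
# Sketch: names for versioned releases. The "origin_path" Terraform variable
# used in the example flow is hypothetical.

# release_id DATE SHA -> e.g. 2026-07-01-a1b2c3
release_id() {
  printf '%s-%s\n' "$1" "${2:0:6}"
}

# S3 key prefix the build is uploaded under (trailing slash, for sync).
release_prefix() {
  printf 'releases/%s/\n' "$1"
}

# CloudFront origin path: leading slash, no trailing slash.
release_origin_path() {
  printf '/releases/%s\n' "$1"
}

# Example flow (not run here):
#   id=$(release_id "$(date +%F)" "$(git rev-parse HEAD)")
#   aws s3 sync "$DIST/" "s3://$BUCKET/$(release_prefix "$id")" ...
#   (cd infra && terraform apply -var "origin_path=$(release_origin_path "$id")")
```

Rolling back then becomes re-applying with the previous release's origin path, since the old build is still sitting under its own prefix.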

For a personal site or a small marketing property — a handful of contributors, a handful of pages — the setup above is plenty. It's cheap (pennies a month), it's fast (edge caching means most requests never touch S3), and it survives a lot of neglect between deploys.