I totally agree that it's a ridiculous problem! The fact that CloudFront knows about serving index.html
as Default Root Object AND STILL they say it doesn't work for subdirectories (source) is totally strange!
The behavior of CloudFront default root objects is different from the behavior of Amazon S3 index documents. When you configure an Amazon S3 bucket as a website and specify the index document, Amazon S3 returns the index document even if a user requests a subdirectory in the bucket.
I, personally, believe that AWS has made it this way so CloudFront becomes a CDN only (loading assets, with no logic in it whatsoever) and every request to a path in your website should be served from a "Server" (e.g. EC2 Node/Php server, or a Lambda function.)
Whether this limitation exists to enhance security, or keep things apart (i.e. logic and storage separated), or make more money (to enforce people to have a dedicated server, even for static content) is up to debate.
Anyhow, I'm summarizing the possible solutions workarounds here, with their pros and cons.
1) S3 can be Public - Use Custom Origin.
It's the easiest one, originally posted by @JBaczuk answer as well as in this github gist. Since S3 already supports serving index.html
in subdirectories via Static Website Hosting, all you need to do is:
- Go to S3, enable Static Website Hosting
- Grab the URL in the form of
http://<bucket-name>.s3-website-us-west-2.amazonaws.com
- Create a new Origin in CloudFront and enter this as a Custom Origin (and NOT S3 ORIGIN), so CloudFront treats this as an external website when getting the content.
Pros:
- Very easy to set up.
- It supports
/about/
, /about
, and /about/index.html
and redirect the last two to the first one, properly.
Cons:
If your files in the S3 bucket are not in the root of S3 (say in /artifacts/*
then going to www.domain.com/about
(without the trailing /
) will redirect you to www.domain.com/artifacts/about
which is something you don't want at all! Basically the /about
to /about/
redirect in S3 breaks if you serve from CloudFront and the path to files (from the root) don't match.
Security and Functionality: You cannot make S3 Private. It's because CloudFront's Origin Access Identity is not going to be supported, clearly, because CloudFront is instructed to take this Origin as a random website. It means that users can potentially get the files from S3 directly, which might not be what you ever what due to security/WAF concerns, as well as the website actually working if you have JS/html that relies on the path being your domain only.
[maybe an issue] The communication between CloudFront and S3 is not the way it's recommended to optimize stuff.
[maybe?] someone has complained that it doesn't work smoothly for more than one Origin in the Distribution (i.e. wanting /blog
to go somewhere)
[maybe?] someone has complained that it doesn't preserve the original query params as expected.
2) Official solution - Use a Lambda Function.
It's the official solution (though the doc is from 2017). There is a ready-to-launch 3rd-party Application (JavaScript source in github) and example Python Lambda function (this answer) for it, too.
Technically, by doing this, you create a mini-server (they call it serverless!) that only serves CloudFront's Origin Requests to S3 (so, it basically sits between CloudFront and S3.)
Pros:
- Hey, it's the official solution, so probably lasts longer and is the most optimized one.
- You can customize the Lambda Function if you want and have control over it. You can support further redirect in it.
- If implemented correctly, (like the 3rd party JS one, and I don't think the official one) it supports
/about/
and /about
both (with a redirect from the latter without trailing /
to the former).
Cons:
- It's one more thing to set up.
- It's one more thing to have an eye, so it doesn't break.
- It's one more thing to check when something breaks.
- It's one more thing to maintain -- e.g. the third-party one here has open PRs since Jan 2021 (it's April 2021 now.)
- The 3rd party JS solution doesn't preserve the query params. So
/about?foo=bar
is 301 redirected to /about/
and NOT /about/?foo=bar
. You need to make changes to that lambda function to make it work.
- The 3rd party JS solution keeps
/about/
as the canonical version. If you want /about
to be the canonical version (i.e. other formats get redirected to it via 301), you have to make changes to the script.
- [minor] It only works in us-east-1 (open issue in Github since 2020, still open and an actual problem in April 2021).
- [minor] It has its own cost, although given CloudFront's caching, shouldn't be significant.
3) Create fake "Folder File"s in S3 - Use a manual Script.
It's a solution between the first two -- It supports OAI (private S3) and it doesn't require a server. It's a bit nasty though!
What you do here is, you run a script that for each subdirectory of /about/index.html
it creates an object in S3 named (has key
of) /about
and copy that HTML file (the content and the content-type
) into this object.
Example scripts can be found in this Reddit answer and this answer using AWS CLI.
Pros:
- Secure: Supports S3 Private and CloudFront OAI.
- No additional live piece: The script runs pre-upload to S3 (or one-time) and then the system remains intact with the two pieces of S3 and CF only.
Cons:
- [Needs Confirmation] It supports
/about
but not /about/
with trailing /
I believe.
- Technically you have two different files being stored. Might look confusing and make your deploys expensive if there are tons of HTML files.
- Your script has to manually find all the subdirectories and create a dummy object out of them in S3. That has the potential to break in the future.
PS. Other Tricks)
Dirty trick using Javascript on Custom Error
While it doesn't look like a real thing, this answer deserves some credit, IMO!
You let the Access Denied (404s turning into 403) go through, then catch them, and manually, via a JS, redirect them to the right place.
Pros
- Again, easy to set up.
Cons
- It relies on JavaScript in Client-Side.
- It messes up with SEO -- especially if the crawler doesn't run JS.
- It messes up with the user's browser history. (i.e. back button) and possibly could be improved (and get more complicated!) via HTML5
history.replace
.