Hosting Static React Websites on AWS S3 (& CloudFront) with SSL

AWS S3 static website hosting is cheap, scalable, and “performant”, especially when it tag-teams with CloudFront.

This documents how to host a Single Page Application (React in this case) on AWS S3 with SSL over CloudFront, using this pet project of mine as an example.

1) The project

It is a simple static site, so no redux is used, although this setup would also work with redux. So it's gonna be mainly react and react router. Here are the specifics:

react: ^15.6.1
react-router: ^4.1.2

The bundler I am using is webpack: ^3.5.5.

2) AWS S3

S3 can host static websites on top of just storing files.

Note that each bucket is meant for only 1 website; that is, you cannot have a bucket called my-static-websites with each directory hosting 1 website. No. It is going to be one bucket per website.

Enable static website hosting for the bucket, with index.html as the index document. Take note of the Endpoint.

This setup is saying:

When users visit the root path of my website, show them the file index.html.
When users visit a page that does not exist, show them the default S3 error message in their browser.

So when we upload the react project into the bucket:

At the root path, users can see the site up and alive!!!
Users can also navigate to different paths!!!
But hitting refresh when the path is /something instead of / will show you a blank screen, or the error.html page if one was set up :((((

What is happening? Well, for the /something path S3 looks for a file called something in the bucket, but it is nowhere to be found. Since this is a Single Page Application, there is only 1 html file, 1 GOD html file.

So here is the challenge.

We need to map all paths to the index.html file.

Since this is a react project, we do not need to map each path to a specific other html page like a typical website; the index.html will load the javascript bundle and react router will get to work to show users the correct page based on the path.

Hygiene pages

Not sure if this is the correct term for sitemap.xml and robots.txt files, but yea, you’ll need these files for SEO. These files go into the root directory of your bucket as siblings to the index.html file, and the URLs to them are, eventually, https://www.yourdomain.com/robots.txt and https://www.yourdomain.com/sitemap.xml respectively.
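If you do not have these two files yet, here is a minimal sketch of what they could contain. The helper names and the list of paths are made up for illustration, and the sketch does not XML-escape the URLs, so treat it as a starting point rather than a complete generator.

```javascript
// Hypothetical helpers that build the contents of robots.txt and sitemap.xml.
function buildRobotsTxt(siteUrl) {
  // Allow every crawler and tell it where the sitemap lives.
  return ["User-agent: *", "Allow: /", `Sitemap: ${siteUrl}/sitemap.xml`, ""].join("\n");
}

function buildSitemapXml(siteUrl, paths) {
  // One <url> entry per client-side route of the Single Page Application.
  const urls = paths.map((p) => `  <url><loc>${siteUrl}${p}</loc></url>`).join("\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    urls,
    "</urlset>",
    "",
  ].join("\n");
}

console.log(buildRobotsTxt("https://www.yourdomain.com"));
console.log(buildSitemapXml("https://www.yourdomain.com", ["/", "/something"]));
```

Write the two strings to robots.txt and sitemap.xml next to index.html before uploading.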

3a) AWS CloudFront — Distribution

CloudFront is the CDN of AWS. It can handle the mapping of the routes, on top of caching the site.

Start off by creating a web distribution. The key configurations I would like to mention are:

Origin Domain Name: Focusing on this field brings up a dropdown listing all the buckets in your AWS S3. Do NOT use any option from this list. Instead, enter the domain of the Endpoint of the static-website-hosting-enabled S3 bucket, as noted in the previous section.
Viewer Protocol Policy: Select Redirect HTTP to HTTPS to ensure your website is always viewed over HTTPS, and that no duplicate instance is publicly accessible over HTTP.
Cache Based on Selected Request Headers: Select Whitelist and add the Origin header. This is to avoid any CORS-related errors.
Alternate Domain Names (CNAMEs): Enter the non-www and the www domain names here, or any other subdomains you may have intended, separated by a line break or comma.
SSL Certificate: Select Custom SSL Certificate and upload your own SSL certificate, along with the private key and CA bundle, via AWS Certificate Manager (ACM). Note that CloudFront only accepts ACM certificates from the US East (N. Virginia) region.
Compress Objects Automatically: Select Yes. CloudFront will automatically compress your uncompressed assets from S3 and improve your page speed, by Google standards. Exchange all the deep apache/nginx/IIS setups for just a radio button; that’s like trading a wight for a dragon.

Create the CloudFront distribution and wait for it to get deployed. Take note of the distribution’s Domain Name.
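The Origin Domain Name is the easiest setting to get wrong, because the dropdown offers the bucket's regular endpoint rather than its website endpoint. Here is a small sketch of a sanity check for that. The function name is made up, and the pattern is simplified: it only covers the two website endpoint styles I know of (s3-website-region and s3-website.region) under amazonaws.com.

```javascript
// Simplified pattern for S3 website endpoints, e.g.
//   my-bucket.s3-website-us-west-2.amazonaws.com
//   my-bucket.s3-website.eu-central-1.amazonaws.com
const WEBSITE_ENDPOINT = /^[a-z0-9.-]+\.s3-website[.-][a-z0-9-]+\.amazonaws\.com$/;

// Hypothetical helper: true when the origin looks like a website endpoint,
// false for the regular bucket endpoint that the dropdown suggests.
function isWebsiteEndpoint(originDomain) {
  return WEBSITE_ENDPOINT.test(originDomain);
}

console.log(isWebsiteEndpoint("better-cover-letter.s3-website-us-west-2.amazonaws.com"));
console.log(isWebsiteEndpoint("better-cover-letter.s3.amazonaws.com"));
```

Only the website endpoint serves index.html for the root path, which is why the distinction matters here.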

3b) AWS CloudFront — Error Pages

After creating the CloudFront distribution, while its status is In Progress, proceed to the Error Pages tab. Handle response codes 404 and 403 with Customize Error Response.

For the caching duration (TTL), Google recommends 1 week, or 604800 seconds.
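For reference, the same error page settings can be written down as data. The sketch below builds the custom error responses in the shape that, as far as I understand it, the CloudFront API uses (ErrorCode, ResponsePagePath, ResponseCode, ErrorCachingMinTTL). The function name is my own, so double check the field names against the CloudFront documentation before feeding this to any tooling.

```javascript
// Hypothetical helper: one custom error response per error code, each of them
// answering with /index.html and a 200 status instead of the original error.
function buildCustomErrorResponses(errorCodes, ttlSeconds) {
  const items = errorCodes.map((code) => ({
    ErrorCode: code,
    ResponsePagePath: "/index.html",
    ResponseCode: "200",
    ErrorCachingMinTTL: ttlSeconds,
  }));
  return { Quantity: items.length, Items: items };
}

// 604800 seconds is the 1 week of caching mentioned above.
console.log(JSON.stringify(buildCustomErrorResponses([403, 404], 604800), null, 2));
```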

What we are doing here is setting up CloudFront to handle missing html pages, which typically occur when a user enters an invalid path or, in particular, refreshes on a path other than the root path.

When that happens:

CloudFront will be looking for a file that does not exist in the S3 bucket; there is only 1 html file in the bucket and that is the index.html for the case of a Single Page Application like this project example
A 404 response will be returned and our custom error response setup will hijack it. We will return a 200 response code and the index.html page instead.
React router, which is loaded along with the index.html file, will look at the url and render the correct page instead of the root path. This response will be cached for the duration of the TTL for all requests to the queried path.

Why do we need to handle 403 as well? Because Amazon S3 returns this response code, instead of 404, for assets that are not present when the requester has no permission to list the bucket, which is the case for anonymous visitors of a bucket whose files are merely public-readable. For instance, a url of https://yourdomain.com/somewhere will be looking for a file called somewhere (without extension) that does not exist.

PS. It used to return 404, but it seems to be returning 403 now; either way, it is best to handle both response codes.
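To make the whole refresh flow concrete, here is a toy simulation of it. Nothing in it talks to AWS: the bucket is just a Set of file names that I made up, and the two functions only mimic the behaviour described above.

```javascript
// Files assumed to be in the bucket for this illustration.
const bucket = new Set(["index.html", "robots.txt", "sitemap.xml", "assets/bundle.js"]);

// S3 website hosting (simplified): "/" maps to the index document,
// any other path must match a file exactly, otherwise it is an error.
function s3Website(path, missingStatus = 403) {
  const key = path === "/" ? "index.html" : path.replace(/^\//, "");
  return bucket.has(key) ? { status: 200, key } : { status: missingStatus, key: null };
}

// CloudFront with the custom error responses: 403 and 404 from the origin
// are swapped for index.html with a 200 status.
function cloudFront(path, missingStatus = 403) {
  const origin = s3Website(path, missingStatus);
  if (origin.status === 403 || origin.status === 404) {
    return { status: 200, key: "index.html" };
  }
  return origin;
}

console.log(s3Website("/something"));   // the error that S3 alone would return
console.log(cloudFront("/something"));  // index.html, so react router can take over
console.log(cloudFront("/robots.txt")); // real files are still served as themselves
```

Note how files that do exist, like robots.txt, are untouched by the error handling; only the misses fall back to index.html.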

4) DNS

I intend to use the www version of the domain.

Go to the DNS zone file and set up as such.

This setup indicates:

domain.com will be redirected to www.domain.com
requests will be rewritten, if valid, from http to https

I am using namecheap.com as my DNS service provider, and they come with an option to redirect https or http non-www to https www at the DNS level.

However.

If your DNS service provider does not provide this function, you can use AWS S3 to do the redirect instead. Create another bucket with these settings.

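Point the DNS record of the root domain at the endpoint of this bucket. The endpoint is a hostname rather than an IP address, so depending on your DNS provider this will be an ALIAS/ANAME-style record (or an alias A record on Route 53) instead of a plain A record. S3 also matches website requests to buckets by host name, so this bucket has to be named after the root domain.

What will be achieved is that all non-www requests are directed to this bucket. This bucket will in turn redirect each request to the www domain, which leads to the bucket where the files are. And yes, it will be a 301 redirect, which is a permanent redirect and so tells search engines to treat the www URL as the real address.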
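Here is a sketch of what the redirect bucket does. The configuration object follows the shape of S3's "redirect all requests" website setting as I understand it (RedirectAllRequestsTo with HostName and Protocol), while the redirect function is a made-up stand-in for S3's behaviour.

```javascript
// Website configuration of the redirect bucket (host name is an assumption).
const redirectConfig = {
  RedirectAllRequestsTo: { HostName: "www.domain.com", Protocol: "https" },
};

// Hypothetical stand-in for S3: every request is answered with a permanent
// redirect to the same path on the configured host.
function redirect(requestPath, config) {
  const { HostName, Protocol } = config.RedirectAllRequestsTo;
  return { status: 301, location: `${Protocol}://${HostName}${requestPath}` };
}

console.log(redirect("/something", redirectConfig));
```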

Conversion of http to https will be handled by the CloudFront configuration (Viewer Protocol Policy) that was set up previously.

At this point of time, you should be able to access your site like a normal website. Refreshing at a path other than the root path should also work.

All non-https requests will be redirected to the https protocol.

All non-www requests will be redirected to the www domain under the https protocol as well.

Bots and crawlers should be able to access your robots.txt and sitemap.xml files as usual.

5) Conclusion

Pros

Financially friendly. You basically pay only for what you use, so you will not waste a penny on the underutilized resources that come with a monthly payment model. On top of that, this CloudFront & S3 combination can also save you some money because transferring data out to the Internet via CloudFront is cheaper than doing so directly from S3. Not to mention the better performance of a CDN.
Scalability friendly. If somehow your site gets really popular, the surge in traffic will not cause a scalability issue because AWS CloudFront will be taking care of that for you. There is no need to upgrade plans as with other hosting companies.
Performance friendly. Since this whole site is sitting on top of a CDN, delivery of the site and its assets is going to be super fast.
DDoS unfriendly. Since the site is behind AWS CloudFront, defense against DDoS attacks is, once again, handled by CloudFront. DDoS attacks are guarded against by Amazon’s own technology, and I would rather place my bets on their cyber security technology and reliability than on those of other hosting companies.
Security friendly. Since CloudFront is now handling the SSL configuration, you will see that the SSL tests for your domains are A grade on SSLLabs.

Cons

This works only on static sites. It will take a humongous amount of traffic to even slow down a static website substantially. Most of the bottlenecks in a typical application occur when it interacts with a backend that involves logic computation and database queries.
Since the site is cached on the CDN, changes will not be seen immediately; they have to wait until the cache expires. This comes along with any caching mechanism. We can mitigate it by invalidating the cache (which can incur charges). If your javascript file names are hashed, you can ignore the javascript files and only need to invalidate the index.html file. Alternatively, you can give a shorter caching period to the index.html file only.
Every Single Page Application’s bugbear is the requirement for server-side rendering. Bots and crawlers are not able to get the metadata of the site because, apart from Googlebot, they do not execute javascript. So if your site is only concerned about SEO on Google, this setup is good to go. But if you are reliant on other search engines, or if you are marketing the site via social media like Facebook, this is not ideal. (TODO serve html pages using API Gateway and Lambda)
As all 404 and 403 responses are hijacked to return 200, you will probably not receive any 404 errors on Google Search Console (GSC), if you have indexed your website there. The 404 reports provided by GSC are useful for telling you which pages are returning errors. Without them you will not know which pages are down or whether there are any broken links to other parts of your website.
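On the caching point, here is a sketch of how one could decide which paths need invalidating after a deployment. It assumes hashed assets are named like bundle-0af19d01880334b789.js (a dash followed by a long hex string), and the function name is made up; adjust the pattern to whatever your bundler produces.

```javascript
// Assumed naming scheme for hashed assets: "<name>-<hex hash>.<ext>".
const HASHED = /-[0-9a-f]{8,}\.[a-z0-9]+$/;

// Hashed files get a brand new URL on every build, so only the files that
// keep their name (like index.html) can be stale in the CDN cache.
function pathsToInvalidate(uploadedKeys) {
  return uploadedKeys.filter((key) => !HASHED.test(key)).map((key) => `/${key}`);
}

console.log(pathsToInvalidate(["index.html", "assets/bundle-0af19d01880334b789.js"]));
```

The resulting paths are what you would hand to a CloudFront invalidation, for example through the aws cloudfront create-invalidation command.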

Side quest

In this section of the article, I will be documenting how to automate the deployment process of such a site in such a setup from just the command line.

1) AWS IAM

To start off, you will need to create an IAM user and give it the necessary S3 permissions.

Note the access key id and the secret access key, as well as the User ARN.

IAM users are access control configurations in your AWS account, principally answering the question of who can do what to which of the services under your account.

Let’s call this user iam_user.

2) AWS S3

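Change the bucket policy to allow this iam_user to make changes to the bucket. Note that the Resource lists both the bucket itself and bucket-name/*, since object-level actions such as uploading apply to the objects rather than to the bucket.

{
  "Version": "2012-10-17",
  "Id": "someID",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::123456789:user/iam_user"},
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::bucket-name",
        "arn:aws:s3:::bucket-name/*"
      ]
    }
  ]
}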
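If s3:* feels too generous, a tighter policy is possible. The sketch below generates one that only allows what this deployment needs, which I take to be listing the bucket plus uploading, setting ACLs on and deleting objects. The function name is made up, and the set of actions is my assumption about what aws s3 cp and aws s3 rm require, so test it with --dryrun and a real upload before relying on it.

```javascript
// Hypothetical helper: a narrower bucket policy for the deployment user.
function buildDeployPolicy(userArn, bucketName) {
  const bucketArn = `arn:aws:s3:::${bucketName}`;
  return {
    Version: "2012-10-17",
    Statement: [
      {
        // Listing is a bucket-level action, so it targets the bucket ARN.
        Effect: "Allow",
        Principal: { AWS: userArn },
        Action: "s3:ListBucket",
        Resource: bucketArn,
      },
      {
        // Uploading, setting ACLs and deleting are object-level actions,
        // so they target the objects inside the bucket.
        Effect: "Allow",
        Principal: { AWS: userArn },
        Action: ["s3:PutObject", "s3:PutObjectAcl", "s3:DeleteObject"],
        Resource: `${bucketArn}/*`,
      },
    ],
  };
}

const policy = buildDeployPolicy("arn:aws:iam::123456789:user/iam_user", "bucket-name");
console.log(JSON.stringify(policy, null, 2));
```

Paste the printed JSON into the bucket policy editor in place of the broader policy.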

3) Deployment

As this is a simple, mostly static website, there are no test scripts or any CI server set up for the deployment procedure. Deployment is just a simple task of uploading new files to the correct bucket in S3 using the AWS CLI.

Cleanup

But before uploading, make sure you clean up the distribution folder where you build your files for the production environment. Since I use webpack as my bundler, I utilise the clean-webpack-plugin to help me dispose of old files before building new ones. This is to prevent uploading the same old assets again to the bucket.

# webpack.config

const path = require('path')
const CleanWebpackPlugin = require('clean-webpack-plugin')
const HtmlWebpackPlugin = require('html-webpack-plugin')

const pathsToClean = ["dist"]
const cleanOptions = {}

...

output: {
  // all files are bundled into the dist/assets sub-directory
  path: path.resolve(__dirname, "dist", "assets"),
  publicPath: '/assets/',
  filename: 'bundle.js'
},

...

plugins: [
  ...,
  // clean up the whole "dist" folder
  new CleanWebpackPlugin(pathsToClean, cleanOptions),
  new HtmlWebpackPlugin({
    template: "./src/index.production.html",
    // all files are bundled into the dist/assets sub-directory, but
    // index.html is placed 1 directory up, in the dist directory itself
    filename: "../index.html"
  }),

  ...
]

Uploading

Now to upload the files to S3.

To prevent any Tom, Dick and Harry from being able to do so, authentication is required. This is where all the work for IAM comes into play.

We will use a script to do the uploading, with custom configuration to authenticate the request.

You can use the --dryrun flag to test your script before actually doing the upload. This is the final version of my script.

aws s3 cp ./dist s3://better-cover-letter --recursive --exclude "*.DS_Store" --acl public-read --cache-control public,max-age=604800 --dryrun --profile iam_user

The --exclude flag prevents the upload of the irritating, ever-present .DS_Store file in macOS.

The --acl flag sets the access control list (ACL) of the files. Make them publicly readable so people can access your site, otherwise they will be slapped with a 403 Forbidden message.

The --cache-control flag sets the Cache-Control header on the S3 objects, so S3 sends it whenever CloudFront requests them. The header is passed on to the browser to make use of browser caching, thereby increasing page speed. 604800 is 1 week in seconds, so this max-age value will cache these assets for a week.

[Google] recommend[s] a minimum cache time of one week and preferably up to one year for static assets, or assets that change infrequently
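A single max-age for every file is the simplest option, but it also makes index.html stick around for a week. Here is a sketch of choosing the header per file instead. The function name and the 5 minute lifetime for html files are my own assumptions, not something Google or AWS prescribes.

```javascript
// 1 week in seconds, the value used in the upload script.
const ONE_WEEK = 7 * 24 * 60 * 60;

// Hypothetical helper: html files get a short lifetime so new deployments
// show up quickly; everything else gets the full week.
function cacheControlFor(key) {
  if (key.endsWith(".html")) {
    return "public,max-age=300"; // assumed value: 5 minutes
  }
  return `public,max-age=${ONE_WEEK}`;
}

console.log(cacheControlFor("index.html"));
console.log(cacheControlFor("assets/bundle.js"));
```

With the AWS CLI this translates into two uploads, one for the html files and one for the rest, each with its own --cache-control value.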

The --profile flag sets the specific IAM user credentials used to authenticate this operation. As I am using the same MacBook Pro for my work and my personal projects, I have multiple AWS accounts to handle, thus the need for this flag to differentiate between the IAM users. Check out AWS CLI named profiles for more information. These are my config and credentials files for your reference.

# ~/.aws/config
[default]
region=us-west-2
output=text

# ~/.aws/credentials
[iam_user]
aws_access_key_id=something
aws_secret_access_key=something

[company_user]
aws_access_key_id=something_else
aws_secret_access_key=something_else

The aws_access_key_id and aws_secret_access_key are specific to the iam_user that was created.

Once you are ready, you can remove the --dryrun flag and do a test run to ensure that your files are indeed uploaded to the correct bucket. Yes, a test run. It is not the end of the deployment step. We can go further to completely automate the whole process.

NOTE: AWS S3 does not charge for data transfer in to the bucket, only out. So feel free to spam deployments. (In fact, S3 does not charge for data transfer out to CloudFront.)

Combine the Steps

As it stands now, we have to build our site first using webpack -p --config webpack.config.js to generate the files, then upload the files using the aws s3 cp command.

To make our life better, we can create a new script command to run these commands one after another, without us having to be there waiting for the first command to finish and then manually executing the other.

# package.json

...
"scripts": {
  ...
  "deploy": "webpack -p --config webpack.config.prod.js && aws s3 cp ./dist s3://better-cover-letter --recursive --exclude \"*.DS_Store\" --cache-control public,max-age=604800 --dryrun --profile iam_user"
  ...
}

So just run npm run deploy and these will happen in order:

Old production files are cleaned up by clean-webpack-plugin
New production files are compiled into the dist folder (based on my webpack config file)
The production files are then uploaded to S3 and ready for access once the cache in the AWS CloudFront CDN expires.

There it is, the fully automated process for uploading the static website.
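If the one-liner in package.json gets too long to read, the command can be assembled in a small Node script instead. This is only a sketch: the function name and its options are made up, and the flags simply mirror the upload command shown earlier.

```javascript
// Hypothetical helper that assembles the build-then-upload command.
function buildDeployCommand({ bucket, profile, dryRun }) {
  const upload = [
    "aws s3 cp ./dist",
    `s3://${bucket}`,
    "--recursive",
    '--exclude "*.DS_Store"',
    "--acl public-read",
    "--cache-control public,max-age=604800",
    `--profile ${profile}`,
  ];
  // Keep --dryrun switchable so the same script serves for testing and for real.
  if (dryRun) {
    upload.push("--dryrun");
  }
  return `webpack -p --config webpack.config.prod.js && ${upload.join(" ")}`;
}

console.log(buildDeployCommand({ bucket: "better-cover-letter", profile: "iam_user", dryRun: true }));
```

To actually run it, the string can be passed to execSync from Node's child_process module with { stdio: "inherit" }, so that the output of webpack and the AWS CLI shows up in the terminal.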

More Housekeeping (Optional)

If you are bundling your javascript files with a hash like me, you will find your S3 bucket accumulating old js files instead of having them replaced by the new ones, since they are different files by virtue of the hash in their file names, e.g. bundle-0af19d01880334b789.js. This is not so much of an issue if you are uploading just bundle.js, which will replace any bundle.js present in the bucket.

Since storing files in S3 isn’t free, albeit not that expensive either, it's still wise to remove files that you will never be using again.

So we can use AWS CLI again to do a removal of these old js files before upload (note: I am leaving the files in the root directory of the bucket untouched, just cleaning up the assets folder).

aws s3 rm s3://better-cover-letter/assets --recursive --profile iam_user --dryrun

Once again, combine them in the deploy script.
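A gentler alternative to wiping the whole assets folder is to remove only the files that the new build no longer contains. Here is a sketch of that selection. The function name is made up, and it assumes you already have the list of keys in the bucket (from a listing) and the list of files in the new build.

```javascript
// Hypothetical helper: keys under assets/ that exist in the bucket but are
// not part of the new build. Files in the root of the bucket are left alone.
function staleAssetKeys(bucketKeys, newBuildKeys) {
  const fresh = new Set(newBuildKeys);
  return bucketKeys.filter((key) => key.startsWith("assets/") && !fresh.has(key));
}

const inBucket = ["index.html", "assets/bundle-0af19d01880334b789.js", "assets/bundle-1111aaaa2222bbbb.js"];
const inBuild = ["index.html", "assets/bundle-1111aaaa2222bbbb.js"];
console.log(staleAssetKeys(inBucket, inBuild));
```

Each returned key can then be removed with aws s3 rm, which avoids a moment where the bucket has no javascript bundle at all.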