How do I lock down a static website hosted on Amazon S3 so that only a certain group of folks (for example customers, employees, or project owners) can access it via a web browser over the internet?
There are at least a few ways to restrict access to a static site on S3: filtering by source IP address, generating signed URLs/cookies, or putting some kind of auth gateway / proxy in front of S3 that requires users to log in.
But the AWS docs on the topic are vast and unwieldy, and there's a huge gotcha to be aware of.
This post discusses the ins and outs of several approaches, then presents a quick reference in the form of a cheat sheet. (Or, skip right to the cheat sheet).
Just Say No to static website hosting
S3 has this thing called a website endpoint, aka "static website hosting," which provides a way to make your S3 bucket content "optimized for access from a web browser."
That sounds like a perfect starting point! But in this case, it isn't, because A) it requires all your content to be publicly readable, and B) it doesn't support TLS. While it's possible to write a bucket policy with a condition (e.g. IP address filtering) that makes it not really "publicly readable," lack of TLS support means anybody listening in on the network (when somebody is legitimately accessing the site) can read the content in plain text, which...would seem to defeat the whole purpose of access control.
So you probably want to use the REST API endpoint instead.
Hint: if you're trying to access content in S3 via a URL that contains the string s3-website, that is the website endpoint.
By not using the website endpoint, we'll lose out on some conveniences, most notably redirection support. But it's nothing that we can't work around.
How to restrict access based on:
Source IP address
To only allow traffic from the company network (or whatever) you can use a bucket policy to specify a source IP range. This is a relatively blunt instrument – an IP address range might be an overly broad way to grant access, and if your IP address changes regularly it's not going to be fun to manage – but if it suits your use case, it's a simple and solid approach.
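To make the shape of such a policy concrete, here is a minimal sketch in Python that builds one and prints it as JSON. The bucket name and CIDR range are placeholders I made up, and the function name is hypothetical; the condition itself uses the standard `IpAddress` operator with the `aws:SourceIp` key.

```python
import json


def ip_restricted_policy(bucket: str, cidr_ranges: list[str]) -> dict:
    """Build a bucket policy that allows object reads only from the given CIDR ranges."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowGetFromKnownNetworks",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Requests from any other source address do not match this statement.
                "Condition": {"IpAddress": {"aws:SourceIp": cidr_ranges}},
            }
        ],
    }


# Placeholder values: swap in your own bucket and network range.
policy = ip_restricted_policy("example-private-site", ["203.0.113.0/24"])
print(json.dumps(policy, indent=2))
```

The printed JSON could then be pasted into the bucket's policy editor or applied with whatever tooling you already use.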
Here is a funny way to restrict access to websites hosted on S3: define a bucket policy that requires some secret value inside the user agent string, and then use a browser extension to send a custom user agent header that contains the secret value. There are similar things you can do with the referer header. This is clever but extremely quick and dirty; depends on the sitch but generally I'd recommend against it.
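For the curious, the user agent trick might look roughly like the sketch below. The bucket name and secret are placeholders and the function name is hypothetical. The `aws:UserAgent` condition key is real, but the header is trivially spoofable by anyone who learns the secret, which is a big part of why this stays in the quick-and-dirty category.

```python
import json


def user_agent_policy(bucket: str, secret: str) -> dict:
    """Build a bucket policy that allows reads only when the User-Agent contains a secret."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowGetWithSecretUserAgent",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # StringLike with wildcards matches the secret anywhere in the header.
                "Condition": {"StringLike": {"aws:UserAgent": f"*{secret}*"}},
            }
        ],
    }


# Placeholder values for illustration only.
print(json.dumps(user_agent_policy("example-private-site", "not-a-real-secret"), indent=2))
```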
S3 lets you programmatically generate signed URLs that provide temporary access to otherwise-private bucket objects, and CloudFront does something similar. But I think of this as being associated with a different use case than a private static website accessed via a fixed URL. This is more like a mechanism for giving your user an expiring link to download something they just purchased, or to display private images within an app. This mechanism could be used to build something that does what we want here, but it seems like a roundabout tooling choice compared to the other options.
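As a rough illustration of the expiring-link use case, here is a small sketch built around boto3's `generate_presigned_url`. The wrapper function, bucket, and key are hypothetical; the client is passed in so the function itself doesn't care how credentials are configured.

```python
def expiring_download_link(s3_client, bucket: str, key: str, ttl_seconds: int = 300) -> str:
    """Return a URL granting temporary GET access to one private object."""
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=ttl_seconds,  # link stops working after this many seconds
    )


# Usage sketch (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   url = expiring_download_link(boto3.client("s3"), "example-private-site", "downloads/report.pdf")
```

Note that each link covers a single object, which is part of why this fits downloads better than a whole site of interlinked pages.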
Basic HTTP authentication gateway
If basic HTTP authentication is good enough for your situation, you can run a proxy like s3auth on Heroku and have it sit in front of requests to your bucket, requiring username/password to access the content. Or for a serverless variant, put a CloudFront distribution in front of the bucket and use Lambda@Edge to force basic HTTP auth. (If you don't need the benefits of a CDN and would rather do without the bulk and slowness of managing a CloudFront distribution, I imagine you could also do the same thing without CloudFront by using a regular Lambda function doing nearly the same thing, with an API Gateway endpoint in front of it.)
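Here is a minimal sketch of what the Lambda@Edge variant could look like as a viewer-request handler in Python. The hardcoded username and password are placeholders (Lambda@Edge doesn't support environment variables, so a real deployment needs some other way to supply credentials), and this is one possible shape rather than the code from any particular project.

```python
import base64
import hmac

# Hypothetical credentials for illustration only; do not ship hardcoded secrets.
EXPECTED_USER = "demo"
EXPECTED_PASSWORD = "change-me"


def _expected_header() -> str:
    """Return the Authorization header value a valid client would send."""
    token = base64.b64encode(f"{EXPECTED_USER}:{EXPECTED_PASSWORD}".encode()).decode()
    return f"Basic {token}"


def handler(event, context):
    """Viewer-request handler: pass the request through or answer with a 401."""
    request = event["Records"][0]["cf"]["request"]
    # CloudFront lowercases header names and stores each as a list of key/value dicts.
    auth = request["headers"].get("authorization", [])
    supplied = auth[0]["value"] if auth else ""
    # Constant-time comparison avoids leaking how much of the value matched.
    if hmac.compare_digest(supplied.encode(), _expected_header().encode()):
        return request  # returning the request lets CloudFront continue to the origin
    return {
        "status": "401",
        "statusDescription": "Unauthorized",
        "headers": {
            "www-authenticate": [{"key": "WWW-Authenticate", "value": 'Basic realm="Restricted"'}]
        },
        "body": "Unauthorized",
    }
```

Returning a response dict instead of the request is what makes the browser pop up its username/password prompt.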
External identity provider authentication gateway
To require users to log in through an identity provider like Google, GitHub, or Amazon, here is an awesome serverless example that puts a CloudFront distribution in front of the S3 bucket and uses Lambda@Edge and passport to do both authentication and authorization. It ensures that the only way to access the content via the CDN is A) logging in via Google and B) having an email address that matches a given domain.
This is pretty slick! I especially like this approach for the use case of an internal (employee or project owner) site if IP address filtering doesn't fit the bill.
Just like with the basic HTTP auth example, I imagine you could chop out CloudFront if desired, putting an API Gateway endpoint in front of a regular Lambda function doing nearly the same work.
(There are also many alternatives that would require a long-running authenticating proxy server).
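The authorization half of that gateway (the "email address matches a given domain" part) is simple enough to sketch on its own. The function below is hypothetical and assumes the identity provider hands back OIDC-style claims with `email` and `email_verified` fields; the linked example uses passport, whose profile object may be shaped differently.

```python
def is_authorized(claims: dict, allowed_domain: str) -> bool:
    """Allow only verified email addresses whose domain exactly matches allowed_domain."""
    if not claims.get("email_verified", False):
        return False
    local, sep, domain = str(claims.get("email", "")).rpartition("@")
    # Compare the whole domain so "example.com.evil.test" cannot slip through.
    return bool(local) and sep == "@" and domain.lower() == allowed_domain.lower()


# Hypothetical claims for illustration.
print(is_authorized({"email": "alice@example.com", "email_verified": True}, "example.com"))
```

Matching on the full domain rather than a substring or suffix is the design choice that matters here.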
It would be really nice in some cases to be able to say "if the IAM user is logged into the AWS console, give them access to the contents of S3 bucket $FOO in the browser via this fixed URL." But there is no obvious mechanism to do that; as far as I can tell the closest possible thing would be using Login with Amazon (perhaps via passport) to do the OIDC dance and then using the logged-in user's email address to do whatever authorization is needed. But this is also kind of weird because it's like...authenticating with an amazon.com account, not an IAM user.
Log in through some other app you control / existing SSO
If users are already logging into another app in your universe of apps (or some single-sign-on kinda thing), and you want to then send 'em over to a restricted-access static site, issuing CloudFront signed cookies and then redirecting your user to the static site might be a good route, and here's a helpful post about it.
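To give a feel for what "issuing CloudFront signed cookies" involves, here is a sketch that builds the three cookies for a custom policy. The RSA signing step is passed in as a callable (hypothetical name `rsa_sha1_sign`) so the example stays within the standard library; in practice you'd back it with a crypto library and the private key matching your CloudFront key pair. The resource URL and key pair ID are placeholders.

```python
import base64
import json
import time
from typing import Callable


def _cloudfront_b64(data: bytes) -> str:
    """Base64-encode, then swap the characters CloudFront treats as invalid in cookies."""
    encoded = base64.b64encode(data).decode()
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")


def signed_cookies(
    resource: str,
    key_pair_id: str,
    rsa_sha1_sign: Callable[[bytes], bytes],
    ttl_seconds: int = 3600,
) -> dict:
    """Return the cookie names and values for a custom-policy signed cookie set."""
    expires = int(time.time()) + ttl_seconds
    # The policy is signed as compact JSON, so strip all optional whitespace.
    policy = json.dumps(
        {"Statement": [{"Resource": resource, "Condition": {"DateLessThan": {"AWS:EpochTime": expires}}}]},
        separators=(",", ":"),
    ).encode()
    return {
        "CloudFront-Policy": _cloudfront_b64(policy),
        "CloudFront-Signature": _cloudfront_b64(rsa_sha1_sign(policy)),
        "CloudFront-Key-Pair-Id": key_pair_id,
    }
```

Your existing app would set these cookies on a domain shared with the CloudFront distribution after the user logs in, then redirect to the static site.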
Roll your own user authentication
If none of the approaches above work and you find yourself considering rolling your own new authentication system for a static site – you want to manage sign up, sign in, password recovery, etc, and BE the identity provider that you want to see in the world – consider a managed service like Cognito or Auth0 instead. You could then use that in combination with the kind of external identity provider auth gateway discussed above.
There are a variety of mechanisms to restrict access to static sites hosted on S3. It's worth noting that here we're only talking about restricting browser access to static files in S3. If the site loads in the browser, then makes API requests that need some additional access control – if this is more of a JAMstack situation – that’s a horse of a different color and the approaches discussed here on their own don't cover that. But putting an authentication gateway in front of it is a good first step to set the stage for additional authorization beyond accessing the static files.
Another note: for any of the approaches with a CloudFront distribution in front of the S3 bucket, you'll want to restrict access using an origin access identity (or its newer replacement, origin access control) so that the bucket content must be accessed via CloudFront.
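For reference, a bucket policy that only lets a CloudFront origin access identity read objects might be built like this. The bucket name and OAI ID are placeholders and the function name is hypothetical; the principal ARN format is the one CloudFront uses for origin access identities.

```python
import json


def oai_only_policy(bucket: str, oai_id: str) -> dict:
    """Build a bucket policy granting object reads to one CloudFront origin access identity."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontOAIRead",
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity {oai_id}"
                },
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


# Placeholder values: use your own bucket name and the ID of your distribution's OAI.
print(json.dumps(oai_only_policy("example-private-site", "EXAMPLEOAIID"), indent=2))
```

With no other statement granting public access, direct requests to the bucket are refused and only traffic coming through the distribution gets through.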
Hopefully this post provides a useful overview and points you in the right direction for where to go next! If there's anything I've omitted or misrepresented please let me know.