QP: A great Big Data AWS presentation

The “Essentials: Architectural Patterns for Big Data on AWS” webinar from today was one of the best webinars that I have seen in a long time.

Siva Raghupathy (https://www.linkedin.com/in/siva-raghupathy-823229) methodically covered all the parts that he mentioned at the beginning of the presentation. The last summary slide really shows everything that he covered:

I really like the use of temperature graphs to compare and contrast AWS usage scenarios and the best services for such scenarios. This is a great example of such a graph (I’ve seen some other AWS webinars that use similar visualization).

It’s funny that at the beginning of every AWS webinar a bunch of people ask if the slides and presentation video will be available. The answer is always the same “we’ll send out a link to the slides and video in a few days”. I have attended several of these webinars and with the exception of one – the slides and video are never sent (for the exception – a link to video/slides was available for a few days – then the video disappeared).

These days I typically record the webinars if I can. I’m really glad to have recorded this one. I would love to post the video of the presentation but I cannot do so since it is AWS’s property. If the video for this fantastic webinar does not show up – please hit up @awscloud (https://twitter.com/awscloud) and ask them to give me permission to post the video (alternatively – I’m more than happy to send it to them for posting).

Cheers,

Eli4d

If you found this useful – let me know via <a href=”http://twitter.com/eli4d“>@eli4d on Twitter</a>

How to create a static content server on Amazon S3

Overview

In this tutorial I quickly go over creating a static site using S3. This should be a simple process and for the most part it is except Amazon’s security policy editor. There are many ways to control security in AWS and I beat my head against a wall for many hours trying to figure what would work. I present what worked for me but this may not be the ‘best’ way to do the security for an S3 bucket. If get more info on how to better do it I will update this post accordingly.

Assumptions:

  • You’ve created an AWS account on http://aws.amazon.com (it’s super-easy)
  • My static domain (static.eli4d.com) will use WordPress.com’s nameservers. I host this blog (besides all images and static content) on wordpress.com. The $13 is well worth my time and my content is portable due to the static server usage.

Note: Originally I had created an images.eli4d.com S3 bucket but now I am switching to static.eli4d.com. While creating the images bucket I accumulated lots of scattered notes. If there’s any references to the images bucket it is due to this initial effort.

Get an AWS account

Creating an AWS account is extremely easy and it’s faster than Amazon Prime.

Get an AWS account

How to create an S3 bucket for static.eli4d.com

Pick S3

The sheer breadth of Amazon’s web services is astounding…and it keeps growing.

Pick S3

The creation step is very simple – just click that “Create Bucket”

The only gotcha is that your bucket name should be the exact name of the domain you want to associate it with. So if I want static.eli4d.com for my static content, then I need to make a bucket name of static.eli4d.com. If that bucket name is taken (it’s universal across all of AWS) – then you’re out of luck and have to go down a more complicated route (see https://eli4d.com/2015/09/02/amazon-web-services-lesson-s3-bucket-names-are-universal-so-get-your-domain-named-s3-bucket-before-someone-else-does/ ).

The creation step is very simple - just click that "Create Bucket"

S3 Management Console

S3 Management Console

S3 Management Console

S3 Management Console

It’s ALIVE

Franken url is awake…but inaccessible

It's ALIVE

Current permissions – main account

Current permissions - main account

Time to create the index.html

Time to create the index.html

Time to create robots.txt

Time to create robots.txt

Lets get back to the bucket

Lets get back to the bucket

S3 Management Console – uploading files – 1

S3 Management Console - uploading files - 1

S3 Management Console – uploading files – 2

S3 Management Console - uploading files - 2

Upload details page

Keeping it as defaults.

Upload details page

Upload complete

Upload complete

My bucket shows the uploaded files

My bucket shows the uploaded files

Testing end point – can I see that index.html

And the answer is no. Not surprising but the answer is still NO.

It’s time to go down the rabbit hole also known as AWS permissions. This is a short trip into that hole. We’ll have a longer trip when enabling an access policy between a user and this bucket.

Testing end point - can I see that index.html

Allowing anyone to get to the S3 bucket using a browser

Where do I find my S3’s ARN?

Go to the S3 bucket and edit the bucket policy to see the bucket’s ARN. In my case the ARN is arn:aws:s3:::static.eli4d.com/*

Where do I find my S3's ARN?

Setting bucket permissions – 1

Following http://blog.learningtree.com/configuring-amazon-s3-to-serve-images/ in setting bucket properties

Setting bucket permissions - 1

Setting bucket permissions – 2

Keep in mind the following: when you click the link the AWS Policy Generator will launch in a separate browser window. You then create the policy there and then you have to copy the policy that’s created (a bunch of text) from that browser window to this browser window. This is not obvious and from a UX point of view it can be crazy-making and confusing.

Setting bucket permissions - 2

Setting bucket permissions – 3

Setting bucket permissions - 3

AWS Policy Generator

The only permission that the bucket needs to be world readable is GetObject.

AWS Policy Generator

ARN is key

You need to put correct arn:

arn:aws:s3:::static.eli4d.com/* in my case as mentioned above. Mess up the ARN and you will be slightly sad.

‘Principal’ refers to anyone who accesses the bucket (so by putting * we’re saying ‘everyone’).

ARN is key

Once you add the statement

Policy generator gives you a summary before actual generation. It’s time to click the ‘Generate Policy’ button.

Once you add the statement

Clicking the ‘Generate’ button

Side note: that version date is odd. You can’t just put today’s date as the version date.

Clicking the 'Generate' button

So you have a policy and you need to copy it

I know….you’re thinking wtf and so am I. So copy the policy. Then go back to the window where you launched the policy generator.

As a key principal here: do not modify any of this text. Seriously…don’t do it.

So you have a policy and you need to copy it

Here’s where you’re going to copy the text into

Remember that browser window from which you opened the security policy editor. Go back to that one.

Here's where you're going to copy the text into

Now paste in the policy and save it

Now paste in the policy and save it

If everything is ok policy wise you get back to the main window

There’s a really quick green checkbox and here we are (sure wish the UX was better here).

If everything is ok policy wise you get back to the main window

Time to retest the endpoint

Whohoo…now we can get to the S3 bucket.

What’s left:

  • Domain mapping of static.eli4d.com domain to this endpoint
  • Permissions to allow me to sync resources

Time to retest the endpoint

Domain mapping to the S3 bucket

My eli4d.com domain is controlled by WordPress (my registrar, however, is Hover – I LOVE Hover). These instructions apply to adding the static.eli4d.com subdomain via WordPress. I had tested some other domain configurations and this turned out to be the simplest approach (thumbs up to Hover and WordPress support). Depending on your domain configuration – you’ll have to adjust your steps accordingly when adding a subdomain.

Note: any ‘Hover’ URLs from this post are a referral link to Hover. BTW in case I didn’t mention it – I love^1000 Hover.

To the wordpress.com domain editing url

The not-so-easily found domains link on WordPress.com.

To the wordpress.com domain editing url

Lets edit the DNS

Time to add my subdomain of static.eli4d.com

Lets edit the DNS

Create a CNAME record for static.eli4d.com

The steps are to:

  1. Create the CNAME
  2. Click on the ‘Add’ button
  3. Click on the ‘Save Changes’ button

Create a CNAME record for static.eli4d.com

Check that static.eli4d.com is showing on the browser

Problem – when I type static.eli4d.com it redirects to eli4d.com – why?

The answer is DNS propagation that may take between 48 to 72 hours.

Lets pretend that 48 to 72 hours have passed

Ta-da – it works!

Hint: Use Firefox/Chrome private browsing mode to validate domain since it eliminates caching issues.

Lets pretend that 48 to 72 hours have passed

Checking in: workflow – how do I upload resources to my S3 bucket?

Now what? How do I upload my static resources to this S3 bucket? It will most likely be images but it can be anything else (so S3 accepts a maximum of 5 TB sized files). I write my blog entry on my Mac via Markdown putting the static items in the post, but then how where do I go from here to there workflow-wise?

I could just log into the AWS console and upload the resources but it feels clunky and not my type of workflow. What I want is something on the command line that syncs my resource directory to my S3 bucket. So here’s my approach:

  • find a command line utility
  • configure a user on AWS that can sync data only to this bucket (this is just basic security; I don’t want my main ‘root’-ish user to do everything from my mac); ideally I would have a user per bucket but I’ll stick to one sync user to honor some semblance of simplicity and sanity
  • configure the S3 bucket to accept connection from this user (this turned out to be a bear – AWS’s security model is breathtakingly complex)

Note: If you’re ok with just uploading resources via the AWS console then you’re done…enjoy! (please let me know via Twitter that you found these instructions useful…it encourages me to write more of this)

Finding an S3 sync command line utility

Lots of possible solutions but some outdated

Lots of possible solutions but some outdated

But there’s a promising article

at: http://serverfault.com/questions/73959/using-rsync-with-amazon-s3

An Amazon native solution would be ideal (just like using the docs straight from the horses mouth – i.e. amazon).

But there's a promising article

I want sync but…

I need to start at the beginning, so I need to backup to aws cli instructions

I want sync but...

Selecting “User Guide”

Selecting "User Guide"

Nice – the page has what I need

http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html

Nice - the page has what I need

More AWS cli documentation

More AWS cli documentation

And more AWS docs

http://docs.aws.amazon.com/AmazonS3/latest/dev/walkthrough1.html

And more AWS docs

Command line install instructions

Command line install instructions

I’m using the bundled installer since I don’t have pip but I do have Python 2.7.5

I'm using the bundled installer since I don't have pip but I do have Python 2.7.5

Installing the AWS Command Line Interface – AWS Command Line Interface

Installing the AWS Command Line Interface - AWS Command Line Interface

Sweetest command line – here we go

Just follow the instructions

Sweetest command line - here we go

The ‘aws’ command works!

Note that I moved back to my standard account rather than the admin account on the mac (trying to be secure and all that jazz)

The 'aws' command works!

The command to sync a local folder to the AWS bucket

At this point this command doesn’t work yet but it will later. All possible options for aws cli can be found here: http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html

 aws s3 sync /Volumes/elev-per/Dropbox/eli4d-content/s3/static.eli4d.com/2015/ s3://static.eli4d.com/2015 --delete --exclude "*.DS_Store"

Basically the above command says sync all resources from my local directory and use the local directory as the authoritative source deleting any mismatches on the S3 bucket side (i.e. the –delete) and exclude the Mac side pollution of .DS_Store – so don’t sync those.

The fantastically awesome Nicolas Zakas and a slight sad story about S3

I happened to come across a very interesting post by Nicolas Zakas ( http://www.nczonline.net/blog/2015/08/wordpress-jekyll-my-new-blog-setup/ ).

There are 2 very interesting things:

  1. His comment about s3command was very interesting. Since I don’t regenerate all of the static content – awscli is fine for me. But it’s something to keep in mind for static blog generation.
  2. The ability of someone else to indirectly squat on his domain by taking the name as an S3 bucket. I’ve written about this here: https://eli4d.com/2015/09/02/amazon-web-services-lesson-s3-bucket-names-are-universal-so-get-your-domain-named-s3-bucket-before-someone-else-does/

The fantastically awesome Nicolas Zakas and a slight sad story about S3

Creating an S3 user for syncing

As mentioned before I need a user that can sync resources for this specific bucket

I need some Sam IAM (come on Dr. Seuss – work with me here)

http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-set-up.html#cli-signup%20%28IAM%29

As mentioned before I need a user that can sync resources for this specific bucket

Creating a sync user via IAM – 1

Time to go to that iam console

Creating a sync user via IAM - 1

Creating a sync user via IAM – 2

time to click that user’s link

Creating a sync user via IAM - 2

Creating a sync user via IAM – 3

Select ‘Create New Users’

Creating a sync user via IAM - 3

Creating a sync user via IAM – 4

Creating a sync user via IAM - 4

Creating a sync user via IAM – 5

Creating a sync user via IAM - 5

Creating a sync user via IAM – 6

Here is where you create an access key (I already created it). The gist is AWS creates a public/private key and you need to save it because it’s never shown to you again (i.e. the private key).

Creating a sync user via IAM - 6

Now how do I give this user access to my images bucket?

Duckducking around: https://duckduckgo.com/?q=how+add+IAM+user+to+s3

I found: http://docs.aws.amazon.com/AmazonS3/latest/dev/walkthrough1.html

Click the user to see its permissions

Click the user to see its permissions

New IAM user information

New IAM user information

Configuring aws-cli with my newly created AWS user

Time to configure

http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-set-up.html

Note 1: that I found my region by logging into aws console > s3 and looked at the top area for region corresponding to my s3 bucket.

Note 2: All configuration (default) is in ~/.aws/

Time to configure

Calling s3

S3 references:

http://docs.aws.amazon.com/cli/latest/userguide/cli-s3.html

http://docs.aws.amazon.com/cli/latest/reference/s3/index.html

http://docs.aws.amazon.com/cli/latest/reference/s3/ls.html

http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html

Dang – I need IAM policy for my user.

Calling s3

Configuring my S3 bucket to allow sync from my eli4dsync user

This is what I want

This is what I want

Insert many head scratching hours and attempts to get this going and lots^1000 of expletives

I initially attempted to change the security policy of the S3 bucket to allow for my sync user. I got lots and lots ‘Access Denied’ messages. I scoured AWS documentation, Duckduckgo, Google, Stackoverflow, and a Lynda course about AWS. Somewhere along all of this I figured that maybe I need to approach this from the other side – the eli4dsync user and that maybe I should attach a policy to the user in terms of the bucket access. This is the approach that worked for me but it may not be the right approach. If someone at Amazon would clarify the way security policy works – I would love to write that up (so open invitation AWS people with security policy information to get in touch).

Image credit: https://flic.kr/p/bMGA1T

Insert many head scratching hours and attempts to get this going and lots^1000 of expletives

Applying an inline policy to the IAM user rather than the S3 bucket

Per http://blogs.aws.amazon.com/security/post/Tx3VRSWZ6B3SHAV/Writing-IAM-Policies-How-to-grant-access-to-an-Amazon-S3-bucket

So initially – it looks like this article talks about s3 policy but it isn’t about the s3 bucket but rather the IAM user.

Applying an inline policy to the IAM user rather than the S3 bucket

Testing my sync code against my changes I find that this one works

So there are two parts:

Part (1) applies to the whole bucket. ListObjects is needed for recursion that occurs through the awscli sync command (think subdirectories of files and syncing them…though S3 doesn’t have a file hierarchy concept).

Part (2) applies to objects that are within buckets.

With this inline policy my sync user does NOT have carte blanche – it’s the right thing (for my purposes).

Testing my sync code against my changes I find that this one works

It works!!!

My sync script works and I have a very specific policy for my sync user.

It works!!!

Conclusion and Thanks

That’s it.

As you can tell – the AWS security policy creation is the biggest head scratcher. The rest if fairly straightforward.

My thanks to the folks that created the following resources and/or answered my questions:

Please let me know via Twitter (https://twitter.com/eli4d) that you found these instructions useful…it encourages me to write more of this.

Amazon Web Services Lesson – S3 Bucket Names are Universal so get your domain named S3 bucket before someone else does

I recently subscribed to Nicholas Zakas’s excellent http://www.nczonline.net newsletter and came across a shocking realization about Amazon’s S3 service: all S3 bucket names are universal. Let me explain what this means.

It all started with wanting a static image server for my blog

A few weeks ago I wanted to host all images for this site on images.eli4d.com. Why? Well I wanted to be able to easily move my blog without worrying about static assets. I also wanted to explore an AWS service such as S3.

I finally got it to work after beating my head against some security policy issues (this had more to do with me than Amazon but this is for another post). One of the key points that I learned when doing this is that the simplest approach to create an S3 based static site requires naming the S3 bucket with the name of the domain.

But then I read the following from Nicolas Zakas’s newsletter

From http://www.nczonline.net/blog/2015/08/wordpress-jekyll-my-new-blog-setup/

But then I read the following from Nicolas Zakas's newsletter

OMG – what?

image attribution: https://flic.kr/p/8Y1Mp9

OMG - what?

So what does this mean?

It means that if you have any intention of ever having a static S3 based website, then you should create the S3 buckets with the various permutation of your domain’s names before someone else does (so domain.com, www.domain.com, blog.domain.com, etc…). This is worth doing even if you don’t use those S3 buckets.

Keep in mind that you’re not locked out of using any other S3 buckets for your domains. But you have to deal with some unnecessary hoops.

So what does this mean?

Thanks!

Many thanks to Nicolas Zakas for documenting his experience with S3.