12
votes

Amazon says the following about Redshift billing: "Node usage hours are billed for each hour your data warehouse cluster is running in an Available state. If you no longer wish to be charged for your data warehouse cluster, you must terminate it to avoid being billed for additional node hours." This means that once I create a cluster I'll be billed 24/7 whether I use it or not, because a cluster has no state like "Suspend". Is there a way to shut down the whole Redshift server when it is not in use, so that I'm billed only for the hours when I actually want to use the cluster?

Edit: From Tomasz's reply, it sounds like shutting down the cluster for the weekend would mean backing up the whole database on Friday evening and restoring it on Sunday evening. This doesn't sound good. What does Amazon really mean when they say "PAY ONLY FOR THE HOURS YOU USE"?

Can you tell me how long it will take to back up and restore a data warehouse of around 100GB? Can I automatically associate security groups with the cluster from Java code after restoring it?


3 Answers

14
votes

You can create a manual snapshot of the cluster when you have finished work and then delete the cluster.

You will still pay for S3 storage of the snapshot, but that costs much less than a running Redshift cluster.

The next day, just restore the cluster from the latest snapshot. You will have to add the security groups to the new cluster, probably with the Java API:

The new cluster will be associated only with the default security and parameter groups. If the original cluster was associated with any other security or parameter group, you will need to manually associate those groups with the new cluster.

The easiest way to create a snapshot is from the console, but you will probably want to do it automatically using the CLI or the Java SDK.

Creating a snapshot of a 3-node cluster filled to 80% took me about 5 minutes (it's so quick because snapshots are incremental). 100GB is much less than my setup, so it should be even faster. The restore shouldn't take long either.
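The snapshot, delete and restore cycle described above can be sketched in Python as follows. This is only an illustration under some assumptions: `client` is expected to be a boto3 Redshift client (`boto3.client("redshift")`), the function names are made up for this example, and the cluster is assumed to live in a VPC so that security groups are passed as VPC security group IDs. The Java SDK exposes equivalent operations.

```python
def snapshot_and_delete(client, cluster_id, snapshot_id):
    """Take a manual snapshot, wait until it is usable, then delete the cluster."""
    client.create_cluster_snapshot(
        SnapshotIdentifier=snapshot_id,
        ClusterIdentifier=cluster_id,
    )
    # Block until the snapshot is reported as available.
    client.get_waiter("snapshot_available").wait(
        ClusterIdentifier=cluster_id,
        SnapshotIdentifier=snapshot_id,
    )
    # The manual snapshot already exists, so skip the final one.
    client.delete_cluster(
        ClusterIdentifier=cluster_id,
        SkipFinalClusterSnapshot=True,
    )
    client.get_waiter("cluster_deleted").wait(ClusterIdentifier=cluster_id)


def restore_with_groups(client, cluster_id, snapshot_id,
                        vpc_security_group_ids, parameter_group_name):
    """Restore a cluster and attach its security and parameter groups up front."""
    client.restore_from_cluster_snapshot(
        ClusterIdentifier=cluster_id,
        SnapshotIdentifier=snapshot_id,
        # Passing the groups here avoids ending up with only the defaults.
        VpcSecurityGroupIds=vpc_security_group_ids,
        ClusterParameterGroupName=parameter_group_name,
    )
    client.get_waiter("cluster_available").wait(ClusterIdentifier=cluster_id)


# Possible usage (identifiers are placeholders):
#   import boto3
#   client = boto3.client("redshift")
#   snapshot_and_delete(client, "my-cluster", "my-cluster-friday")
#   restore_with_groups(client, "my-cluster", "my-cluster-friday",
#                       ["sg-..."], "my-parameter-group")
```

Taking the client as a parameter keeps the two steps easy to schedule separately, for example from a Friday evening job and a Monday morning job.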

5
votes

UPDATE: A lot has changed in the intervening years; in particular, restoring from a snapshot is now quite fast. Your cluster becomes available in a few minutes and you can run queries while the restore continues in the background. The total time for a complete restore of 100GB would now be measured in minutes (it varies with node type and count).


What does Amazon really mean when they say "PAY ONLY FOR THE HOURS YOU USE"?

You pay for a whole hour for any partial hour used.
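To make the rounding concrete, here is a small sketch of the arithmetic, assuming the whole-hour billing described in this answer. The function names and the rate used in the note below are made up for illustration; they are not actual Redshift prices.

```python
import math


def billed_hours(minutes_running):
    """Round running time up to whole hours, as partial hours bill as full ones."""
    return math.ceil(minutes_running / 60)


def weekly_cost(hours_per_week, nodes, rate_per_node_hour):
    """Node-hours for the week multiplied by a (hypothetical) hourly rate."""
    return hours_per_week * nodes * rate_per_node_hour
```

With these definitions, `billed_hours(61)` is 2, not 1. For a 2-node cluster at an assumed rate of 0.25 per node-hour, running 24/7 costs `weekly_cost(168, 2, 0.25)` = 84.0 per week, while running 50 hours a week costs `weekly_cost(50, 2, 0.25)` = 25.0.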

Can you tell me how much time will it take to backup/restore a data warehouse of size around 100GB?

Snapshots are incremental, and this is what makes them fast (as Tomasz mentioned). Shutting down a cluster is fairly quick, about half an hour. However, restoring from a snapshot is very slow; I'd estimate around 3 hours to restore 100GB.

If you really want to be able to take a database cluster up and down quickly, you might be better off using another analytic DB (e.g. the free editions of Greenplum or Vertica) with the data stored on EBS volumes. It would be a lot more work to manage, though; that's the tradeoff.

3
votes

You can now pause and resume a Redshift cluster (from both the console and the CLI).

See this link:

https://aws.amazon.com/blogs/big-data/lower-your-costs-with-the-new-pause-and-resume-actions-on-amazon-redshift/
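For scripting this rather than using the console, a minimal Python sketch might look like the following. It assumes `client` is a boto3 Redshift client (`boto3.client("redshift")`); the wrapper function names are hypothetical.

```python
def pause(client, cluster_id):
    """Ask Redshift to pause the cluster."""
    client.pause_cluster(ClusterIdentifier=cluster_id)


def resume(client, cluster_id):
    """Resume a paused cluster and wait until it is available again."""
    client.resume_cluster(ClusterIdentifier=cluster_id)
    client.get_waiter("cluster_available").wait(ClusterIdentifier=cluster_id)


# Possible usage (the identifier is a placeholder):
#   import boto3
#   client = boto3.client("redshift")
#   pause(client, "my-cluster")    # e.g. Friday evening
#   resume(client, "my-cluster")   # e.g. Monday morning
```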