3
votes

We are in the process of migrating our data warehouse from Oracle to Redshift. Currently we have two Oracle database instances: a primary DW instance that gets data loaded from different sources throughout the day, and a secondary DW instance that replicates the data from the primary. All reporting platforms point to the secondary DW instance. How can we address this in Redshift? Do we need two Redshift instances, one replicating from the other? If we have just one Redshift instance, will the data load overhead affect query performance? Will there be table lock issues?

Appreciate your suggestions. Thanks.


2 Answers

0
votes

It really depends on how quickly your reporting platforms need access to the data that is loaded throughout the day. If it can wait, then it makes sense to batch load during quiet hours. I suspect, given that you're using replication in your current setup, that you need the data to be loaded and available as soon as possible.

In that case, it would make sense to utilise Redshift's Workload Management (WLM) settings. WLM lets you define multiple query queues and assign each one a concurrency level and a share of the cluster's memory. Using this model, you can ring-fence resources so that queries from your reporting tools and end users are guaranteed a consistent allocation, while still dedicating a portion of the cluster's queues and resources to your data loads.

This would also eliminate the need to run two separate database instances, one for loading and one for serving data.

See here for more detail on WLM in Redshift: http://docs.aws.amazon.com/redshift/latest/dg/cm-c-implementing-workload-management.html
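As a rough illustration, here is a minimal sketch in Python of applying such a WLM setup with boto3. The parameter group name, queue layout, and numbers are all hypothetical; only the wlm_json_configuration parameter and its query_group / query_concurrency / memory_percent_to_use keys come from Redshift's documented WLM settings.

    import json
    import boto3  # AWS SDK for Python

    # Hypothetical queue layout: one queue for reports, one for loads,
    # plus the mandatory default queue at the end.
    wlm_config = [
        {
            "query_group": ["reports"],   # routed via: SET query_group TO 'reports';
            "query_concurrency": 10,
            "memory_percent_to_use": 70,
        },
        {
            "query_group": ["etl"],       # routed via: SET query_group TO 'etl';
            "query_concurrency": 2,
            "memory_percent_to_use": 20,
        },
        {
            "query_concurrency": 3,       # default queue: everything else
            "memory_percent_to_use": 10,
        },
    ]

    redshift = boto3.client("redshift")
    redshift.modify_cluster_parameter_group(
        ParameterGroupName="my-dw-params",  # hypothetical parameter group
        Parameters=[{
            "ParameterName": "wlm_json_configuration",
            "ParameterValue": json.dumps(wlm_config),
        }],
    )

Note that wlm_json_configuration is a static parameter, so the cluster needs a reboot before the new queues take effect; your load jobs then opt into their queue by running SET query_group TO 'etl'; at the start of each session.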

0
votes

Never read and write from the same instance, not even in Redshift. In general, any system that forces you to read and write from the same machine reflects a poor design.

Since you are discussing Amazon Redshift, I can quite comfortably assume that you have analytical data. (Redshift, with its columnar architecture, is optimised for reads rather than writes, so if you happen to be storing transactional data on Redshift, I would recommend reconsidering that decision.)

Before designing any infrastructure for analytical data, we should always consider that:

  1. It'll be voluminous, and
  2. it'll need to scale further in the near future.

When you scale, reading and writing from the same machine will be catastrophic. And don't forget the locks: DELETE and TRUNCATE hold exclusive locks on the table. If some other process or user has already acquired a conflicting lock, even the write on that table will fail, messing up your data.

The above reasons should be convincing enough as to why you should not use a single warehouse to both read and write data. (If you are forced to load and read on one cluster anyway, the staging-and-swap pattern sketched below at least keeps the lock window small.)
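Here is that staging-and-swap pattern as a minimal Python sketch, assuming psycopg2 and purely hypothetical table, bucket, and role names: the slow COPY happens in a staging table that no reader touches, and the exclusive lock is only taken for the instant the renames run.

    import psycopg2  # Redshift speaks the PostgreSQL wire protocol

    # Hypothetical connection details and object names.
    conn = psycopg2.connect(
        host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="dw", user="etl_user", password="...",
    )

    with conn.cursor() as cur:
        # 1. Load into a staging table; readers of "sales" are unaffected
        #    while this slow step runs.
        cur.execute("CREATE TABLE sales_staging (LIKE sales);")
        cur.execute(
            "COPY sales_staging FROM 's3://my-bucket/sales/' "
            "IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopy' "
            "FORMAT AS CSV;"
        )
        # 2. Swap the tables. RENAME takes an exclusive lock, but holds it
        #    only for an instant, not for the duration of the load.
        cur.execute("ALTER TABLE sales RENAME TO sales_old;")
        cur.execute("ALTER TABLE sales_staging RENAME TO sales;")
        cur.execute("DROP TABLE sales_old;")
    conn.commit()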

That said, the cleaner approach is the model below, which is neat, keeps loads and reads from ever interfering with each other, and ensures you don't face consistency and locking issues:

 DS 1    +------------+              +------------+
-------->|            |              |            |
 DS 2    |            |  AGGREGATES  |            |    reads
-------->|    DW 1    +------------->|    DW 2    +------------>
  ...    |            |              |            |
 DS n    |            |              |            |
-------->+------------+              +------------+

where DS : Data Source, DW : Data Warehouse

How often you migrate data from DW 1 --> DW 2 will depend entirely on how fresh the data your reports refer to needs to be. A common way to implement that hop is an UNLOAD from DW 1 into S3 followed by a COPY into DW 2, sketched below.
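A minimal Python sketch of the DW 1 --> DW 2 hop, again with hypothetical hosts, bucket, IAM role, and table names; UNLOAD ... FORMAT AS PARQUET and COPY ... FORMAT AS PARQUET are standard Redshift SQL, everything else here is an assumption.

    import psycopg2

    S3_PREFIX = "s3://my-bucket/aggregates/daily_sales_"    # hypothetical
    IAM_ROLE = "arn:aws:iam::123456789012:role/RedshiftS3"  # hypothetical

    # 1. On DW 1 (the write/load cluster): export the aggregate to S3.
    with psycopg2.connect(host="dw1.example.com", port=5439,
                          dbname="dw", user="etl", password="...") as dw1:
        with dw1.cursor() as cur:
            cur.execute(f"""
                UNLOAD ('SELECT region, sale_date, SUM(amount) AS total
                         FROM sales GROUP BY region, sale_date')
                TO '{S3_PREFIX}'
                IAM_ROLE '{IAM_ROLE}'
                FORMAT AS PARQUET;
            """)

    # 2. On DW 2 (the read cluster): refresh the table from S3.
    with psycopg2.connect(host="dw2.example.com", port=5439,
                          dbname="dw", user="etl", password="...") as dw2:
        with dw2.cursor() as cur:
            cur.execute("TRUNCATE daily_sales;")
            cur.execute(f"""
                COPY daily_sales
                FROM '{S3_PREFIX}'
                IAM_ROLE '{IAM_ROLE}'
                FORMAT AS PARQUET;
            """)

Scheduling this on a cron or an orchestration tool at whatever cadence your reports can tolerate gives you the same primary/secondary split you have in Oracle, just mediated through S3 instead of replication.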