1
votes

We are migrating our database from SQL Server 2012 to Amazon Redshift.

The front end of our application is developed in MicroStrategy (MSTR) which fires the queries on Redshift.

Although the application is working fine in Production (on SQL Server 2012), we have run into a strange issue in our PoC Environment on Redshift.

When we kicked off a dashboard in MSTR, the query from the dashboard hits Redshift and it completes successfully without any issues.

But when we stress test the application by running all the dashboards simultaneously, then that particular dashboard's query terminates in Redshift. The database does not throw any error message which is why we cannot troubleshoot why the query is terminating.

Can anyone please suggest how we should go about solving this problem.

Thank you

1

1 Answers

2
votes

The problem might be that you have some timeout on the queue that you are sending the query using WLM configuration.

Redshift is designed differently from other DB, to be optimized for Analytical queries. For that reason it doesn't cache queries results, as you would do with OLTP DB. The other difference is that you have a predefined concurrently level (also part of WLM - http://docs.aws.amazon.com/redshift/latest/mgmt/workload-mgmt-config.html). Each concurrency slot will have its allocated resources to complete big queries quickly, but it is limiting the number of concurrent queries that can run. The default configuration is 5, and you can increase it up to 50. The recommendation is to have it increased to not more than 15-20, as with 50, it means that each query is getting only 2% of the cluster resource instead of 20% (with 5) or 5% (with 20).

The combination of these two differences is: if you are connecting many dashboards, each one sends its queries to Redshift, competes over the resources (without caching each query will run again and again), and might timeout or just be too slow for an interactive dashboard.

Please make sure that you are using the Redshift optimized drivers for MicroStrategy, which are sending queries to Redshift under the above assumptions.

You can also consider putting some RDS between your dashboards and Redshift, with the aggregation data that you need for your dashboards, and that can use in-memory caching and higher concurrency on that summary data. You can see an interesting pattern that you can implement with pg-bouncer see here, that can help you send some queries (the analytical ones) to Redshift, and some (the aggregated dashboard ones) to a PostgreSQL one.