3
votes

We have a fairly large Redis database (~40 GB) at our colocation facility that we wish to migrate to AWS's ElastiCache Redis service. The challenge is that the data is frequently updated in production (tens of thousands of write operations per minute), so uploading an RDB file to ElastiCache would result in an ElastiCache instance that is already out of date.

Amazon's documentation recommends importing an RDB file of the existing database, which is fine. But how do we also import the hundreds of thousands of write operations that took place between the time we took the RDB snapshot, uploaded it to S3, and imported it into the ElastiCache instance? ElastiCache doesn't seem to support SLAVEOF, so we can't simply make it a slave initially and then switch it to the master.

What options exist to keep an ElastiCache Redis instance in approximate sync with an external Redis server until we're ready to flip the switch and make the ElastiCache server the primary Redis server?

As a follow-up: we ended up using a regular EC2 instance on which we configured Redis. We simply made the EC2 instance replicate from the original source, and then promoted it to master. ElastiCache didn't seem to be able to support this simple task. - Dan
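
A minimal sketch of that replicate-then-promote flow, assuming the redis-py client and illustrative hostnames:

    import time
    import redis

    replica = redis.Redis(host="ec2-redis.example.com")  # hypothetical EC2 instance

    # Point the EC2 instance at the existing master
    # (SLAVEOF; REPLICAOF on Redis 5+).
    replica.slaveof("old-redis.example.com", 6379)        # hypothetical source host

    # Wait for the initial sync to finish before cutting over.
    while replica.info("replication").get("master_link_status") != "up":
        time.sleep(1)

    # Promote: with no arguments, slaveof() sends SLAVEOF NO ONE.
    replica.slaveof()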

1 Answer

1
votes

Personally, I think the easiest solution is to stop the service, move the data to ElastiCache, and restart the service.

If you cannot stop the service, you can move the data to ElastiCache incrementally. However, this is a much more complicated solution: you need to implement a PROXY to dispatch requests between your old Redis instance and the new ElastiCache instance, and a DATA-MOVER to move the data incrementally. The DATA-MOVER works as follows (a sketch appears after the list):

  1. At the beginning, the proxy dispatches all requests to Redis.
  2. Use the SCAN command to fetch a batch of keys from Redis.
  3. Get the values for these keys. For complicated data structures, e.g. lists, sets, and hashes, you might need the TYPE, ZSCAN, SSCAN, and HSCAN commands to get the values.
  4. Write these keys and values to ElastiCache.
  5. The proxy dispatches requests for these keys to ElastiCache, and requests for all other keys to Redis.
  6. Go to step 2 until all keys have been moved to ElastiCache.
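
A minimal sketch of the mover loop (steps 2-4 and 6), assuming the redis-py client and illustrative endpoints. It uses DUMP/RESTORE to copy a value of any type in one shot, as an alternative to the per-type *SCAN commands mentioned in step 3:

    import redis

    src = redis.Redis(host="old-redis.example.com")           # hypothetical source
    dst = redis.Redis(host="my-cluster.cache.amazonaws.com")  # hypothetical ElastiCache endpoint

    cursor = 0
    while True:
        # Step 2: fetch a batch of keys.
        cursor, keys = src.scan(cursor=cursor, count=100)
        for key in keys:
            # Steps 3-4: serialize the value (any data type) and write it out.
            payload = src.dump(key)
            if payload is not None:  # the key may have expired mid-scan
                ttl = src.pttl(key)
                dst.restore(key, ttl if ttl > 0 else 0, payload, replace=True)
        # Step 5 happens in the proxy: start routing `keys` to ElastiCache here.
        if cursor == 0:  # step 6: SCAN has walked the whole keyspace
            break

Note that RESTORE only accepts DUMP payloads whose serialization version the destination understands, so check that the ElastiCache engine version is at least as new as the source before relying on this.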

Each time we move only a small part of the data to ElastiCache (that's why it's called incremental moving). Once a batch has been moved to ElastiCache, new updates to those keys are written to ElastiCache, so you won't lose too many updates.

If your keys follow some special pattern, that might help with the incremental moving and request dispatching. Say, first you move keys with the prefix aaa:, then keys with the prefix bbb:, and so on.
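
In that case, SCAN's MATCH option can select each batch. A small illustration, reusing the src and dst connections from the sketch above (the prefixes are placeholders):

    # Move one prefix at a time; once a prefix is done, the proxy
    # can start routing all keys with that prefix to ElastiCache.
    for prefix in ("aaa:", "bbb:"):
        for key in src.scan_iter(match=prefix + "*"):
            payload = src.dump(key)
            if payload is not None:
                dst.restore(key, 0, payload, replace=True)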

If your keys don't have any special pattern, you can use a hash function to calculate a hash for each key, and move data based on that hash. For example: first move all keys where hash(key) mod 10 == 0, then all keys where hash(key) mod 10 == 1, and so on.
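
One way to express that bucket test (a sketch; CRC32 is just one choice of stable hash, and the proxy and the mover must share the same function):

    import zlib

    NUM_BUCKETS = 10

    def bucket(key: bytes) -> int:
        return zlib.crc32(key) % NUM_BUCKETS

    # Migrate bucket 0 first, then bucket 1, and so on. At any point the
    # proxy routes a key to ElastiCache iff its bucket has been moved.
    buckets_moved = 3  # example: buckets 0-2 already live on ElastiCache

    def goes_to_elasticache(key: bytes) -> bool:
        return bucket(key) < buckets_moved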

As I mentioned, this is a very complicated solution. You might still prefer the easiest one :)