2 votes

Problem:

I am trying to build a backup solution for Azure Cosmos DB that gives us DB dumps on a regular basis, in case we programmatically corrupt the data in our database. The issue is that Data Factory does not (yet) exist for Azure Germany, and we cannot rely on the automatic backups from Azure (which are only retained for 8 hours). I do not want to use any extra applications outside the cloud.

What I found so far:

https://www.npmjs.com/package/mongo-dump-stream

Mongo Dump Stream should be able to connect to our DB and read from it.

My idea is to use this npm package from within an Azure Function and send the result of the dump to blob storage.
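This is roughly what I am attempting (untested; I am assuming mongo-dump-stream exposes a `dump(uri, writableStream, callback)` call, and the container name and connection settings below are placeholders for my environment):

```js
// index.js of an Azure Function (Node.js) - my untested attempt
const azure = require('azure-storage');
const mongoDump = require('mongo-dump-stream');

module.exports = function (context, myTimer) {
    // Placeholder settings - adjust to your own storage account and Cosmos DB
    const blobService = azure.createBlobService(process.env.BACKUP_STORAGE_CONNECTION);
    const containerName = 'cosmos-dumps';
    const blobName = 'dump-' + new Date().toISOString() + '.bson';

    blobService.createContainerIfNotExists(containerName, function (err) {
        if (err) { return context.done(err); }

        // Writable stream that goes straight into a block blob
        const blobStream = blobService.createWriteStreamToBlockBlob(containerName, blobName);

        // Assumed signature: dump(mongoUri, writableStream, callback)
        mongoDump.dump(process.env.COSMOS_MONGO_URI, blobStream, function (err) {
            context.done(err);
        });
    });
};
```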

My question:

How can I send the result to blob storage?

Can you give an example of a concrete implementation?


1 Answer

1 vote

Here is the idea:

Do not create a backup of the whole collection, but only of the changes (the delta), and save those changes over time. Later you can implement a restore mechanism that walks through the delta files.

Here is how the implementation looks as a concept, but only for the backup mechanism that you requested:

(concept diagram of the backup flow)

Here is the dedicated repo for that. I also added an Azure CLI script to help you quickly reproduce my idea in your Azure tenant.

General description:

  1. Dependencies: azure-storage, unix-timestamp, documentdb
  2. I have a time-triggered function. Each run creates a blob named with a datetime stamp.
  3. I store the last import time in a storage table.
  4. To get the delta I use the _ts field of every document in Cosmos DB (see the sketch after this list).
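
A minimal sketch of how these steps could fit together (the table, container, collection link, and environment variable names are placeholders I made up here, not the ones from the repo, so the actual code there may differ):

```js
// index.js - time-triggered Azure Function (Node.js), delta-backup sketch.
// function.json binds myTimer to a CRON schedule, e.g. "0 0 */1 * * *" for hourly runs.
const azure = require('azure-storage');
const DocumentClient = require('documentdb').DocumentClient;

// Placeholder names - the table and container are assumed to exist already
// (for example created by the Azure CLI script mentioned above).
const tableName = 'backupstate';                 // keeps the last export time
const containerName = 'cosmos-deltas';           // receives one blob per run
const collectionLink = 'dbs/mydb/colls/mycoll';  // link to the Cosmos DB collection

module.exports = function (context, myTimer) {
    const tableService = azure.createTableService(process.env.BACKUP_STORAGE_CONNECTION);
    const blobService = azure.createBlobService(process.env.BACKUP_STORAGE_CONNECTION);
    const client = new DocumentClient(process.env.COSMOS_ENDPOINT,
        { masterKey: process.env.COSMOS_KEY });

    // 3. read the last import time from the storage table (fall back to 0 on the first run)
    tableService.retrieveEntity(tableName, 'backup', 'last', function (err, entity) {
        const lastTs = err ? 0 : parseInt(entity.lastTs._, 10);
        const nowTs = Math.floor(Date.now() / 1000);

        // 4. the delta is every document whose _ts is newer than the last run
        const querySpec = {
            query: 'SELECT * FROM c WHERE c._ts > @ts',
            parameters: [{ name: '@ts', value: lastTs }]
        };

        client.queryDocuments(collectionLink, querySpec).toArray(function (err, docs) {
            if (err) { return context.done(err); }

            // 2. one blob per run, named with a datetime stamp
            const blobName = 'delta-' + new Date().toISOString() + '.json';
            blobService.createBlockBlobFromText(containerName, blobName,
                JSON.stringify(docs), function (err) {
                    if (err) { return context.done(err); }

                    // 3. remember when this export happened for the next run
                    const gen = azure.TableUtilities.entityGenerator;
                    const state = {
                        PartitionKey: gen.String('backup'),
                        RowKey: gen.String('last'),
                        lastTs: gen.String(String(nowTs))
                    };
                    tableService.insertOrReplaceEntity(tableName, state, function (err) {
                        context.done(err);
                    });
                });
        });
    });
};
```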

Benefits of this approach:

  1. deltas are lighter and will be faster to import/back up
  2. you can set the frequency of your delta generation
  3. you can see the database in different states when you restore

Drawbacks:

  1. you do not have a single file to restore, but many