0
votes

I am designing a customer profile application where we need to register upto 100K customers per second through Frontend/API call. At the same time, there will be a PySpark/Databricks batch job which will migrate milliones of existing customers into the same database from different source systems. We are using Cosmos DB SQL API to support this highly concurrent application. When a new customer will register from frontend/API or get migrated through backend batch process, program will assign the unique Customer_Number to him/her and update the flag IsAssigned=True in the cosmos db container. Assignment will happen sequentially based on Sequence_Id in ascending order.

So we will identify and read the record by this clause -- > where Sequence_Id=Select MIN(Sequence_Id) from container where IsAssigned=False to assign the customer_number to the customer. Or Should I maintain the Sequence_Id used in the previous transaction somewhere else?

Please guide me how can I read Lock/Read/Update document/row of cosmos db to manage optimistic concurrency where number of request per seconds is so high.

Please advice me if you have better suggestion.

I can anticipate below query will be very slow and is a threat to the performance. Sequence_Id=Select MIN(Sequence_Id) from container where IsAssigned=False.

Sequence_Id     Customer_Number       IsAssigned    
    1           6000001234567980          False         
    2           6000001234567241          False         
    3           6000001234567673          False         
    4           6000001234567984          False                     
    5           6000000000000001          False                     
    …                 …                   False                     
9999999999      6000001234567115          False                     

I am really stuck and need your help. Thanks in advance.

1
Using a sequential number is going to slow you down considerably, as it will also require you to update this number with every insert. You'll also have to serialize all of your inserts. Is this really a requirement? Or can you just generate a GUID for your new customer numbers? This guarantees uniqueness, and there is no need to serialize any writes.David Makogon
We need a 9 digit random number which will be prefix by 600000 and assign to customer. I am getting 0.006% duplicate number when we generate random number for 12 million customers. GUID will not help as it will generate alpha-numeric value.user2941026

1 Answers

0
votes

You can use generated random number as partition key and unique key. If insert fails, you know that you should try inserting with another number.

More info: https://docs.microsoft.com/en-us/azure/cosmos-db/unique-keys