I am designing a customer profile application where we need to register up to 100K customers per second through frontend/API calls. At the same time, a PySpark/Databricks batch job will migrate millions of existing customers from different source systems into the same database. We are using the Cosmos DB SQL API to support this highly concurrent application. When a new customer registers through the frontend/API or is migrated by the backend batch process, the program assigns them a unique Customer_Number and sets the flag IsAssigned=True on the corresponding document in the Cosmos DB container. Assignment happens sequentially, based on Sequence_Id in ascending order.
So we plan to identify and read the next record with a clause like --> WHERE Sequence_Id = (SELECT MIN(Sequence_Id) FROM container WHERE IsAssigned = false) and then assign its Customer_Number to the customer. Or should I maintain the Sequence_Id used by the previous transaction somewhere else instead?
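For context, this is roughly the read I have in mind (only a sketch; I'm assuming the azure-cosmos Python SDK, hypothetical database/container names `customerdb`/`customers`, and a TOP 1 / ORDER BY form because I'm not sure the MIN subquery is expressible directly in Cosmos SQL):

```python
from azure.cosmos import CosmosClient

# Hypothetical connection details -- replace with the real endpoint, key, and names
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("customerdb").get_container_client("customers")

# Find the lowest unassigned Sequence_Id (cross-partition, so likely expensive at high RPS)
query = (
    "SELECT TOP 1 c.id, c.Sequence_Id, c.Customer_Number "
    "FROM c WHERE c.IsAssigned = false ORDER BY c.Sequence_Id ASC"
)
candidates = list(container.query_items(query=query, enable_cross_partition_query=True))
next_doc = candidates[0] if candidates else None
```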
Please guide me on how to lock/read/update a Cosmos DB document to manage optimistic concurrency when the number of requests per second is this high.
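This is the kind of ETag-based optimistic concurrency pattern I think I need: read the candidate document, then replace it conditionally and retry on conflict. Again only a sketch under the same assumptions; `read_next_unassigned` is a hypothetical helper wrapping the query above.

```python
from azure.core import MatchConditions
from azure.cosmos import exceptions

def try_assign(container, doc, max_retries=5):
    """Mark one pre-staged customer document as assigned, retrying on ETag conflicts."""
    for _ in range(max_retries):
        if doc is None:
            break
        doc["IsAssigned"] = True
        try:
            # replace_item succeeds only if nobody has changed the doc since we read it
            return container.replace_item(
                item=doc["id"],
                body=doc,
                etag=doc["_etag"],
                match_condition=MatchConditions.IfNotModified,
            )
        except exceptions.CosmosHttpResponseError as err:
            if err.status_code != 412:  # 412 = precondition failed (ETag mismatch)
                raise
            # Another request grabbed this Sequence_Id first: re-read and try the next one
            doc = read_next_unassigned(container)  # hypothetical helper around the query above
    return None
```

My worry is that at ~100K requests per second, many callers will race for the same lowest Sequence_Id, so most of them will hit 412 conflicts and retry, which is part of why I am asking whether there is a better design.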
Please advise me if you have a better suggestion.
I anticipate that the query below will be very slow and a threat to performance: Sequence_Id = (SELECT MIN(Sequence_Id) FROM container WHERE IsAssigned = false).
Sequence_Id   Customer_Number     IsAssigned
1             6000001234567980    False
2             6000001234567241    False
3             6000001234567673    False
4             6000001234567984    False
5             6000000000000001    False
…             …                   False
9999999999    6000001234567115    False
I am really stuck and need your help. Thanks in advance.