I had trouble finding the best fit for this question.
With Azure Table Storage you just have a fixed key of PartitionKey and RowKey, and you are charged based on storage size and the number of operations.
I have a LAN document management application with a WPF client and a SQL Server database. I am taking it to Azure and want to minimize cost.
Most installations of the application would have around 100,000 documents, but on the high end it could be 4 million documents.
Most SQL searches are cheap (0.1 seconds), but some can be expensive (e.g. 60 seconds).
On the LAN version, what I do is return the detail for the first 100 rows but the raw identifiers for the first 10,000. As the client needs detail for the next page, it just sends back the identifiers (int) for the next 100 and runs a very cheap query for that detail. I don't keep the detail for prior pages because it would use too much client memory. When the user pages past 10,000, I run the search again for the next 10,000 identifiers.
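The per-page detail query is roughly the sketch below; the dbo.Documents table, its columns, and the table type are placeholders for my real schema:

```sql
-- Placeholder schema: dbo.Documents(DocId int PRIMARY KEY, Title, Author, CreatedOn, ...)

-- Table type so the client can pass the ~100 ids for a page in one round trip.
CREATE TYPE dbo.IntList AS TABLE (Id int PRIMARY KEY);
GO

CREATE PROCEDURE dbo.GetDocumentDetail
    @Ids dbo.IntList READONLY
AS
BEGIN
    SET NOCOUNT ON;

    -- One clustered-key seek per id: cheap even against a 4,000,000-row table.
    SELECT d.DocId, d.Title, d.Author, d.CreatedOn
    FROM dbo.Documents AS d
    JOIN @Ids AS i
        ON i.Id = d.DocId;
END;
GO
```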
In an Azure environment, is there any reason to change that? I am paying for bandwidth, but an int is pretty darn small. I am also paying for SQL CPU and I/O. If anything, I am thinking about sending 100,000 identifiers to the client at a time (100,000 × 4 bytes is only about 400 KB).
I looked at Table Storage, but if I have 100 clients each doing 100 searches a day (and some do), that is 100,000,000 identifiers returned per day (100 × 100 × 10,000), and the transaction cost adds up to over $700; Table Storage inserts are also relatively slow. The SQL database is only $15.
I could store the results in a SQL table:
userID (smallint), rowNum (int), docIdentifier (int)
with a clustered index on (userID, rowNum) at fill factor 50, roughly as sketched below.
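Something like this, just to make the shape concrete (names are made up):

```sql
-- One row per document identifier in a user's current search result.
CREATE TABLE dbo.SearchResult
(
    userID        smallint NOT NULL,  -- the login that ran the search
    rowNum        int      NOT NULL,  -- position of the document in the result set
    docIdentifier int      NOT NULL   -- the document's id
);

-- Cluster on (userID, rowNum) so a page of results is one range seek;
-- fill factor 50 leaves half of each page empty so rewriting a user's
-- result set doesn't split every page.
CREATE CLUSTERED INDEX CIX_SearchResult
    ON dbo.SearchResult (userID, rowNum)
    WITH (FILLFACTOR = 50);

-- A page is then a trivial range query, e.g. rows 101-200 for user 42:
-- SELECT docIdentifier
-- FROM dbo.SearchResult
-- WHERE userID = 42 AND rowNum BETWEEN 101 AND 200
-- ORDER BY rowNum;
```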
My problem there is that 4,000,000 documents times 1,000 logins could put me at 16 GB for that table alone.
Which brings me back to just sending 100,000 ints at a time to the client.
Is there a better design?