7
votes

What is a good tactic for querying Cosmos DB (Table Storage API) to get the "next" item in Storage? "Next" is defined as the top 1 item that was not returned by the last call. The item that was returned by the last call is being held in memory. Using .NET Framework, C#. Table is expected to hold around 2 million entries, so table scans are not preferred. :)

Table Storage looks like this: Partition Key (composite of multiple values): "0000/00/01/2020-01-11". Row key single int value 1 for example. Row contains other string data.

So Key-Value pairs (Partition Key and Row Key) look like the below. The application is read-heavy, but not necessarily using the Log Tail Pattern (https://docs.microsoft.com/en-us/azure/cosmos-db/table-storage-design-guide#log-tail-pattern).

0000/00/01/2020-01-11,1,aaa,x
0000/00/01/2020-01-11,2,aaa,y
0000/00/01/2020-01-11,3,aaa,z
0000/00/01/2020-01-11,4,bbb,x
0001/00/01/2020-01-11,5,aaa,x
0001/00/01/2020-01-11,6,ddd,x

(Note the bottom two entities will be in a different partition, so "aaa,x" exists in two partitions).

So I think querying to get just one item is

TableQuery<MyClass> query = new TableQuery<MyClass>()
                    .Where(TableQuery.GenerateFilterCondition("0000/00/01/2020-01-11", QueryComparisons.Equal, "aaa")).Take(1);

If that is code is correct, and returns "aaa,x" how to ensure that the subsequent query will get "aaa,y" and the next will get "aaa,z", and the next will get "bbb,x" and the next will get "aaa,x" again in the same partition?

If it does not make sense to make rich objects, and instead directly query the REST API and maybe keep the item that was last used in another Table and an Outer Join or other filter condition between the two tables, I'm open to designing in that direction instead.

Thanks!

1
Discovered wintellect.com/wp-content/uploads/2017/05/…, helping but I have not found a solution yet.Snowy
It sounds like you want a queue, not a tableHong Ooi
@HongOoi sorry I don't want a queue, as rows do get updated intermittently, and queue semantics would mean "aaa,x" would be discarded after it is used when I want that row within the partition after it is used. Thanks.Snowy
So 1,2,3,4,5,6 are your example row keys, but you need to find all values with a given aaa value, which might be the rows 1,4,5. This means that you basically know almost nothing about the row key. You only know "it must be higher than the last one I found". You will have a bit of scanning there, I don't see how you want to achieve O(1) like you describe in the bounty. Am I understanding this correctly?Alex AIT
Second point: I don't quite understand your sample FilterCondition. It has the syntax GenerateFilterCondition (string propertyName, string operation, string givenValue)Alex AIT

1 Answers

5
votes

Two possible ideas depending on your other requirements:

  • Manually iterate through the rows by keeping track of your last rowkey
  • Define a query to get all rows that match your criteria, and leverage the TableContinuationToken

You might need to adapt the filters in my examples to your exact requirements, but the idea should fit either way.


Manually iterate through the rows by keeping track of your last rowkey

string currentPartitionKey = "0000/00/01/2020-01-11";
string currentRowKey = "-1";

var query = new TableQuery()
    .Where(TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, currentPartitionKey),
        TableOperators.And,
        // For the very first query, you may be able to omit this condition
        TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThan, currentRowKey)))
    .Take(1);

// Execute, do stuff
// Switch to next partitionKey if no results were found


Define a query to get all rows that match your criteria, and leverage the TableContinuationToken

Code could look something like this:

string currentPartitionKey = "initialParitionkey";
do
{
    TableContinuationToken continuationToken = null;
    do
    {

        TableQuery query = new TableQuery()
            .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, currentPartitionKey));

        var segment = await table.ExecuteQuerySegmentedAsync(query, continuationToken,
            new TableRequestOptions { TableQueryMaxItemCount = 1 }, cancellationToken);
        continuationToken = segment.ContinuationToken;

        // process results
    }
    while (continuationToken != null)

    currentPartitionKey = getNextPartitionKey(currentPartitionKey);
}
while (currentPartitionKey != null)