I'm trying to iterate through all items in my DynamoDB table. (I understand this is inefficient, but I'm only doing it once to build an index table.)
I understand that DynamoDB's scan() function returns at most 1MB of data or the supplied limit, whichever is smaller. To compensate for this, I wrote a function that checks each response for "LastEvaluatedKey" and re-issues the scan starting from that key until it has all the results.
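To make the intended pattern concrete, here is a minimal sketch of the loop run against an in-memory stand-in rather than DynamoDB. Everything in it is an assumption for illustration: `fake_scan` and `scan_all` are hypothetical names (not boto calls), the "table" is a plain list of dicts, and a page size of 2 items stands in for the real 1MB cutoff.

```python
def fake_scan(items, exclusive_start_key=None, limit=None, page_size=2):
    """Hypothetical stand-in for a paginated scan over a list of dicts keyed by 'id'.

    Returns at most min(limit, page_size) items and includes
    'LastEvaluatedKey' only when more items remain after this page.
    """
    start = 0
    if exclusive_start_key is not None:
        ids = [item['id'] for item in items]
        # Resume just after the last key returned by the previous page.
        start = ids.index(exclusive_start_key) + 1
    size = page_size if limit is None else min(limit, page_size)
    page = items[start:start + size]
    response = {'Items': page, 'Count': len(page)}
    if page and start + len(page) < len(items):
        response['LastEvaluatedKey'] = page[-1]['id']
    return response


def scan_all(items, limit=None):
    """Follow LastEvaluatedKey until the data runs out or `limit` items are collected."""
    collected = []
    start_key = None
    while limit is None or len(collected) < limit:
        remaining = None if limit is None else limit - len(collected)
        response = fake_scan(items, exclusive_start_key=start_key, limit=remaining)
        collected.extend(response['Items'])
        start_key = response.get('LastEvaluatedKey')
        if start_key is None:
            # No key in the response means there is nothing left to read.
            break
    return collected


table = [{'id': 'a'}, {'id': 'b'}, {'id': 'c'}, {'id': 'd'}, {'id': 'e'}]
print(scan_all(table))
print(scan_all(table, limit=3))
```

Each call in this sketch resumes after the previous page's last key, so no item is read twice. That is the behavior I expected from the real scan.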
Unfortunately, every iteration of the loop seems to scan every single key in the entire table, which quickly eats up my provisioned read units and makes the whole process extremely slow.
Here is my code:
def search(self, table, scan_filter=None, range_key=None,
           attributes_to_get=None,
           limit=None):
    """ Scan a table for values and return
    a list of items.
    """
    start_key = None
    num_results = 0
    total_results = []
    loop_iterations = 0
    request_limit = limit
    # With limit=None, keep going until the table is exhausted.
    while limit is None or num_results < limit:
        results = self.conn.layer1.scan(table_name=table,
                                        attributes_to_get=attributes_to_get,
                                        exclusive_start_key=start_key,
                                        limit=request_limit)
        num_results = num_results + len(results['Items'])
        # LastEvaluatedKey is absent once the scan reaches the end of the table.
        start_key = results.get('LastEvaluatedKey')
        total_results = total_results + results['Items']
        loop_iterations = loop_iterations + 1
        if limit is not None:
            request_limit = request_limit - results['Count']
        print("Count: " + str(results['Count']))
        print("Scanned Count: " + str(results['ScannedCount']))
        if start_key is not None:
            print("Last Evaluated Key: " + str(start_key['HashKeyElement']['S']))
        print("Capacity: " + str(results['ConsumedCapacityUnits']))
        print("Loop Iterations: " + str(loop_iterations))
        if start_key is None:
            break
    return total_results
Calling the function:
db = DB()
results = db.search(table='media', limit=500, attributes_to_get=['id'])
And my output:
Count: 96
Scanned Count: 96
Last Evaluated Key: kBR23QJNAwYZZxF4E3N1crQuaTwjIeFfjIv8NyimI9o
Capacity: 517.5
Loop Iterations: 1
Count: 109
Scanned Count: 109
Last Evaluated Key: ATcJFKfY62NIjTYY24Z95Bd7xgeA1PLXAw3gH0KvUjY
Capacity: 516.5
Loop Iterations: 2
Count: 104
Scanned Count: 104
Last Evaluated Key: Lm3nHyW1KMXtMXNtOSpAi654DSpdwV7dnzezAxApAJg
Capacity: 516.0
Loop Iterations: 3
Count: 104
Scanned Count: 104
Last Evaluated Key: iirRBTPv9xDcqUVOAbntrmYB0PDRmn5MCDxdA6Nlpds
Capacity: 513.0
Loop Iterations: 4
Count: 100
Scanned Count: 100
Last Evaluated Key: nBUc1LHlPPELGifGuTSqPNfBxF9umymKjCCp7A7XWXY
Capacity: 516.5
Loop Iterations: 5
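To put numbers on it, here is a small sketch that totals the five iterations logged above. The tuples are copied by hand from that output, and the variable names are just for illustration.

```python
# (Count, ScannedCount, ConsumedCapacityUnits) for each logged iteration
iterations = [
    (96, 96, 517.5),
    (109, 109, 516.5),
    (104, 104, 516.0),
    (104, 104, 513.0),
    (100, 100, 516.5),
]

total_items = sum(count for count, _, _ in iterations)
total_scanned = sum(scanned for _, scanned, _ in iterations)
total_capacity = sum(capacity for _, _, capacity in iterations)
units_per_item = total_capacity / total_items

print("Items returned: %d" % total_items)
print("Items scanned: %d" % total_scanned)
print("Capacity consumed: %.1f" % total_capacity)
print("Capacity per item: %.2f" % units_per_item)
```

That works out to 513 items returned (and 513 scanned) for 2579.5 capacity units, or roughly 5 units per item, even though I only asked for the 'id' attribute.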
Is this expected behavior, or am I doing something wrong?