I am using a development instance of Google Cloud Bigtable with the Python google-cloud-happybase client.
For development purposes:
-My table has 56.5k rows with 18 columns.
-My table has 1 column family.
-The average size of each cell's content is 9.5 bytes.
-Row keys are on average ~35 bytes.
-The row keys are balanced.
When I call scan() on my table I get a generator that yields the contents of each row. However, the time each next() call takes is not consistent. For example:
import timeit

append_list = []
samp = table.scan(columns=['sample_family:ContactId'])
for _ in range(56547):
    start_time = timeit.default_timer()
    next(samp)  # advance to the next (row_key, data) pair
    elapsed = timeit.default_timer() - start_time
    append_list.append(elapsed)
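To characterize the collected timings, percentiles are more informative than the median and max alone. A small self-contained sketch of how I summarize them (the timing values below are synthetic stand-ins for append_list, not my real measurements):

```python
import statistics

# synthetic stand-in for append_list: mostly fast calls plus a few outliers
timings = [4.05e-06] * 990 + [0.1] * 9 + [0.404]

print("median:", statistics.median(timings))
print("max:   ", max(timings))
print("total: ", sum(timings))
# 99th percentile (statistics.quantiles requires Python 3.8+)
print("p99:   ", statistics.quantiles(timings, n=100)[98])
```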
-The median time per next() call is 4.05e-06 seconds.
-The max time per next() call is 0.404 seconds, with several calls taking at least 0.1 seconds.
-Because of the outliers, the total time for all next() calls is 2.173 seconds; if every call took roughly the median time, the full scan would take about (4.05e-06) * 56,547 ≈ 0.229 seconds.
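For what it's worth, this pattern (a tiny median with rare large spikes) is exactly what a generator that streams rows in server-side batches would produce: most next() calls just pop a locally buffered row, and only the call that crosses a batch boundary pays a network round trip. A minimal simulation of that behavior, with a made-up batch size and fetch delay (these are illustrative numbers, not actual Bigtable parameters):

```python
import time
import timeit

def batched_rows(n_rows, batch_size=100, fetch_delay=0.005):
    """Yield n_rows items, pausing once per batch to mimic the RPC
    that fetches the next chunk of scan results from the server."""
    buffered = 0
    for i in range(n_rows):
        if buffered == 0:
            time.sleep(fetch_delay)  # stand-in for the batch fetch
            buffered = batch_size
        buffered -= 1
        yield i

timings = []
gen = batched_rows(2000)
for _ in range(2000):
    start = timeit.default_timer()
    next(gen)
    timings.append(timeit.default_timer() - start)

timings.sort()
median, worst = timings[len(timings) // 2], timings[-1]
print(worst / median)  # the batch-boundary calls dwarf the typical call
```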
Obviously there are several outliers that throw off the performance.
My question is: why am I seeing this kind of performance? It doesn't align with the metrics found here: https://cloud.google.com/bigtable/docs/performance
My thought is that since the workload is well under 300 GB, Bigtable might not be able to balance the data as effectively for small data sets as it does for larger ones.
Also, even though my development instance uses a single node for the 17.1 MB of data, I feel this should not be an issue.
I was wondering if anyone could give me insight into the problem and possible steps to remedy the situation.