I have an HBase table which contains contact information for customers. The table contains around 700k rows. I have a script which has to query the customers table to find matches for 2000-3000 records. Each scan takes around 1 second to complete, so for 2000 records the script takes about 33 minutes. I wanted to see if I can improve this performance. I have tried setting scanner caching, but it didn't help. Here are the details: the customers table has only one column family, and the customer ID is the row key. My query looks like this:
SingleColumnValueFilter('internal', 'country', =, 'binary:GB') AND SingleColumnValueFilter('internal', 'postcode', =, 'binary:W24RT') AND SingleColumnValueFilter('internal', 'street', =, 'binary:bayswaterroad')
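For reference, here is a minimal sketch of roughly what the script does through the HBase Java client (1.x API), assuming the shell filter above maps onto a FilterList of SingleColumnValueFilters. The table name, column family, qualifiers, and values come from the question; the caching value and everything else are assumptions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class CustomerLookup {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("customers"))) {

            // All three column filters must match (MUST_PASS_ALL == AND).
            FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
            filters.addFilter(new SingleColumnValueFilter(
                    Bytes.toBytes("internal"), Bytes.toBytes("country"),
                    CompareOp.EQUAL, Bytes.toBytes("GB")));
            filters.addFilter(new SingleColumnValueFilter(
                    Bytes.toBytes("internal"), Bytes.toBytes("postcode"),
                    CompareOp.EQUAL, Bytes.toBytes("W24RT")));
            filters.addFilter(new SingleColumnValueFilter(
                    Bytes.toBytes("internal"), Bytes.toBytes("street"),
                    CompareOp.EQUAL, Bytes.toBytes("bayswaterroad")));

            Scan scan = new Scan();
            scan.setFilter(filters);
            scan.setCaching(500); // rows per RPC -- this is the caching I tried tuning (value assumed)

            // Full table scan: every row is read and filtered server-side,
            // which is why each lookup takes ~1 second over 700k rows.
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    System.out.println(Bytes.toString(result.getRow()));
                }
            }
        }
    }
}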
How can I improve the performance?