In my HBase table, each row may have different columns than other rows. For example:
ROW                       COLUMN
1-1040                    cf:s1
1-1040                    cf:s2
1-1043                    cf:s2
2-1040                    cf:s5
2-1045                    cf:s99
3-1040                    cf:s75
3-1042                    cf:s135
As shown above, the set of columns varies from row to row. With this layout, I run a scan like this:
scan 'tb', {COLUMNS=>'cf:s2', STARTROW=>'1-1040', ENDROW=>'1-1044'}
I want to get the cf:s2 values with the query above. Does the fact that each row has different columns cause any performance problem for this scan?
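To be explicit about what I expect back from that scan, here is a toy Python model of it. This is only a sketch of my understanding, not HBase code: the cell values are placeholders I made up, and scan_column is a hypothetical helper, not an HBase API.

```python
# Toy model of the first layout: row key -> {column: value}.
# Values are placeholders; the real cell contents are not shown in the post.
rows = {
    "1-1040": {"cf:s1": "v1", "cf:s2": "v2"},
    "1-1043": {"cf:s2": "v3"},
    "2-1040": {"cf:s5": "v4"},
    "2-1045": {"cf:s99": "v5"},
    "3-1040": {"cf:s75": "v6"},
    "3-1042": {"cf:s135": "v7"},
}


def scan_column(table, start_row, stop_row, column):
    """Hypothetical helper: (row, value) pairs for rows in [start_row, stop_row)
    that contain `column`. Rows are visited in sorted key order."""
    hits = []
    for key in sorted(table):
        if start_row <= key < stop_row and column in table[key]:
            hits.append((key, table[key][column]))
    return hits


print(scan_column(rows, "1-1040", "1-1044", "cf:s2"))
# prints [('1-1040', 'v2'), ('1-1043', 'v3')]
```

So for the sample data I expect two cells back, one from row 1-1040 and one from row 1-1043.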
Another option is to put the column name into the row key:
ROW                       COLUMN
1-1040-s1                 cf:value
1-1040-s2                 cf:value
1-1043-s2                 cf:value
2-1040-s5                 cf:value
2-1045-s99                cf:value
3-1040-s75                cf:value
3-1042-s135               cf:value
With this option, to get the s2 values between 1-1040 and 1-1044, I run this query:
scan 'tb', {STARTROW=>'1-1040-s2', ENDROW=>'1-1044', FILTER=>"RowFilter(=, 'substring:s2')"}
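For concreteness, this is how I picture that second query being evaluated, as a small Python sketch. It is a mental model only, not HBase internals: simulate_scan is a hypothetical helper, and I assume the start row is written with the hyphen ('1-1040-s2') so that it matches the stored keys. Plain string comparison stands in for HBase's byte-wise row key ordering, which is equivalent for these ASCII keys.

```python
# Row keys of the second layout, in sorted order.
row_keys = sorted([
    "1-1040-s1", "1-1040-s2", "1-1043-s2",
    "2-1040-s5", "2-1045-s99", "3-1040-s75", "3-1042-s135",
])


def simulate_scan(keys, start_row, stop_row, substring):
    """Hypothetical helper: rows in [start_row, stop_row) are examined,
    and only keys containing `substring` are returned."""
    examined = [k for k in keys if start_row <= k < stop_row]
    returned = [k for k in examined if substring in k]
    return examined, returned


examined, returned = simulate_scan(row_keys, "1-1040-s2", "1-1044", "s2")
print(examined)  # prints ['1-1040-s2', '1-1043-s2']
print(returned)  # prints ['1-1040-s2', '1-1043-s2']

# Without the hyphen the start row sorts after '1-1040-s2',
# because '-' compares lower than 's', so that row is skipped.
examined_no_hyphen, _ = simulate_scan(row_keys, "1-1040s2", "1-1044", "s2")
print(examined_no_hyphen)  # prints ['1-1043-s2']
```

In this model the filter is checked against every row key inside the range, and a substring match on 's2' would also accept a key such as 1-1040-s25 if one existed.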
When I only need the s2 values, which option gives better read performance?