Problem: A regular expression is not working as expected in an HBase scan filter. The regex is accepted without any error, but the scan does not return only the filtered rows.
Background: We are storing our data in HBase as strings (I know it should have been Avro, but I need to work with this for now).
Our HBase column DataRows look something like the rows below; a pipe is used as the delimiter.
NAME|10000081|10000102|13513|10102026|GENDER|ID
NAME|10000081|10000101|13513|10102026|GENDER|ID
NAME|10000081|10000103|13513|10102026|GENDER|ID
NAME|10000082|10000104|13515|10102026|GENDER|ID
NAME|10000082|10000104|13516|10102026|GENDER|ID
I am writing a RegEx filter for the HBase scanner which will scan these rows.
My RegEx string looks like this :
^NAME\\|.*\\|.*\\|.*\\|.*\\|.*\\|.*$
This is the input for an HBase QualifierFilter, e.g.:
Filter qfilter = new QualifierFilter(CompareFilter.CompareOp.EQUAL, new RegexStringComparator(regexString.toString()));
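One thing worth checking here: QualifierFilter compares the comparator against the column qualifier (the column name), whereas ValueFilter compares it against the cell value. If the pipe-delimited string is stored as the cell value, a qualifier-based filter never sees it. The plain-Java sketch below only simulates that difference and does not use the HBase API; the `Cell` class and the qualifier name "d" are hypothetical stand-ins, and it assumes the comparator searches for the pattern the way `Matcher.find()` does.

```java
import java.util.regex.Pattern;

class QualifierVsValueDemo {
    // Hypothetical stand-in for an HBase cell: a column qualifier plus its value.
    static final class Cell {
        final String qualifier;
        final String value;

        Cell(String qualifier, String value) {
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // What a qualifier-based filter would test: the column name.
    static boolean qualifierMatches(Pattern p, Cell c) {
        return p.matcher(c.qualifier).find();
    }

    // What a value-based filter would test: the stored string.
    static boolean valueMatches(Pattern p, Cell c) {
        return p.matcher(c.value).find();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("^NAME\\|.*\\|10000101\\|.*\\|.*\\|.*\\|.*$");
        // "d" is an assumed qualifier name, used only for illustration.
        Cell cell = new Cell("d", "NAME|10000081|10000101|13513|10102026|GENDER|ID");

        System.out.println("qualifier matches: " + qualifierMatches(p, cell)); // false
        System.out.println("value matches:     " + valueMatches(p, cell));     // true
    }
}
```

If the data really lives in the cell value, a value-based filter such as ValueFilter with the same RegexStringComparator may be the closer fit; that would need to be verified against the actual table layout.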
For example, I want to keep only rows with Name=RECKO and 3rd column = 10000101, using the regex string below. Instead, the scan returns all rows.
Regex String = ^NAME\\|.*\\|10000101\\|.*\\|.*\\|.*\\|.*$
What is wrong with my regular expression? Any pointers/suggestions are appreciated very much.
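One possible cause, offered as a guess rather than a diagnosis: in a Java string literal, "\\|" reaches the regex engine as \| (a literal pipe). If the string is built at runtime or read from a file and still contains two backslash characters, the engine sees \\| instead, which is a literal backslash followed by an alternation bar. The pattern then splits into alternatives, and the last one, .*$, matches every input. The sketch below shows the difference with plain java.util.regex; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class EscapedPipeDemo {
    // Returns true if the regex is found anywhere in the input (find() semantics).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // A row that should NOT pass the filter: first column is PC, not NAME.
        String row = "PC|10000081|10000102|13513|10102026|LOC|ic";

        // Java literal "\\|" -> the regex engine sees \| (a literal pipe).
        String escapedOnce = "^NAME\\|.*\\|10000101\\|.*\\|.*\\|.*\\|.*$";

        // Java literal "\\\\|" -> the regex engine sees \\| :
        // a literal backslash followed by an alternation bar.
        String escapedTwice = "^NAME\\\\|.*\\\\|10000101\\\\|.*\\\\|.*\\\\|.*\\\\|.*$";

        System.out.println(found(escapedOnce, row));  // false: row is rejected
        System.out.println(found(escapedTwice, row)); // true: the ".*$" alternative matches anything
    }
}
```

Printing the exact string passed to RegexStringComparator would show which of the two forms the filter actually receives.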
Test Program:
public class RegEx1 {
    public static void main(String[] args) {
        String[] rows = {
            "PC|10000081|10000102|13513|10102026|LOC|ic",
            "PC|10000081|10000101|13512|10102025|LOC|zc",
            "NAME|10000042|10000084|13576|10101626|GENDER|cc",
            "NAME|10000042|10000084|13576|10101626|GENDER|za",
            "NAME|10000042|10000084|13576|10101626|GENDER|zc",
            "NAME|10000061|10000086|13581|10101630|GENDER|ic",
            "NAME|10000061|10000086|13581|10101630|GENDER|za",
            "NAME|10000061|10000086|13581|10101630|GENDER|zc",
            "NAME|10001076|10001744|15106|10123669|GENDER|cc",
            "NAME|10001076|10001744|15106|10123669|GENDER|za",
            "NAME|10001076|10001744|15106|10123669|GENDER|zc",
            "NAME|10000061|10000086|13581|10101630|GENDER|ic",
            "NAME|10000061|10000086|13581|10101630|GENDER|za",
            "NAME|10000061|10000086|13581|10101630|GENDER|zc",
            "NAME|10001075|10001743|15105|10123664|GENDER|ic",
            "NAME|10001075|10001743|15105|10123664|GENDER|za",
            "NAME|10001075|10001743|15105|10123664|GENDER|zc",
            "NAME|10001077|10001745|15239|10123673|GENDER|cc",
            "NAME|10001077|10001745|15239|10123673|GENDER|za",
            "NAME|10001077|10001745|15239|10123673|GENDER|zc",
            "NAME|10002165|10000102|10151364|10151363|GENDER|ic",
            "NAME|10002165|10003668|10151364|10151363|GENDER|za",
            "NAME|10002165|10003668|10151364|10151363|GENDER|zc",
            "NAME|10002167|10003670|10151368|10151367|GENDER|cc",
            "NAME|10002167|10003670|10151368|10151367|GENDER|zb"
        };
        for (String s : rows) {
            System.out.println(s);
            System.out.println(s.matches("^NAME\\|10002167\\|.*\\|.*\\|.*\\|*$"));
        }
    }
}
With the above program, all input values are reported as matches, but it should match only strings where the first column is "NAME" and the second column is 10002167.
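For reference, here is a minimal sketch of one way to express "first column is NAME and second column is 10002167" so that each wildcard stays inside its own column. It uses [^|]* (any run of characters that contains no pipe) instead of .*, which can otherwise run across delimiters. The class and method names are made up for illustration, and the pattern assumes exactly seven columns per row, as in the sample data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class ColumnRegexDemo {
    // [^|]* matches exactly one column: any characters except the pipe delimiter.
    static final Pattern NAME_AND_ID = Pattern.compile(
            "^NAME\\|10002167\\|[^|]*\\|[^|]*\\|[^|]*\\|[^|]*\\|[^|]*$");

    // Returns the rows that match the pattern from start to end.
    static List<String> filter(String[] rows) {
        List<String> matched = new ArrayList<>();
        for (String row : rows) {
            if (NAME_AND_ID.matcher(row).matches()) {
                matched.add(row);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        String[] rows = {
            "PC|10000081|10000102|13513|10102026|LOC|ic",
            "NAME|10002165|10000102|10151364|10151363|GENDER|ic",
            "NAME|10002167|10003670|10151368|10151367|GENDER|cc",
            "NAME|10002167|10003670|10151368|10151367|GENDER|zb"
        };
        // Only the last two rows have NAME in column 1 and 10002167 in column 2.
        for (String row : filter(rows)) {
            System.out.println(row);
        }
    }
}
```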
Update: Thanks to @Aviram Segal. After correcting the regex, it works in the Java test program, but still not in the HBase scan filter.