I am just trying to evaluate HBase for some of the data analysis work we are doing.
HBase would contain our event data. The row key would be eventId + time. We want to run analysis on a few event types (4-5) within a date range. The total number of event types is around 1000.
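Here is a minimal sketch that makes the key layout concrete. It assumes, hypothetically, that eventId is encoded as a 4-byte big-endian int and time as an 8-byte big-endian epoch-millisecond long. Under that assumption, each event type's date range is one contiguous block of rows. The 4-5 blocks are not adjacent to each other, though, so a single scan spanning all of them would also read every event type that sorts in between. The class and method names (EventKeyRanges, rowKey, rangesFor) are illustrative only and are not part of HBase.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class EventKeyRanges {
    // Hypothetical layout: 4-byte eventId + 8-byte timestamp, both big-endian
    // (ByteBuffer's default), so for non-negative values the byte order of the
    // keys matches the numeric order of (eventId, time).
    static byte[] rowKey(int eventId, long timeMillis) {
        return ByteBuffer.allocate(12).putInt(eventId).putLong(timeMillis).array();
    }

    // One [start, stop) row range per event type; stop is exclusive,
    // like an HBase scan's stop row.
    static List<byte[][]> rangesFor(int[] eventIds, long fromMillis, long toMillis) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int id : eventIds) {
            ranges.add(new byte[][] { rowKey(id, fromMillis), rowKey(id, toMillis) });
        }
        return ranges;
    }

    // Unsigned lexicographic byte comparison, the ordering HBase uses for row keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        int[] wanted = {7, 42, 913};       // hypothetical event types of interest
        long from = 1_356_998_400_000L;    // 2013-01-01T00:00:00Z (example only)
        long to = 1_359_676_800_000L;      // 2013-02-01T00:00:00Z (example only)
        List<byte[][]> ranges = rangesFor(wanted, from, to);
        System.out.println(ranges.size() + " disjoint ranges");

        // A single range from the first start to the last stop would also
        // contain unwanted event types, e.g. 500, which sorts between 42 and 913.
        byte[] unwanted = rowKey(500, from);
        boolean inSingleRange = compare(unwanted, ranges.get(0)[0]) >= 0
                && compare(unwanted, ranges.get(ranges.size() - 1)[1]) < 0;
        System.out.println("event 500 inside single range: " + inSingleRange);
    }
}
```

Each [start, stop) pair produced here corresponds to the start and stop row that a separate Scan would need. That is why one Scan object cannot cover only the wanted event types.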
The problem with running a MapReduce job on the HBase table is that initTableMapperJob (see below) takes only one Scan object. For performance reasons we want to scan the data for only the 4-5 event types in a given date range, not all 1000 event types. Because the method below accepts a single Scan, I don't think it gives us that choice.
public static void initTableMapperJob(String table, Scan scan, Class mapper, Class outputKeyClass, Class outputValueClass, org.apache.hadoop.mapreduce.Job job) throws IOException
Is it possible to run a MapReduce job over a list of Scan objects? Is there any workaround?
Thanks