I am trying to create a MapReduce job in Java on a table from an HBase database. Using the examples from here and other material from the internet, I managed to write a simple row counter. However, my attempt to write one that actually does something with the data from a column was unsuccessful, since the received bytes are always null.
Part of the driver for the job is this:
    /* Set main, map and reduce classes */
    job.setJarByClass(Driver.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);

    Scan scan = new Scan();
    scan.setCaching(500);
    scan.setCacheBlocks(false);

    /* Get data only from the last 24h */
    Timestamp timestamp = new Timestamp(System.currentTimeMillis());
    try {
        long now = timestamp.getTime();
        scan.setTimeRange(now - 24 * 60 * 60 * 1000, now);
    } catch (IOException e) {
        e.printStackTrace();
    }

    /* Initialize the initTableMapperJob */
    TableMapReduceUtil.initTableMapperJob(
            "dnsr",
            scan,
            Map.class,
            Text.class,
            Text.class,
            job);

    /* Set output parameters */
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.setOutputFormatClass(TextOutputFormat.class);
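To make the time filter in the driver concrete, here is a plain-Java sketch of the window it computes. It assumes, going by my reading of the Scan.setTimeRange documentation, that the range is half-open, [minStamp, maxStamp), and that it is compared against the timestamp of each individual cell. TimeWindow and its methods are names made up for this illustration; they are not HBase classes.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of HBase): a half-open time window [minStamp, maxStamp). */
final class TimeWindow {
    final long minStamp; // inclusive lower bound, epoch milliseconds
    final long maxStamp; // exclusive upper bound, epoch milliseconds

    TimeWindow(long minStamp, long maxStamp) {
        if (minStamp > maxStamp) {
            throw new IllegalArgumentException("minStamp must not exceed maxStamp");
        }
        this.minStamp = minStamp;
        this.maxStamp = maxStamp;
    }

    /** Window covering the 24 hours that end at the given instant. */
    static TimeWindow last24Hours(long now) {
        return new TimeWindow(now - TimeUnit.HOURS.toMillis(24), now);
    }

    /** True if a cell written at cellTimestamp would fall inside the window. */
    boolean contains(long cellTimestamp) {
        return cellTimestamp >= minStamp && cellTimestamp < maxStamp;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        TimeWindow window = last24Hours(now);
        long twoDaysAgo = now - TimeUnit.DAYS.toMillis(2);
        long oneHourAgo = now - TimeUnit.HOURS.toMillis(1);
        System.out.println("written 2 days ago, in window: " + window.contains(twoDaysAgo));
        System.out.println("written 1 hour ago, in window: " + window.contains(oneHourAgo));
    }
}
```

The sketch only models the arithmetic of the window; it says nothing about which timestamps the cells in the dnsr table actually carry.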
As the driver shows, the table is called dnsr. My mapper looks like this:
    @Override
    public void map(ImmutableBytesWritable row, Result value, Context context)
            throws InterruptedException, IOException {
        byte[] columnValue = value.getValue("d".getBytes(), "fqdn".getBytes());
        if (columnValue == null)
            return;
        byte[] firstSeen = value.getValue("d".getBytes(), "fs".getBytes());
        // if (firstSeen == null)
        //     return;
        String fqdn = new String(columnValue).toLowerCase();
        String fs = (firstSeen == null) ? "empty" : new String(firstSeen);
        context.write(new Text(fqdn), new Text(fs));
    }
Some notes:
- the column family of the dnsr table is just d. There are multiple columns, some of them being called fqdn and fs (firstSeen);
- even though the fqdn values appear correctly, the fs values are always the "empty" string (I added this check after I got errors saying that you can't convert null to a new string);
- if I replace the fs column name with something else, for example ls (lastSeen), it works;
- the reducer doesn't do anything, it just outputs everything it receives.
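The null check from the second note can be pulled into a small helper. This is only a sketch: it assumes the stored values are UTF-8 text, which I have not verified for this table, and CellText and decodeOrDefault are hypothetical names, not HBase API.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of HBase) for turning raw cell bytes into text. */
final class CellText {
    private CellText() {}

    /**
     * Decodes raw as UTF-8, or returns fallback when raw is null,
     * which is what a lookup yields when no matching cell was returned.
     */
    static String decodeOrDefault(byte[] raw, String fallback) {
        return (raw == null) ? fallback : new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] present = "Example.COM".getBytes(StandardCharsets.UTF_8);
        byte[] missing = null;
        System.out.println(decodeOrDefault(present, "empty").toLowerCase());
        System.out.println(decodeOrDefault(missing, "empty"));
    }
}
```

Passing the charset explicitly avoids depending on the platform default, which new String(bytes) and "d".getBytes() both do.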
I created a simple table scanner in JavaScript that queries the exact same table and columns, and I can clearly see the values are there. Using the command line and running queries manually, I can clearly see that the fs values are not null; they are bytes that can later be converted into a string (representing a date).
What could be the reason I'm always getting null?
Thanks!
Update:
If I get all the columns in a specific column family, I don't receive fs. However, a simple scanner implemented in JavaScript returns fs as a column from the dnsr table.
    @Override
    public void map(ImmutableBytesWritable row, Result value, Context context)
            throws InterruptedException, IOException {
        byte[] columnValue = value.getValue(columnFamily, fqdnColumnName);
        if (columnValue == null)
            return;
        String fqdn = new String(columnValue).toLowerCase();
        /* Getting all the columns */
        String[] cns = getColumnsInColumnFamily(value, "d");
        StringBuilder sb = new StringBuilder();
        for (String s : cns) {
            sb.append(s).append(";");
        }
        context.write(new Text(fqdn), new Text(sb.toString()));
    }
I used an answer from here to get all the column names.
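The helper itself is not reproduced above. A minimal sketch of the idea follows, assuming the qualifier-to-value map comes from Result.getFamilyMap(byte[]) and that qualifiers are UTF-8 text. To keep it runnable without HBase on the classpath, it takes the NavigableMap directly; ColumnNames and fromFamilyMap are placeholder names, not the code from the linked answer.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical helper (not part of HBase): lists the column qualifiers of one family. */
final class ColumnNames {
    private ColumnNames() {}

    /** Returns the qualifiers of the given qualifier-to-value map, in map order. */
    static String[] fromFamilyMap(NavigableMap<byte[], byte[]> familyMap) {
        if (familyMap == null) {
            return new String[0]; // defensive: treat a missing family as "no columns"
        }
        List<String> names = new ArrayList<>();
        for (byte[] qualifier : familyMap.keySet()) {
            names.add(new String(qualifier, StandardCharsets.UTF_8));
        }
        return names.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Stand-in ordering for the demo; HBase orders qualifiers byte-wise.
        Comparator<byte[]> byText = (a, b) ->
                new String(a, StandardCharsets.UTF_8).compareTo(new String(b, StandardCharsets.UTF_8));
        NavigableMap<byte[], byte[]> family = new TreeMap<byte[], byte[]>(byText);
        family.put("ls".getBytes(StandardCharsets.UTF_8), new byte[] {1});
        family.put("fqdn".getBytes(StandardCharsets.UTF_8), new byte[] {2});

        StringBuilder sb = new StringBuilder();
        for (String name : fromFamilyMap(family)) {
            sb.append(name).append(";");
        }
        System.out.println(sb);
    }
}
```

In the mapper, the map argument would be something like value.getFamilyMap("d".getBytes()), so the output contains exactly the qualifiers that the scan returned for that row.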