Sorry for my poor English...
Here is my problem. I have to import data from MySQL to HBase. In order to improve performance a big table is divided into some small tables in MySQL. For example, a 'message' table is divided into 'message_0', 'message_1', ... 'message_100'.
I need to import those message tables into a single table (such as 'message') in HBase. The PK in each table is message_id and it is auto-increment. As we know, HBase has a unique row key. I want to add a prefix to the message_id column.
I can't add a hbase-row-key column in those tables so that I want to modify the column content in sqoop query argument. I used the import command looks like below:
sqoop import --connect jdbc:mysql://host:port/message
--username message --password message \
--query "select 't0_'+ message_id as message_key, some_split_id, some_other_columns\
from message_0 where message_id >30000 and message_id <= 31000 and \$CONDITIONS"
--split-by some_split_id --hbase-row-key message_key
--hbase-table message --column-family cf1
I can successfully map the column alias message_key
as the HBase row key. But row key is 30001.0 which is expected as 't0_30001'.
It seems that the query didn't return the expect value of 't0_' + message_id but directly the message_id.
I also tried to modify the sqoop-generated java source to add 't0_' to message_id, then specify the compiled class to perform input action, but it dose not work neither.
Looks like this
java source:
public void set_message_key(String message_key) {
this.message_key = "t0_" + message_key;
}
public MessageImporter with_message_key(String message_key) {
this.message_key = "t0_" + message_key;
return this;
}
...
public void readFields(DataInput __dataIn) throws IOException {
if (__dataIn.readBoolean()) {
this.message_key = null;
} else {
this.message_key = "t0_" + Text.readString(__dataIn);
}
...
...
}
...
private void __loadFromFields(List<String> fields) {
Iterator<String> __it = fields.listIterator();
String __cur_str;
__cur_str = __it.next();
if (__cur_str.equals("null") || __cur_str.length() == 0) { this.message_key = null; } else {
this.message_key = "t0_" + __cur_str;
}
...
}
...
public void setField(String __fieldName, Object __fieldVal) {
if ("message_key".equals(__fieldName)) {
this.message_key = "t0_" + (String) __fieldVal;
}
...
}
command line(add arguments below):
--jar-file someJar.jar --class-name com.foo.SomeImportClass \
Is it possible to modify the column content in the --query argument? Do I have to write mapred jobs myself to import data from multiple tables into one single table in HBase?
Thank you for your patience to read this long question.