1
votes

Sorry for my poor English...

Here is my problem. I have to import data from MySQL to HBase. In order to improve performance a big table is divided into some small tables in MySQL. For example, a 'message' table is divided into 'message_0', 'message_1', ... 'message_100'.

I need to import those message tables into a single table (such as 'message') in HBase. The PK in each table is message_id and it is auto-increment. As we know, HBase has a unique row key. I want to add a prefix to the message_id column.

I can't add a hbase-row-key column in those tables so that I want to modify the column content in sqoop query argument. I used the import command looks like below:

sqoop import --connect jdbc:mysql://host:port/message 
--username message --password message \
--query "select 't0_'+ message_id as message_key, some_split_id, some_other_columns\
from message_0 where message_id >30000 and message_id <= 31000 and \$CONDITIONS" 
--split-by some_split_id --hbase-row-key message_key 
--hbase-table message --column-family cf1

I can successfully map the column alias message_key as the HBase row key. But row key is 30001.0 which is expected as 't0_30001'.

It seems that the query didn't return the expect value of 't0_' + message_id but directly the message_id.

I also tried to modify the sqoop-generated java source to add 't0_' to message_id, then specify the compiled class to perform input action, but it dose not work neither.

Looks like this

java source:

public void set_message_key(String message_key) {
    this.message_key = "t0_" + message_key;
}
public MessageImporter with_message_key(String message_key) {
    this.message_key = "t0_" + message_key;
    return this; 
}
...
public void readFields(DataInput __dataIn) throws IOException {
    if (__dataIn.readBoolean()) { 
        this.message_key = null;
    } else {
        this.message_key = "t0_" + Text.readString(__dataIn);
    }
    ...
...
}
...
private void __loadFromFields(List<String> fields) {
    Iterator<String> __it = fields.listIterator();
    String __cur_str;
    __cur_str = __it.next();
    if (__cur_str.equals("null") || __cur_str.length() == 0) { this.message_key = null; } else {
    this.message_key = "t0_" + __cur_str;
    }
    ...
}
...
public void setField(String __fieldName, Object __fieldVal) {
    if ("message_key".equals(__fieldName)) {
        this.message_key = "t0_" + (String) __fieldVal;
    }
    ...
}

command line(add arguments below):

--jar-file someJar.jar --class-name com.foo.SomeImportClass \

Is it possible to modify the column content in the --query argument? Do I have to write mapred jobs myself to import data from multiple tables into one single table in HBase?

Thank you for your patience to read this long question.

1
Hi, have you thought about creating a view with MySQL and then import the data from that view instead of each table?Jean-Philippe Bond
@JpBond That sounds a good idea. Although I'm afraid my boss would not buy this, I will talk to him. Thank you very much. But there are still some cases I have to write mapred jobs myself.scarcer

1 Answers

2
votes

I solved a similar problem in the following way:

First off, concat() can help to combine two columns into one, the syntax is as followed

     select concat('t0_',message_id) as message_key, ...

Then, add the following option to sqoop import to specify the 'message_key' should be considered as String.

     --map-column-java message_key=String

Hope it can help. Thanks.