0
votes

I have Cassandra, Sqoop and MySql installed in a Linux machine.

1) Created a small table in MySql, with the below syntax.

CREATE TABLE TEST_ONE(
test_id INT NOT NULL AUTO_INCREMENT,
test_title VARCHAR(100) NOT NULL,
PRIMARY KEY ( test_id )
);

2) Tried running sudo dse sqoop import from MySql to Cassandra

sudo dse sqoop import --connect jdbc:mysql://127.0.0.1/test --username {username} --password {password} --verbose --table TEST_ONE --cassandra-keyspace TEST_ONE_KS --cassandra-column-family TEST_ONE_CF --cassandra-row-key id --cassandra-thrift-host 127.0.0.1 --cassandra-create-schema

After running for few minutes, it hangs. If we check in the back end under the path /var/lib/cassandra/data/ the Keyspace and Column Family exists. But the data is not getting updated.

Not able to figure out what is wrong.

Please help.

Thank you Achyuth

1

1 Answers

2
votes

So I've recreated your mysql table

mysql> select * from TEST_ONE;
+---------+------------+
| test_id | test_title |
+---------+------------+
|       1 | hi         |
|       2 | there      |
|       3 | foo        |
|       4 | bar        |
|       5 | spam       |
|       6 | DSE QA     |
+---------+------------+
6 rows in set (0.00 sec)

And used your sqoop command (but i turned on debugging my log4j-server file in the cassandra resources directory)

automaton@ip-10-235-42-75:~$ sudo dse/bin/dse sqoop import --connect jdbc:mysql://127.0.0.1/test --username qa --password qapass --verbose --table TEST_ONE --cassandra-keyspace TEST_ONE_KS --cassandra-column-family TEST_ONE_CF --cassandra-row-key id --cassandra-thrift-host 127.0.0.1 --cassandra-create-schema

But I immediately got

14/01/10 17:09:28 INFO mapreduce.ImportJobBase: Beginning import of TEST_ONE
14/01/10 17:09:29 DEBUG mapreduce.DataDrivenImportJob: Using table class: TEST_ONE
14/01/10 17:09:29 DEBUG mapreduce.DataDrivenImportJob: Using InputFormat: class com.cloudera.sqoop.mapreduce.db.DataDrivenDBInputFormat
14/01/10 17:09:29 DEBUG manager.SqlManager: No connection paramenters specified. Using regular API for making connection.
14/01/10 17:09:29 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: The row-key column specified by --cassandra-row-key is not found in the input.
    at org.apache.sqoop.mapreduce.CassandraImportJob.configureOutputFormat(CassandraImportJob.java:133)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:195)
    at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:475)
    at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:108)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:403)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:476)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
    at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)

Your cassandra-row-key id is to blame and should be test_id, change it and you're imprt should work. You'll end up with this below.

cqlsh> select * from "TEST_ONE_KS"."TEST_ONE_CF";

 key | column1    | value
-----+------------+--------
   6 | test_title | DSE QA
   4 | test_title |    bar
   3 | test_title |    foo
   5 | test_title |   spam
   2 | test_title |  there
   1 | test_title |     hi