2 votes

I can successfully do an incremental import from MySQL to HDFS using Sqoop by

sqoop job --create JOBNAME -- import ... --incremental append --check-column id --last-value LAST
sqoop job --exec JOBNAME

That finishes with log messages like

INFO tool.ImportTool: Saving incremental import state to the metastore
INFO tool.ImportTool: Updated data for job: JOBNAME

And inspecting the job with sqoop job --show JOBNAME confirms that incremental.last.value was updated correctly.
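
For reference, a fuller job definition along these lines might look like the sketch below; the connection string, paths, user, and table name are placeholders, not values from my actual setup:

# Hypothetical example: connection details and paths are placeholders.
sqoop job --create JOBNAME \
  -- import \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser \
  --password-file /user/me/db.password \
  --table mytable \
  --target-dir /user/me/mytable \
  --incremental append \
  --check-column id \
  --last-value 0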

If I attempt the same procedure but add "--hive-import" to the job definition, it executes successfully, yet incremental.last.value is not updated.

Is this a bug? Intended behavior? Does anyone have a procedure for incrementally importing data from MySQL and making it available via Hive?

I basically want my Hadoop cluster to be a read slave of my MySQL database, for fast analysis. If there's some other solution than Hive (Pig would be fine), I'd love to hear that too.

3 – I'm just in the process of learning about Hive and haven't actually set it up yet, but I'm really interested in the answer to this question. – nemo

3 Answers

0 votes

The --hive-import option creates the defined table structure in Hive and loads the data onto HDFS using MapReduce jobs. Moreover, Hive is schema-on-read: the data is not parsed or validated at load time, and the schema is applied only when a query is executed.

Every query against the Hive table is treated as independent; it is evaluated freshly at execution time, and no state from earlier runs is kept, which is why the last incremental value is not stored.
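
A minimal sketch to illustrate schema-on-read (the table name and file path are made up):

-- Creating the table and loading files parses nothing; Hive only
-- records the schema in the metastore and moves the files into place.
CREATE TABLE t (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

LOAD DATA INPATH '/user/me/staging/part-m-00000' INTO TABLE t;

-- The schema is applied to the raw bytes only now, at query time;
-- a row whose first field is not numeric simply reads back as NULL.
SELECT max(id) FROM t;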

0 votes

You can also create the external Hive table manually, since that is only a one-time activity, and then keep importing incremental data into its location just as you are doing; a sketch follows below.
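
Here is a sketch of that approach, with illustrative names and paths; the comma delimiter matches Sqoop's default text output:

# One-time setup: an external table over the Sqoop target directory.
hive -e "CREATE EXTERNAL TABLE mytable (id INT, name STRING)
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
         LOCATION '/user/me/mytable';"

# Recurring: each incremental run appends new part files under
# /user/me/mytable, which the external table picks up automatically.
sqoop job --exec JOBNAME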

0 votes

You can get the last value from Hive itself and pass it to the import, e.g.:

--check-column id --incremental append --last-value $($HIVE_HOME/bin/hive -e 'select max(id) from tablename')

(With --incremental lastmodified you would point --check-column at a timestamp column instead.)
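
Putting that together, a wrapper script for the recurring import might look like the sketch below; the connection details and table name are hypothetical, and hive -S suppresses log chatter so only the value is captured:

#!/bin/bash
# Ask Hive for the highest id already imported.
LAST=$($HIVE_HOME/bin/hive -S -e 'select max(id) from tablename')

# Import only the rows with id greater than that value.
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser \
  --password-file /user/me/db.password \
  --table tablename \
  --hive-import \
  --incremental append \
  --check-column id \
  --last-value "$LAST"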