1 vote

I have one MySQL table which takes 20-30 minutes to respond to a simple SELECT query due to its huge size. I was wondering if I could import it into Hadoop for processing. Would that be a good idea? If I use Hive to perform the same query, will there be any performance gain? If yes, how should I import my table into Hadoop? Since the table has a composite primary key, Sqoop is not an option. One more thing: should I use HBase instead? Which approach would be better?

Thanks

Even if your database is big, you will be selecting only 10, 20, or 100 records at a time; you may also be using indexing. - Arun Killu
@Arun yes, I am using indexing, and yes, I select 10, 20, or 100 records at a time, but it still takes 20-30 minutes. - cool.ernest.7

3 Answers

3 votes

Hive is batch-oriented; consider using Impala, which is designed for interactive ad-hoc querying and is faster than Hive. Hive is much older than Impala, so it has a lot of documentation, and help can also be found easily.
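
As an illustration, an interactive Impala query could look like this; the host and table name below are hypothetical, not details from the question:

    # A minimal sketch, assuming a (hypothetical) impalad at impalad-host and
    # a table big_table already registered in the Hive metastore.
    # 21000 is impala-shell's default port.
    impala-shell -i impalad-host:21000 \
        -q "SELECT * FROM big_table WHERE id = 42 LIMIT 100;"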

There is also Apache Drill, but it is still in the incubator phase, and Amazon Redshift, though that is in a limited preview as of now. Then there is Google BigQuery. There are multiple options for getting data into Hadoop and processing it once it is there.

There is no single solution that fits every problem; the appropriate tool has to be chosen based on your requirements.

1 vote

As for importing the table: even if it has a composite primary key, I think you can use Sqoop's sequential import. With only a single mapper (using -m 1), you won't need a --split-by column anymore.
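
A minimal sketch of such an import (the host, database, user, and table name are placeholders, not details from the question):

    # A single-mapper import: -m 1 copies the table sequentially,
    # so Sqoop never needs a --split-by column.
    # Host, database, user, and table name are placeholders.
    sqoop import \
        --connect jdbc:mysql://dbhost/mydb \
        --username dbuser \
        -P \
        --table big_table \
        -m 1 \
        --hive-import

The trade-off is that the import runs as one map task and is not parallelized, which can be slow for a very large table, but it sidesteps the composite-key problem entirely.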

0 votes

@cool.ernest.7 Maybe this is naive, but why can't you import a table with a composite index on it? By the way, how big is your database?