0
votes

​Hello Gurus,

We are starting migration of MySQL data to NOSQLs, we want to migrate all tabular format data to Nested JSON format. There are situations where we will be performing joins to fetch the data(which should be kept in nested array or struct format), so how should we put that data to nested json format. We will need suggestions on this. We are trying to build this solution with the help of hadoop tools like sqoop, hive, pig.

Will you please suggest, the possible ways by which We can achieve this.

We tried with below sqoop import command: sqoop import --username * --password * --connect 'jdbc:mysql://realstart.abc:3306/retail_db' --query 'select orders.order_id,customers.customer_id AS "customers.customer_id", customers.customer_fname AS "customers.customer_fname" FROM orders LEFT JOIN customers ON orders.order_customer_id=customers.customer_id WHERE $CONDITIONS' --hcatalog-database default --hcatalog-table orders --split-by order_id

We created hcat table using below query: hcat -e "create table orders (order_id int, customers struct) row format serde 'org.openx.data.jsonserde.JsonSerDe'"

But not sure how to match the schema of hcat schema in the --query of sqoop import.

Thanks & Regards, Mahendra

1

1 Answers

0
votes

I think the best option would be using hadoop map reduce classes and writing custom job to create nested JSON output or doing dump from mysql db and transform data. You can check docs about custom jobs in hadoop or pig:

I think there is no any "ready to use" option in Sqoop for creating JSON output.