0
votes

I'm trying to configure hive-site.xml so that Hive uses an external MySQL metastore instead of the local MySQL on EMR. How can I modify an existing cluster's configuration to add hive-site.xml from S3?

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-dev-create-metastore-outside.html


2 Answers

0
votes

I'm not sure what you mean by "add hive-site.xml from S3". If you just want to copy the file from S3 into your conf directory, you can do that with the aws-cli while logged into your cluster:

aws s3 cp s3://path/to/hive-site.xml ~/conf

More detailed instructions on migrating an existing EMR cluster's Hive MetaStore to an external service like RDS can be found below.

--

Setting up an existing EMR cluster to look at an outside MySQL database is very easy. First, you'll need to dump the MySQL database that's running on your Master node to keep your existing schema information. Assuming you have enough ephemeral storage for the dump and your database socket is located at /var/lib/mysql/mysql.sock:

mysqldump -S /var/lib/mysql/mysql.sock hive > /media/ephemeral0/backup.sql

Then you'll need to import this into your outside MySQL instance. If this is in RDS, you'll first need to create the hive database and then import your data into it:

mysql -h rds_host -P 3306 -u rds_master_user -prds_password mysql -e "create database hive"

and,

mysql -h rds_host -P 3306 -u rds_master_user -prds_password hive < /media/ephemeral0/backup.sql

Next up, you'll need to create a user for hive to use. Log into your outside MySQL instance and execute the following statement (with a better username and password):

grant all privileges on hive.* to 'some_hive_user'@'%' identified by 'some_password'; flush privileges;

Lastly, create/make the same changes to hive-site.xml as outlined in the documentation you cited (filling in the proper host, user, and password information) and restart your MetaStore. To restart your MetaStore, kill the already running MetaStore process and start a new one.

ps aux | grep MetaStore
kill <pid>   # the MetaStore PID shown by the ps command
hive --service metastore &
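
For reference, the hive-site.xml changes from the last step might look roughly like the output of the sketch below. The function name render_hive_site is made up for illustration, and the host, user, and password are the same placeholders used above. The property names are the standard Hive metastore JDO connection settings rather than anything quoted from your cluster, and the default driver class is an assumption (newer EMR releases ship org.mariadb.jdbc.Driver instead), so compare it against the documentation you cited before using it.

```shell
#!/usr/bin/env bash
# Sketch only: print the metastore connection part of hive-site.xml.
# Arguments: host, user, password, and an optional JDBC driver class.
# Values are not XML-escaped, so a password containing & or < needs escaping first.
render_hive_site() {
  local host="$1" user="$2" password="$3"
  # Assumption: MySQL Connector/J driver; pass a 4th argument to override.
  local driver="${4:-com.mysql.jdbc.Driver}"
  cat <<EOF
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://${host}:3306/hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>${driver}</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>${user}</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>${password}</value>
  </property>
</configuration>
EOF
}
```

You would merge these properties into your existing hive-site.xml rather than replace the whole file, for example by running render_hive_site rds_host some_hive_user some_password and copying the four property blocks across.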
0
votes

If you are on EMR 3.x, you can just use the method in the link you provided (a bootstrap action).

If you are on EMR 4.x+, that bootstrap action is not available. You could

  1. either add the custom properties through the EMR --configurations option with a xxx.json file. The benefit is that it is straightforward. The downside is that every property added this way is visible in the AWS web console, which is not ideal if it includes things like metastore database credentials, as it will with an external metastore.
  2. or add a Step after the cluster is up to overwrite your hive-site.xml from S3, then another Step that runs sudo reload hive-server2 to restart the Hive server so it picks up the new config.
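
As a rough sketch of option 1, the JSON file could be generated like this. The function name render_hive_config_json and the host, user, and password values are placeholders for illustration; the hive-site classification and the org.mariadb.jdbc.Driver class are what I'd expect on EMR 4.x+, but treat them as assumptions and check them against the EMR documentation for your release.

```shell
#!/usr/bin/env bash
# Sketch only: print an EMR configurations JSON that sets the Hive
# metastore connection properties through the hive-site classification.
# Arguments: host, user, password (placeholders; values are not JSON-escaped).
render_hive_config_json() {
  local host="$1" user="$2" password="$3"
  cat <<EOF
[
  {
    "Classification": "hive-site",
    "Properties": {
      "javax.jdo.option.ConnectionURL": "jdbc:mysql://${host}:3306/hive",
      "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
      "javax.jdo.option.ConnectionUserName": "${user}",
      "javax.jdo.option.ConnectionPassword": "${password}"
    }
  }
]
EOF
}
```

Redirect the output to a file such as xxx.json and pass it to aws emr create-cluster with --configurations file://./xxx.json alongside your usual options. Keep in mind that this is exactly the variant where the password ends up visible in the console.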