I'm trying to configure hive-site.xml to use a MySQL database outside of the local MySQL instance on EMR. How can I modify an existing cluster's configuration to add hive-site.xml from S3?
2 Answers
I'm not sure what you mean by "add hive-site.xml from S3". If you just want to get the file off of S3 and into your conf directory, you can do that with the AWS CLI while logged into your cluster:
aws s3 cp s3://path/to/hive-site.xml ~/conf
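Here is a slightly more careful version of that copy, as a sketch. It assumes EMR 4.x or later, where Hive reads its configuration from /etc/hive/conf, and it keeps a timestamped backup of the current file before overwriting it. The install_hive_site function and the DRY_RUN switch are hypothetical helpers, not part of any AWS tooling.

```shell
# Requires bash. Assumption: EMR 4.x+, Hive config directory /etc/hive/conf.

# Print the command instead of running it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# install_hive_site S3_URI [CONF_DIR]
install_hive_site() {
  local src="$1"
  local conf_dir="${2:-/etc/hive/conf}"
  local dest="$conf_dir/hive-site.xml"
  # Keep a timestamped copy of the current file so the change can be undone.
  if [ -f "$dest" ]; then
    run sudo cp "$dest" "$dest.bak.$(date +%s)"
  fi
  # The conf directory is root-owned, hence sudo for the copy.
  run sudo aws s3 cp "$src" "$dest"
}

# Example usage:
#   install_hive_site s3://path/to/hive-site.xml
```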
More detailed instructions on migrating an existing EMR cluster's Hive MetaStore to an external service like RDS can be found below:
--
Setting up an existing EMR cluster to use an outside MySQL database is straightforward. First, dump the MySQL database running on your master node so you keep your existing schema information. Assuming you have enough ephemeral storage and your database socket is located at /var/lib/mysql/mysql.sock:
mysqldump -S /var/lib/mysql/mysql.sock hive > /media/ephemeral0/backup.sql
Then import the dump into your outside MySQL instance. If this is in RDS, you'll first need to create the hive database and then import your data into it:
mysql -h rds_host -P 3306 -u rds_master_user -prds_password mysql -e "create database hive"
and then:
mysql -h rds_host -P 3306 -u rds_master_user -prds_password hive < /media/ephemeral0/backup.sql
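As an alternative to writing backup.sql to ephemeral storage, the dump and the import can be chained with a pipe so the dump never touches local disk. This sketch assumes the hive database already exists on the target (as created by the "create database hive" command) and that the metastore tables are InnoDB, which is what lets --single-transaction produce a consistent snapshot. The stream_metastore function and its DRY_RUN switch are illustrative names.

```shell
# Requires bash (arrays). Assumptions: target "hive" database already exists,
# metastore tables are InnoDB (so --single-transaction gives a consistent dump).
set -o pipefail

# stream_metastore SOCKET HOST USER PASSWORD
stream_metastore() {
  local socket="$1" host="$2" user="$3" pass="$4"
  local dump_cmd=(mysqldump -S "$socket" --single-transaction hive)
  local load_cmd=(mysql -h "$host" -P 3306 -u "$user" "-p$pass" hive)
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the pipeline instead of running it.
    echo "${dump_cmd[*]} | ${load_cmd[*]}"
    return 0
  fi
  # With pipefail, a failing mysqldump makes the whole pipeline fail.
  "${dump_cmd[@]}" | "${load_cmd[@]}"
}

# Example usage (same placeholders as above):
#   stream_metastore /var/lib/mysql/mysql.sock rds_host rds_master_user rds_password
```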
Next, create a user for Hive to use. Log into your outside MySQL instance and execute the following statement (with a better username and password):
grant all privileges on hive.* to 'some_hive_user'@'%' identified by 'some_password'; flush privileges;
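The GRANT ... IDENTIFIED BY form creates the user implicitly on MySQL 5.x, but MySQL 8.0 removed that behavior, so on an 8.0 server (including RDS for MySQL 8.0) the statement above fails. A sketch that creates the user explicitly and then grants is below; it works on 5.7 and 8.0. FLUSH PRIVILEGES is not needed here because CREATE USER and GRANT update the privilege tables in memory themselves. The function names are hypothetical, and the values are not escaped.

```shell
# Requires bash. Assumption: usernames/passwords contain no single quotes,
# since nothing here escapes them.

# hive_user_sql HIVE_USER HIVE_PASS -> prints the SQL statements
hive_user_sql() {
  local user="$1" pass="$2"
  # %% prints a literal % (the "any host" wildcard).
  printf "CREATE USER IF NOT EXISTS '%s'@'%%' IDENTIFIED BY '%s';\n" "$user" "$pass"
  printf "GRANT ALL PRIVILEGES ON hive.* TO '%s'@'%%';\n" "$user"
}

# create_hive_user HOST ADMIN_USER ADMIN_PASS HIVE_USER HIVE_PASS
create_hive_user() {
  hive_user_sql "$4" "$5" | mysql -h "$1" -P 3306 -u "$2" -p"$3"
}

# Example usage:
#   create_hive_user rds_host rds_master_user rds_password some_hive_user some_password
```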
Lastly, create hive-site.xml (or make the same changes to your existing one) as outlined in the documentation you cited, filling in the proper host, user, and password information, and restart your MetaStore. To restart it, kill the running MetaStore process and start a new one:
ps aux | grep MetaStore      # note the PID of the running MetaStore process
kill <pid>                   # replace <pid> with the PID found above
hive --service metastore &
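Both parts of this last step can be sketched as shell functions. hive_metastore_props prints the connection properties that go inside the configuration element of hive-site.xml; the javax.jdo.option.* property names are the standard ones, while port 3306, the ?createDatabaseIfNotExist=true suffix, and the org.mariadb.jdbc.Driver class (what the EMR docs use on 5.x+; older releases used com.mysql.jdbc.Driver) are assumptions to check against your release. restart_metastore replaces the manual ps/kill/start sequence, assuming the MetaStore was started by hand with hive --service metastore and therefore runs the org.apache.hadoop.hive.metastore.HiveMetaStore class. If your MetaStore runs as a system service instead, restarting it through the service manager is probably the better route. DRY_RUN=1 only prints the restart commands.

```shell
# Requires bash. Values are inserted verbatim, so they must not contain XML
# special characters such as & or <.

# hive_metastore_props HOST USER PASSWORD [DRIVER]
hive_metastore_props() {
  local host="$1" user="$2" pass="$3"
  local driver="${4:-org.mariadb.jdbc.Driver}"  # assumed default, see above
  cat <<EOF
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://${host}:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>${driver}</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>${user}</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>${pass}</value>
</property>
EOF
}

# Assumed main class of a MetaStore started with "hive --service metastore".
METASTORE_PATTERN='org.apache.hadoop.hive.metastore.HiveMetaStore'

# restart_metastore [LOG_FILE]
restart_metastore() {
  local log="${1:-/tmp/metastore.log}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "pkill -f $METASTORE_PATTERN"
    echo "nohup hive --service metastore > $log 2>&1 &"
    return 0
  fi
  # pkill -f matches the full command line, like "ps aux | grep" does.
  pkill -f "$METASTORE_PATTERN"
  # Give the old process up to 30 seconds to exit and free its port.
  for _ in $(seq 1 30); do
    pgrep -f "$METASTORE_PATTERN" >/dev/null || break
    sleep 1
  done
  # nohup keeps the new MetaStore alive after you log out.
  nohup hive --service metastore >"$log" 2>&1 &
}
```

For example, paste the output of hive_metastore_props rds_host some_hive_user some_password into hive-site.xml, then call restart_metastore.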
If you are on EMR 3.x, you can just use the method in the link you provided (using a bootstrap action).
If you are on EMR 4.x+, that bootstrap action is not available. You could:
- either add the custom properties through the EMR --configurations option with a xxx.json file. The benefit is that it's straightforward. The con is that all the config properties you add this way are visible in the AWS web console, which is not ideal if they include things like metastore database credentials, as they will when you use an external metastore; or
- add a Step after the cluster is up that overwrites your hive-site.xml from S3, then another Step that executes sudo reload hive-server2 to restart HiveServer2 so it picks up the new config.
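The two EMR 4.x+ options can be sketched as generators for the JSON files the AWS CLI accepts: a hive-site classification for aws emr create-cluster --configurations, and a pair of command-runner.jar steps for aws emr add-steps. The /etc/hive/conf path and the org.mariadb.jdbc.Driver class are assumptions, and sudo reload hive-server2 is an upstart command; on releases built on Amazon Linux 2 the equivalent is likely sudo systemctl restart hive-server2. The function names are hypothetical, and values are not JSON-escaped.

```shell
# Requires bash. Assumptions: Hive config in /etc/hive/conf, MariaDB driver
# class, and values that need no JSON escaping (no quotes or backslashes).

# Option 1: hive_site_classification HOST USER PASSWORD
hive_site_classification() {
  cat <<EOF
[
  {
    "Classification": "hive-site",
    "Properties": {
      "javax.jdo.option.ConnectionURL": "jdbc:mysql://$1:3306/hive?createDatabaseIfNotExist=true",
      "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
      "javax.jdo.option.ConnectionUserName": "$2",
      "javax.jdo.option.ConnectionPassword": "$3"
    }
  }
]
EOF
}

# Option 2: overwrite_steps S3_URI -> two steps that run on the master node
overwrite_steps() {
  cat <<EOF
[
  {
    "Type": "CUSTOM_JAR",
    "Name": "Overwrite hive-site.xml from S3",
    "ActionOnFailure": "CONTINUE",
    "Jar": "command-runner.jar",
    "Args": ["sudo", "aws", "s3", "cp", "$1", "/etc/hive/conf/hive-site.xml"]
  },
  {
    "Type": "CUSTOM_JAR",
    "Name": "Reload hive-server2",
    "ActionOnFailure": "CONTINUE",
    "Jar": "command-runner.jar",
    "Args": ["sudo", "reload", "hive-server2"]
  }
]
EOF
}

# Example usage (cluster ID and paths are placeholders):
#   hive_site_classification rds_host some_hive_user some_password > hive-config.json
#   aws emr create-cluster ... --configurations file://hive-config.json
#   overwrite_steps s3://path/to/hive-site.xml > steps.json
#   aws emr add-steps --cluster-id j-XXXXXXXXXXXXX --steps file://steps.json
```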