I am trying to create an EMR cluster from an AWS Lambda function using the Python boto3 library. I am able to create the cluster, but I also want to enable "Use AWS Glue Data Catalog for table metadata" so that Spark can read directly from the Glue Data Catalog. When I create the EMR cluster through the AWS console, I check the box labelled "Use AWS Glue Data Catalog for table metadata", which does what I need. However, I cannot work out how to achieve the same thing through boto3.
Below is the Python code I am using to create the EMR cluster:

    import logging

    import boto3

    logger = logging.getLogger(__name__)

    try:
        connection = boto3.client(
            'emr',
            region_name='xxx'
        )
        # run_job_flow returns a response dict; the new cluster's ID is under 'JobFlowId'
        cluster_id = connection.run_job_flow(
            Name='EMR-LogProcessing',
            LogUri='s3://somepath/',
            ReleaseLabel='emr-5.21.0',
            Applications=[
                {
                    'Name': 'Spark'
                },
            ],
            Instances={
                'InstanceGroups': [
                    {
                        'Name': "MasterNode",
                        'Market': 'SPOT',
                        'InstanceRole': 'MASTER',
                        'BidPrice': 'xxx',
                        'InstanceType': 'm3.xlarge',
                        'InstanceCount': 1,
                    },
                    {
                        'Name': "SlaveNode",
                        'Market': 'SPOT',
                        'InstanceRole': 'CORE',
                        'BidPrice': 'xxx',
                        'InstanceType': 'm3.xlarge',
                        'InstanceCount': 2,
                    }
                ],
                'Ec2KeyName': 'xxx',
                'KeepJobFlowAliveWhenNoSteps': True,
                'TerminationProtected': False
            },
            VisibleToAllUsers=True,
            JobFlowRole='EMR_EC2_DefaultRole',
            ServiceRole='EMR_DefaultRole',
            Tags=[
                {
                    'Key': 'Name',
                    'Value': 'EMR-LogProcessing',
                },
                {
                    'Key': 'env',
                    'Value': 'dev',
                },
            ],
        )
        print('cluster created with the step...', cluster_id['JobFlowId'])
    except Exception as exp:
        # Log the failure together with its traceback
        logger.exception("Exception occurred in createEMRcluster: %s", exp)

Any pointers on how to enable this option through boto3 would be appreciated.
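
One lead I have not been able to verify on a real cluster: the EMR documentation describes a `spark-hive-site` configuration classification whose `hive.metastore.client.factory.class` property points Spark at the Glue Data Catalog, and `run_job_flow` accepts a `Configurations` parameter. The sketch below only builds that argument and assumes this is what the console checkbox sets. The helper names `glue_catalog_configurations` and `with_glue_catalog` are my own, not part of boto3.

```python
# Assumption: this factory class name is the one given in the EMR documentation
# for using the Glue Data Catalog as the Spark/Hive metastore.
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_catalog_configurations():
    """Return a Configurations list that points Spark SQL at the Glue Data Catalog."""
    return [
        {
            "Classification": "spark-hive-site",
            "Properties": {
                "hive.metastore.client.factory.class": GLUE_FACTORY,
            },
        }
    ]


def with_glue_catalog(run_job_flow_kwargs):
    """Return a copy of the run_job_flow keyword arguments with the Glue
    classification appended to any Configurations already present.
    (Hypothetical helper, for illustration only.)"""
    merged = dict(run_job_flow_kwargs)
    existing = list(merged.get("Configurations", []))
    merged["Configurations"] = existing + glue_catalog_configurations()
    return merged


if __name__ == "__main__":
    # Only prints the merged arguments; no AWS call is made here.
    kwargs = with_glue_catalog({"Name": "EMR-LogProcessing"})
    print(kwargs["Configurations"])
```

If this is the right setting, the call in my code would presumably become `connection.run_job_flow(**with_glue_catalog({...}))`, or simply gain a `Configurations=glue_catalog_configurations()` argument. I would also expect the `EMR_EC2_DefaultRole` instance profile to need Glue permissions, but that is a guess on my part.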