
I created a simple Step Functions state machine as follows: Start -> Start EMR cluster & submit job -> End

How can I determine whether my Spark step completed successfully?

I am able to start an EMR cluster and attach a Spark job to it; the job completes successfully and the cluster terminates. I followed the steps in this question: Creating AWS EMR cluster with spark step using lambda function fails with "Local file does not exist"

Now I want to get the status. The job poller tells me whether the EMR cluster was created successfully, but I am looking for a way to find out the status of the Spark job itself.

import boto3


def lambda_handler(event, context):
    conn = boto3.client("emr")
    # run_job_flow returns a dict; the new cluster's ID is under "JobFlowId"
    response = conn.run_job_flow(
        Name='xyz',
        ServiceRole='xyz',
        JobFlowRole='asd',
        VisibleToAllUsers=True,
        LogUri='<location>',
        ReleaseLabel='emr-5.16.0',
        Instances={
            'Ec2SubnetId': 'xyz',
            'InstanceGroups': [
                {
                    'Name': 'Master',
                    'Market': 'ON_DEMAND',
                    'InstanceRole': 'MASTER',
                    'InstanceType': 'm4.xlarge',
                    'InstanceCount': 1,
                }
            ],
            # Terminate the cluster once all steps have finished
            'KeepJobFlowAliveWhenNoSteps': False,
            'TerminationProtected': False,
        },
        Applications=[
            {'Name': 'Spark'},
            {'Name': 'Hadoop'},
        ],
        Steps=[{
            'Name': "mystep",
            'ActionOnFailure': 'TERMINATE_CLUSTER',
            'HadoopJarStep': {
                'Jar': 'jar',
                'Args': [
                    <insert args> , jar, mainclass
                ]
            }
        }]
    )
    return response



1 Answer


You can use the AWS CLI or an SDK to list all the steps for the cluster and then describe a particular step to get its status.
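
With boto3, that could look roughly like the sketch below. It assumes the cluster ID (the "JobFlowId" returned by run_job_flow) is passed to a second Lambda under a hypothetical "ClusterId" event key; the helper names get_step_states and all_steps_succeeded are made up for illustration. It uses the EMR client's list_steps and describe_step calls and reads the state from Step.Status.State, which is COMPLETED for a step that finished successfully and FAILED, CANCELLED, or INTERRUPTED otherwise.

```python
def get_step_states(emr_client, cluster_id):
    """Return {step_id: state} for every step on the given cluster."""
    states = {}
    kwargs = {"ClusterId": cluster_id}
    while True:
        # list_steps is paginated; a "Marker" in the response means more pages
        page = emr_client.list_steps(**kwargs)
        for summary in page["Steps"]:
            # describe_step returns the full step record, including its status
            detail = emr_client.describe_step(
                ClusterId=cluster_id, StepId=summary["Id"]
            )
            states[summary["Id"]] = detail["Step"]["Status"]["State"]
        marker = page.get("Marker")
        if not marker:
            break
        kwargs["Marker"] = marker
    return states


def all_steps_succeeded(states):
    """True only if there is at least one step and every step is COMPLETED."""
    return bool(states) and all(s == "COMPLETED" for s in states.values())


def lambda_handler(event, context):
    # Imported here so the helpers above can be exercised without boto3
    import boto3

    cluster_id = event["ClusterId"]  # hypothetical input key (assumption)
    states = get_step_states(boto3.client("emr"), cluster_id)
    return {
        "ClusterId": cluster_id,
        "StepStates": states,
        "Succeeded": all_steps_succeeded(states),
    }
```

The summaries returned by list_steps already carry a Status.State field, so the describe_step call can be dropped if only the state is needed; it is kept here because it also returns failure details for a failed step. A Step Functions Choice state could then branch on the "Succeeded" value.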