3
votes

I am new to AWS Glue and want to solve a particular problem. Currently I only have the Glue service available, with no EC2 node and no Lambda. I am trying to run an AWS Glue Spark job from an AWS Glue Python shell job. Is it possible to use a Glue Python shell job as a wrapper that calls the same Glue Spark job multiple times with different parameters? I tried to run the code snippet below, but I am getting a boto exception in the logs.

import boto3
glue = boto3.client(service_name='glue', region_name='us-east-1',
              endpoint_url='https://glue.us-east-1.amazonaws.com')  
myNewJobRun = glue.start_job_run(JobName='WHICH I CREATED IN CONSOLE')

In the code above, I have already created the job in the console and want to start it from the AWS Glue Python shell job.

Next, I want to get the status of the job: if it is still running, wait for some time and then check the status again.

status = glue.get_job_run(JobName=myJob['Name'], RunId=JobRun['JobRunId'])

Can someone advise or share a code sample for reference?

Thanks, Pradeep

2 Answers

6
votes

The following sample code keeps checking the job status until the job reaches SUCCEEDED, and raises an exception if one of the error states is observed:

import time

import boto3

job_name = 'WHICH U CREATED IN CONSOLE'
client = boto3.client(service_name='glue', region_name='us-east-1',
          endpoint_url='https://glue.us-east-1.amazonaws.com') 
response = client.start_job_run(JobName=job_name)
status = client.get_job_run(JobName=job_name, RunId=response['JobRunId'])

if status:
    state = status['JobRun']['JobRunState']
    # Poll every 30 seconds until the run succeeds
    while state not in ['SUCCEEDED']:
        time.sleep(30)
        status = client.get_job_run(JobName=job_name, RunId=response['JobRunId'])
        state = status['JobRun']['JobRunState']
        if state in ['STOPPED', 'FAILED', 'TIMEOUT']:
            # ErrorMessage may be missing (e.g. for a manually stopped run), so use .get()
            raise Exception('Failed to execute glue job: ' + status['JobRun'].get('ErrorMessage', '') + '. State is : ' + state)

You can modify the conditions and sleep time as per your requirement.
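To cover the original question (one Python shell job launching the same Spark job several times with different parameters), the polling loop above could be wrapped in a function. This is a minimal sketch rather than a definitive implementation: `run_job_and_wait`, `run_for_each`, the `poll_seconds` default, and the set of failure states are illustrative choices, and the Glue client is passed in as a parameter so the logic can be exercised without AWS credentials.

```python
import time

# States treated as failures here (an assumed list; adjust to your needs).
FAILURE_STATES = {"STOPPED", "FAILED", "TIMEOUT", "ERROR"}


def run_job_and_wait(client, job_name, arguments=None, poll_seconds=30, sleep=time.sleep):
    """Start one Glue job run and block until it succeeds or fails."""
    request = {"JobName": job_name}
    if arguments:
        request["Arguments"] = arguments
    run_id = client.start_job_run(**request)["JobRunId"]

    while True:
        run = client.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        state = run["JobRunState"]
        if state == "SUCCEEDED":
            return run_id
        if state in FAILURE_STATES:
            # ErrorMessage is not always present, so fall back to a placeholder.
            message = run.get("ErrorMessage", "no error message")
            raise RuntimeError(
                f"Glue job {job_name} run {run_id} ended in {state}: {message}"
            )
        sleep(poll_seconds)


def run_for_each(client, job_name, argument_sets, **kwargs):
    """Run the same job once per argument dict, one after another."""
    return [run_job_and_wait(client, job_name, args, **kwargs) for args in argument_sets]
```

Inside the Python shell job this would be called with a real client, for example `run_for_each(boto3.client('glue'), 'my-spark-job', [{'--run_date': '2020-01-01'}, {'--run_date': '2020-01-02'}])`, where the job name and the `--run_date` key are made-up placeholders. The runs are sequential, so a failure in one run stops the remaining ones.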

1
votes

What is the error?

You may have to pass arguments to start_job_run():

response = glue.start_job_run(JobName=jobName, Arguments=arguments, AllocatedCapacity=dpus)

status = glue.get_job_run(JobName=jobName, RunId=response['JobRunId'])
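For the `Arguments` value, Glue expects a dict of string keys to string values, and job parameters are conventionally named with a leading `--` so the Spark job can read them with `getResolvedOptions`. A small sketch of building one such dict per run follows; the helper name `to_glue_arguments` and the parameter names (`source_table`, `run_date`) are hypothetical and only for illustration.

```python
def to_glue_arguments(params):
    """Convert {'name': value} into the {'--name': 'value'} form used for Glue job arguments."""
    arguments = {}
    for key, value in params.items():
        # Add the leading "--" if the caller did not supply it.
        name = key if key.startswith("--") else "--" + key
        # Glue argument values are strings, so convert numbers and the like.
        arguments[name] = str(value)
    return arguments


# Hypothetical parameter sets: one dict per run of the same Spark job.
parameter_sets = [
    {"source_table": "orders", "run_date": "2020-01-01"},
    {"source_table": "customers", "run_date": "2020-01-01"},
]
argument_sets = [to_glue_arguments(p) for p in parameter_sets]
```

Each element of `argument_sets` can then be passed as `Arguments=` in a separate `start_job_run()` call for the same job name.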