Using moto I was able to mock an EMR cluster:
import boto3
import moto

with moto.mock_emr():
    client = boto3.client('emr', region_name='us-east-1')
    client.run_job_flow(
        Name='my_cluster',
        Instances={
            'MasterInstanceType': 'c3.xlarge',
            'SlaveInstanceType': 'c3.xlarge',
            'InstanceCount': 3,
            'Placement': {'AvailabilityZone': 'us-east-1a'},
            'KeepJobFlowAliveWhenNoSteps': True,
        },
        VisibleToAllUsers=True,
    )
    summary = client.list_clusters()
    cluster_id = summary["Clusters"][0]["Id"]
    res = client.add_job_flow_steps(
        JobFlowId=cluster_id,
        Steps=[
            {
                "Name": "foo_step",
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {"Args": [], "Jar": "command-runner.jar"},
            }
        ],
    )
The added step seems to stay in the STARTING state indefinitely. Is it possible to actually submit a Spark job to the mocked cluster and run it there?
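For example, polling the step via describe_step (inside the mock_emr() block above, using the StepIds returned by add_job_flow_steps) never gets past STARTING:

    # Continuing from the snippet above, inside the mock_emr() block.
    step_id = res["StepIds"][0]
    step = client.describe_step(ClusterId=cluster_id, StepId=step_id)
    print(step["Step"]["Status"]["State"])  # prints 'STARTING' every time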
I am building a utility that submits jobs to EMR clusters, and I want to test it by running a trivial Spark job through it; that is where this question comes from. Note that I'm not interested in standing up a real Spark cluster or in testing the correctness of the submitted Spark job. What I actually want to test is the flow of submitting a job to an EMR cluster and examining the results (which should ideally be persisted to a mocked S3 bucket, as in the sketch below).
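To make that last point concrete, here is a rough sketch of the kind of assertion I'd like to end up with. The bucket name and output key are placeholders, and the put_object call stands in for the output the Spark job itself would write:

    import boto3
    import moto

    # Sketch only: 'my-results-bucket' and 'output/part-00000' are placeholders.
    with moto.mock_s3():
        s3 = boto3.client("s3", region_name="us-east-1")
        s3.create_bucket(Bucket="my-results-bucket")

        # ... here the utility would submit the Spark job to the mocked EMR
        # cluster, and the job would write its results to the mocked bucket;
        # this put_object stands in for that output ...
        s3.put_object(Bucket="my-results-bucket", Key="output/part-00000", Body=b"42")

        # The check I actually care about: the results landed in (mocked) S3.
        body = s3.get_object(Bucket="my-results-bucket", Key="output/part-00000")["Body"].read()
        assert body == b"42"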