AWS Step Functions recently added EMR integration, which is cool, but I couldn't find a way to pass a variable from Step Functions into the addStep args. For example, I would like to pass the "$.dayid" variable into "Parameters" > "Step" > "HadoopJarStep" > "Args", similar to "ClusterId.$": "$.ClusterId" (this cluster ID variable works).

{
    "Step_One": {
        "Type": "Task",
        "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
        "Parameters": {
            "ClusterId.$": "$.ClusterId",
            "Step": {
                "Name": "The first step",
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                        "hive-script",
                        "--run-hive-script",
                        "--args",
                        "-f",
                        "s3://<region>.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q",
                        "-d",
                        "INPUT=s3://<region>.elasticmapreduce.samples",
                        "-d",
                        "OUTPUT=s3://<mybucket>/MyHiveQueryResults/$.dayid"
                    ]
                }
            }
        },
        "End": true
    }
}
1 Answer

Parameters lets you define key-value pairs, and a path is only resolved when the key ends in ".$" and the whole value is the path. Because the value of the "Args" key is an array, you can't dynamically reference a single element inside it; you have to reference the whole array instead, for example "Args.$": "$.Input.ArgsArray". For the same reason, you can't substitute a value inside a string, as you are trying to do in "OUTPUT=s3://<mybucket>/MyHiveQueryResults/$.dayid".

So for your use case, the best way to achieve this is to add a pre-processing state before this state. In the pre-processing state, I would recommend calling a Lambda function that builds the string "OUTPUT=s3://<mybucket>/MyHiveQueryResults/" followed by the value of $.dayid, as well as the full array you send to Args.

{
    "StartAt": "Pre-Process",
    "States": {
        "Pre-Process": {
            "Type": "Task",
            "Resource": "<Lambda function to generate the string OUTPUT=s3://<mybucket>/MyHiveQueryResults/$.dayid and output the Args array>",
            "Next": "Step_One"
        },
        "Step_One": {
            "Type": "Task",
            "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
            "Parameters": {
                "ClusterId.$": "$.ClusterId",
                "Step": {
                    "Name": "The first step",
                    "ActionOnFailure": "CONTINUE",
                    "HadoopJarStep": {
                        "Jar": "command-runner.jar",
                        "Args.$": "$.ArgsGeneratedByPreProcessingState"
                    }
                }
            },
            "End": true
        }
    }
}
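
For illustration, the Lambda function behind the "Pre-Process" state could look roughly like the sketch below. This is only one possible shape, not a definitive implementation: the handler name `lambda_handler` and the assumption that the execution input carries both "ClusterId" and "dayid" are mine, and the `<region>` and `<mybucket>` placeholders are left exactly as in the question.

```python
# Hypothetical pre-processing Lambda for the "Pre-Process" state.
# Assumption: the state input looks like {"ClusterId": "...", "dayid": "..."}.

def lambda_handler(event, context):
    dayid = event["dayid"]

    # Build the complete Args array, with dayid substituted into the OUTPUT string.
    args = [
        "hive-script",
        "--run-hive-script",
        "--args",
        "-f",
        "s3://<region>.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q",
        "-d",
        "INPUT=s3://<region>.elasticmapreduce.samples",
        "-d",
        f"OUTPUT=s3://<mybucket>/MyHiveQueryResults/{dayid}",
    ]

    # Without a ResultPath, a Task state's result replaces its input,
    # so ClusterId is passed through for Step_One to read as "$.ClusterId".
    return {
        "ClusterId": event["ClusterId"],
        "ArgsGeneratedByPreProcessingState": args,
    }
```

The returned object becomes the input of Step_One, so "Args.$": "$.ArgsGeneratedByPreProcessingState" and "ClusterId.$": "$.ClusterId" both resolve against it.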