When creating a data pipeline via API / CLI that creates an EmrCluster, I can specify multiple steps using an array structure:

{ "objects" : [
  { "id" : "myEmrCluster",
    "terminateAfter" : "1 hours",
    "schedule" : {"ref":"theSchedule"},
    "step" : ["some.jar,-param1,val1", "someOther.jar,-foo,bar"] },
  { "id" : "theSchedule", "period":"1 days" }
] }

I can call put-pipeline-definition referencing the file above to create a number of steps for the EMR cluster.

Now, if I want to create the pipeline using CloudFormation, I can use the PipelineObjects property of an AWS::DataPipeline::Pipeline resource to configure the pipeline. However, pipeline object fields can only be of type StringValue or RefValue. How can I create an array pipeline object field?

Here's the corresponding CloudFormation template:

"Resources" : {
    "MyEMRCluster" : {
        "Type" : "AWS::DataPipeline::Pipeline",
        "Properties" : {
            "Name" : "MyETLJobs",
            "Activate" : "true",
            "PipelineObjects" : [
                {

                    "Id" : "myEmrCluster",
                    "Fields" : [
                        { "Key" : "terminateAfter","StringValue":"1 hours" },
                        { "Key" : "schedule","RefValue" : "theSchedule" },
                        { "Key" : "step","StringValue" : "some.jar,-param1,val1" }
                    ]
                },
                {
                    "Id" : "theSchedule",
                    "Fields" : [
                        { "Key" : "period","StringValue":"1 days" }
                    ]
                }
             ]
         }
    }
}

With the above template, step is a StringValue, equivalent to:

"step" : "some.jar,-param1,val1"

and not an array like the desired config.

http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-datapipeline-pipeline-pipelineobjects-fields.html shows that only StringValue and RefValue are valid keys. Is it possible to create an array of steps via CloudFormation?

Thanks in advance.

1 Answer

Ah, I'm not sure where I saw that steps could be configured as an array. The documentation makes no mention of that. Instead, it specifies that to configure multiple steps, you add multiple step entries, i.e. repeat the step key once per step in Fields:

            {
                "Id" : "myEmrCluster",
                "Fields" : [
                    { "Key" : "terminateAfter","StringValue":"1 hours" },
                    { "Key" : "schedule","RefValue" : "theSchedule" },
                    { "Key" : "step","StringValue" : "some.jar,-param1,val1" },
                    { "Key" : "step","StringValue" : "someOther.jar,-foo,bar" }
                ]
            }
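If you already have a definition file in the API / CLI format, the same idea can be applied mechanically. The following is a minimal Python sketch, not an official tool: the function name `to_cfn_pipeline_objects` is hypothetical, and it assumes that every array value should become one field entry per element, that `{"ref": ...}` values map to RefValue, and that everything else maps to StringValue. It also emits a Name for each object (defaulting to the id), which is an assumption about what the PipelineObjects property expects.

```python
import json


def to_cfn_pipeline_objects(definition):
    """Convert a pipeline definition dict (API / CLI file format) into a list
    shaped like the CloudFormation PipelineObjects property.

    Hypothetical helper: array values are expanded into repeated keys."""
    pipeline_objects = []
    for obj in definition["objects"]:
        fields = []
        for key, value in obj.items():
            if key in ("id", "name"):
                continue  # these become Id / Name, not entries in Fields
            # Treat a scalar as a one-element list so both cases share a path.
            values = value if isinstance(value, list) else [value]
            for item in values:
                if isinstance(item, dict) and "ref" in item:
                    fields.append({"Key": key, "RefValue": item["ref"]})
                else:
                    fields.append({"Key": key, "StringValue": str(item)})
        pipeline_objects.append({
            "Id": obj["id"],
            # Assumption: fall back to the id when the object has no name.
            "Name": obj.get("name", obj["id"]),
            "Fields": fields,
        })
    return pipeline_objects


# The definition from the question, with the array-valued "step" field.
definition = {
    "objects": [
        {
            "id": "myEmrCluster",
            "terminateAfter": "1 hours",
            "schedule": {"ref": "theSchedule"},
            "step": ["some.jar,-param1,val1", "someOther.jar,-foo,bar"],
        },
        {"id": "theSchedule", "period": "1 days"},
    ]
}

cfn_objects = to_cfn_pipeline_objects(definition)
print(json.dumps(cfn_objects, indent=2))
```

The printed list can be pasted under "PipelineObjects" in the template. Each element of the original step array ends up as its own { "Key" : "step", ... } entry, in the original order.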