
We are upgrading the EMR version used by our AWS Data Pipeline from 3.3.2 to 5.8. The bootstrap actions we used on the old AMI release now have to be set up as configurations, specified through a classification and property definitions.
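For reference, a Data Pipeline configuration object with a `classification` and a list of property references roughly corresponds to one entry in the EMR configuration format described in the EMR Release Guide: a `Classification` plus a `Properties` map. The minimal Python sketch below illustrates that mapping. The helper `to_emr_configurations` and the dict-based object layout are hypothetical, for illustration only, and are not part of any AWS API.

```python
import json


def to_emr_configurations(objects, config_ids):
    """Hypothetical helper: map Data Pipeline configuration/Property objects
    to the EMR configuration list format (Classification + Properties)."""
    # Index pipeline objects by their "id" so {"ref": ...} entries can be resolved.
    by_id = {obj["id"]: obj for obj in objects}
    result = []
    for config_id in config_ids:
        config = by_id[config_id]
        props = {}
        # Each "property" entry is a {"ref": ...} pointing at a Property object
        # that carries a "key" and a "value".
        for ref in config.get("property", []):
            prop = by_id[ref["ref"]]
            props[prop["key"]] = prop["value"]
        result.append({"Classification": config["classification"], "Properties": props})
    return result


# Objects modeled on the hive-site setting from the question (ids reused, layout assumed).
objects = [
    {"id": "configureEmrHiveSite", "classification": "hive-site",
     "property": [{"ref": "hive-exec-compress-output"}]},
    {"id": "hive-exec-compress-output",
     "key": "hive.exec.compress.output", "value": "true"},
]

print(json.dumps(to_emr_configurations(objects, ["configureEmrHiveSite"])))
# [{"Classification": "hive-site", "Properties": {"hive.exec.compress.output": "true"}}]
```

The printed list is what the same setting would look like if passed to EMR directly; Data Pipeline expresses it as separate, referenced objects instead.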

My JSON looks like this:

        {
            "enableDebugging": "true",
            "taskInstanceBidPrice": "1",
            "terminateAfter": "2 Hours",
            "name": "ExportCluster",
            "taskInstanceType": "m1.xlarge",
            "schedule": {
                "ref": "Default"
            },
            "emrLogUri": "s3://emr-script-logs/",
            "coreInstanceType": "m1.xlarge",
            "coreInstanceCount": "1",
            "taskInstanceCount": "4",
            "masterInstanceType": "m3.xlarge",
            "keyPair": "XXXX",
            "applications": ["hadoop","hive", "tez"],
            "subnetId": "XXXXX",
            "logUri": "s3://pipelinedata/XXX",
            "releaseLabel": "emr-5.8.0",
            "type": "EmrCluster",
            "id": "EmrClusterWithNewEMRVersion",
            "configuration": [
                { "ref": "configureEmrHiveSite" }
            ]
        },
        {
            "myComment": "This object configures hive-site xml.",
            "name": "HiveSite Configuration",
            "type": "HiveSiteConfiguration",
            "id": "configureEmrHiveSite",
            "classification": "hive-site",
            "property": [
                {"ref": "hive-exec-compress-output" }
            ]
        },
        {
            "myComment": "This object sets a hive-site configuration property value.",
            "name":"hive-exec-compress-output",
            "type": "Property",
            "id": "hive-exec-compress-output",
            "key": "hive.exec.compress.output",
            "value": "true"
        }
    ],
    "parameters": []

The JSON file loads into Data Pipeline, but it reports the following errors:

Object:HiveSite Configuration
ERROR: 'HiveSiteConfiguration'
Object:ExportCluster
ERROR: 'configuration' values must be of type 'null'. Found values of type 'null'

I am not sure what this means. Could you please let me know whether I am specifying this correctly? I believe I am, according to http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html

Were you able to successfully upgrade to 5.x? I have a specific question about this step, without changing the default configuration: stackoverflow.com/questions/47858108/… - user1322092

1 Answer


The block below should have the name "EMR Configuration" and the type "EmrConfiguration"; only then is it recognized correctly by AWS Data Pipeline, and hive-site.xml is set accordingly.

    {
        "myComment": "This object configures hive-site xml.",
        "name": "EMR Configuration",
        "type": "EmrConfiguration",
        "id": "configureEmrHiveSite",
        "classification": "hive-site",
        "property": [
            {"ref": "hive-exec-compress-output" }
        ]
    },
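A quick way to catch this kind of mistake before uploading a definition is to check that each `configuration` reference on the `EmrCluster` points to an object of type `EmrConfiguration`, and that each `property` reference points to a `Property`. The Python sketch below is a hypothetical lint helper (`check_config_refs` is not an AWS tool); it looks only at the `type` fields and does not model every rule that Data Pipeline applies.

```python
def check_config_refs(objects):
    """Hypothetical lint: report EmrCluster -> EmrConfiguration -> Property
    references whose target is missing or has the wrong type."""
    by_id = {obj["id"]: obj for obj in objects}
    problems = []
    for obj in objects:
        if obj.get("type") == "EmrCluster":
            # Cluster "configuration" refs must point at EmrConfiguration objects.
            for ref in obj.get("configuration", []):
                target = by_id.get(ref["ref"])
                if target is None or target.get("type") != "EmrConfiguration":
                    problems.append(
                        f"{obj['id']}: configuration ref {ref['ref']!r} is not an EmrConfiguration")
        if obj.get("type") == "EmrConfiguration":
            # EmrConfiguration "property" refs must point at Property objects.
            for ref in obj.get("property", []):
                target = by_id.get(ref["ref"])
                if target is None or target.get("type") != "Property":
                    problems.append(
                        f"{obj['id']}: property ref {ref['ref']!r} is not a Property")
    return problems


# Trimmed versions of the question's objects (only the fields the lint reads).
question = [
    {"id": "EmrClusterWithNewEMRVersion", "type": "EmrCluster",
     "configuration": [{"ref": "configureEmrHiveSite"}]},
    {"id": "configureEmrHiveSite", "type": "HiveSiteConfiguration",
     "classification": "hive-site", "property": [{"ref": "hive-exec-compress-output"}]},
    {"id": "hive-exec-compress-output", "type": "Property",
     "key": "hive.exec.compress.output", "value": "true"},
]

# Same objects with the type change from the answer applied (copies, so question is untouched).
answer = [dict(o) for o in question]
answer[1]["type"] = "EmrConfiguration"

print(check_config_refs(question))  # one problem: the ref targets HiveSiteConfiguration
print(check_config_refs(answer))    # []
```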