3
votes

Question

What is the S3 extended destination configuration, and where does the AWS documentation clearly explain what it is for?

As the name suggests, it must relate to the S3 destination. However, the S3 destination section of the AWS documentation does not mention it.

If there are articles or blog posts with a clear explanation, please point me to them.

I have been looking for clues in the documentation, such as the example below, but as is often the case with AWS documentation, it is not clear. It appears to be partly related to input record conversion or record processing.

resource "aws_kinesis_firehose_delivery_stream" "extended_s3_stream" {
  name        = "terraform-kinesis-firehose-extended-s3-test-stream"
  destination = "extended_s3"

  extended_s3_configuration {
    role_arn   = "${aws_iam_role.firehose_role.arn}"
    bucket_arn = "${aws_s3_bucket.bucket.arn}"

    processing_configuration {
      enabled = "true"

      processors {
        type = "Lambda"

        parameters {
          parameter_name  = "LambdaArn"
          parameter_value = "${aws_lambda_function.lambda_processor.arn}:$LATEST"
        }
      }
    }
  }
}

4 Answers

3
votes

The Terraform documentation is the best at showing the difference between S3 and Extended S3 destinations: https://www.terraform.io/docs/providers/aws/r/kinesis_firehose_delivery_stream.html

The extended S3 destination inherits the S3 destination configuration parameters and adds extra ones, such as data_format_conversion_configuration and error_output_prefix.
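
A minimal sketch of that relationship, reusing the role and bucket resources from the question. The prefix values are placeholders chosen for illustration, and the quoted interpolation style simply follows the question's snippet.

```hcl
resource "aws_kinesis_firehose_delivery_stream" "extended_s3_stream" {
  name        = "terraform-kinesis-firehose-extended-s3-test-stream"
  destination = "extended_s3"

  extended_s3_configuration {
    # Arguments that the plain S3 destination configuration also accepts.
    role_arn           = "${aws_iam_role.firehose_role.arn}"
    bucket_arn         = "${aws_s3_bucket.bucket.arn}"
    prefix             = "data/" # placeholder value
    compression_format = "GZIP"

    # One of the extra arguments mentioned above.
    error_output_prefix = "errors/" # placeholder value
  }
}
```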

2
votes

I am afraid the Kinesis Firehose documentation is so poorly written that I wonder how anyone can figure out how to use Firehose from the documentation alone.

It looks as though Firehose originally just relayed data to the S3 bucket with no built-in transformation mechanism, which is why the S3 destination configuration has no processing configuration, as seen in AWS::KinesisFirehose::DeliveryStream S3DestinationConfiguration.

Then, as described in Amazon Kinesis Firehose Data Transformation with AWS Lambda, a mechanism to transform records was introduced, seemingly around early 2017, and AWS::KinesisFirehose::DeliveryStream ExtendedS3DestinationConfiguration was added to support it.

Apparently people struggle to work out how to configure it:

Who can figure it out by just reading the AWS document?

Firehose extended S3 configurations for lambda transformation

I could not work it out from the AWS documentation, but after looking at actual implementations on the Internet, the required configuration appears to be the following:

[screenshot of the required configuration; image not available]


Update

As per the suggestion by Kevin Eid.

The extended_s3_configuration object supports the same fields from s3_configuration as well as the following:

    data_format_conversion_configuration - (Optional) Nested argument for the serializer, deserializer, and schema for converting data from the JSON format to the Parquet or ORC format before writing it to Amazon S3. More details given below.
    error_output_prefix - (Optional) Prefix added to failed records before writing them to S3. This prefix appears immediately following the bucket name.
    processing_configuration - (Optional) The data processing configuration. More details are given below.
    s3_backup_mode - (Optional) The Amazon S3 backup mode. Valid values are Disabled and Enabled. Default value is Disabled.
    s3_backup_configuration - (Optional) The configuration for backup in Amazon S3. Required if s3_backup_mode is Enabled. Supports the same fields as s3_configuration object.

I believe s3_configuration remains only for compatibility or legacy reasons, so you only need to use extended_s3_configuration, but the AWS documentation does not explain this properly. It is a pity that the AWS documentation does not serve as the source of truth.
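
As a rough sketch of how the arguments listed above fit together, the question's Lambda processor can be combined with an S3 backup. The backup bucket aws_s3_bucket.backup_bucket is a hypothetical resource introduced for illustration; the other references come from the question.

```hcl
resource "aws_kinesis_firehose_delivery_stream" "extended_s3_stream" {
  name        = "terraform-kinesis-firehose-extended-s3-test-stream"
  destination = "extended_s3"

  extended_s3_configuration {
    role_arn   = "${aws_iam_role.firehose_role.arn}"
    bucket_arn = "${aws_s3_bucket.bucket.arn}"

    # Transform records with the Lambda function before delivery.
    processing_configuration {
      enabled = "true"

      processors {
        type = "Lambda"

        parameters {
          parameter_name  = "LambdaArn"
          parameter_value = "${aws_lambda_function.lambda_processor.arn}:$LATEST"
        }
      }
    }

    # Back up records to a second bucket.
    # s3_backup_configuration is required once the mode is Enabled.
    s3_backup_mode = "Enabled"

    s3_backup_configuration {
      role_arn   = "${aws_iam_role.firehose_role.arn}"
      bucket_arn = "${aws_s3_bucket.backup_bucket.arn}" # hypothetical bucket
    }
  }
}
```

This assumes the Firehose role is also allowed to write to the backup bucket.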

1
votes

First of all, the ExtendedS3DestinationConfiguration property type configures an Amazon S3 destination for an Amazon Kinesis Data Firehose delivery stream. See: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-kinesisfirehose-deliverystream-extendeds3destinationconfiguration.html

Thanks.

0
votes

This little screenshot shows new components in ExtendedS3DestinationConfiguration as compared to S3DestinationConfiguration:

[screenshot comparing ExtendedS3DestinationConfiguration with S3DestinationConfiguration; image not available]

The API documentation also shows what the extended S3 configuration is and how it is defined:

{
  "RoleARN": "string",
  "BucketARN": "string",
  "Prefix": "string",
  "ErrorOutputPrefix": "string",
  "BufferingHints": {
    "SizeInMBs": integer,
    "IntervalInSeconds": integer
  },
  "CompressionFormat": "UNCOMPRESSED"|"GZIP"|"ZIP"|"Snappy",
  "EncryptionConfiguration": {
    "NoEncryptionConfig": "NoEncryption",
    "KMSEncryptionConfig": {
      "AWSKMSKeyARN": "string"
    }
  },
  "CloudWatchLoggingOptions": {
    "Enabled": true|false,
    "LogGroupName": "string",
    "LogStreamName": "string"
  },
  "ProcessingConfiguration": {
    "Enabled": true|false,
    "Processors": [
      {
        "Type": "Lambda",
        "Parameters": [
          {
            "ParameterName": "LambdaArn"|"NumberOfRetries"|"RoleArn"|"BufferSizeInMBs"|"BufferIntervalInSeconds",
            "ParameterValue": "string"
          }
          ...
        ]
      }
      ...
    ]
  },
  "S3BackupMode": "Disabled"|"Enabled",
  "S3BackupUpdate": {
    "RoleARN": "string",
    "BucketARN": "string",
    "Prefix": "string",
    "ErrorOutputPrefix": "string",
    "BufferingHints": {
      "SizeInMBs": integer,
      "IntervalInSeconds": integer
    },
    "CompressionFormat": "UNCOMPRESSED"|"GZIP"|"ZIP"|"Snappy",
    "EncryptionConfiguration": {
      "NoEncryptionConfig": "NoEncryption",
      "KMSEncryptionConfig": {
        "AWSKMSKeyARN": "string"
      }
    },
    "CloudWatchLoggingOptions": {
      "Enabled": true|false,
      "LogGroupName": "string",
      "LogStreamName": "string"
    }
  },
  "DataFormatConversionConfiguration": {
    "SchemaConfiguration": {
      "RoleARN": "string",
      "CatalogId": "string",
      "DatabaseName": "string",
      "TableName": "string",
      "Region": "string",
      "VersionId": "string"
    },
    "InputFormatConfiguration": {
      "Deserializer": {
        "OpenXJsonSerDe": {
          "ConvertDotsInJsonKeysToUnderscores": true|false,
          "CaseInsensitive": true|false,
          "ColumnToJsonKeyMappings": {"string": "string"
            ...}
        },
        "HiveJsonSerDe": {
          "TimestampFormats": ["string", ...]
        }
      }
    },
    "OutputFormatConfiguration": {
      "Serializer": {
        "ParquetSerDe": {
          "BlockSizeBytes": integer,
          "PageSizeBytes": integer,
          "Compression": "UNCOMPRESSED"|"GZIP"|"SNAPPY",
          "EnableDictionaryCompression": true|false,
          "MaxPaddingBytes": integer,
          "WriterVersion": "V1"|"V2"
        },
        "OrcSerDe": {
          "StripeSizeBytes": integer,
          "BlockSizeBytes": integer,
          "RowIndexStride": integer,
          "EnablePadding": true|false,
          "PaddingTolerance": double,
          "Compression": "NONE"|"ZLIB"|"SNAPPY",
          "BloomFilterColumns": ["string", ...],
          "BloomFilterFalsePositiveProbability": double,
          "DictionaryKeyThreshold": double,
          "FormatVersion": "V0_11"|"V0_12"
        }
      }
    },
    "Enabled": true|false
  }
}
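
For comparison, the DataFormatConversionConfiguration part of this shape maps onto nested blocks in Terraform. The sketch below assumes a Glue table resource named aws_glue_catalog_table.events, which is hypothetical, and leaves the deserializer and serializer options at their defaults.

```hcl
resource "aws_kinesis_firehose_delivery_stream" "extended_s3_stream" {
  name        = "terraform-kinesis-firehose-extended-s3-test-stream"
  destination = "extended_s3"

  extended_s3_configuration {
    role_arn   = "${aws_iam_role.firehose_role.arn}"
    bucket_arn = "${aws_s3_bucket.bucket.arn}"

    data_format_conversion_configuration {
      # InputFormatConfiguration: how incoming JSON records are read.
      input_format_configuration {
        deserializer {
          open_x_json_ser_de {}
        }
      }

      # OutputFormatConfiguration: the columnar format written to S3.
      output_format_configuration {
        serializer {
          parquet_ser_de {}
        }
      }

      # SchemaConfiguration: the Glue table that supplies the schema.
      schema_configuration {
        role_arn      = "${aws_iam_role.firehose_role.arn}"
        database_name = "${aws_glue_catalog_table.events.database_name}" # hypothetical table
        table_name    = "${aws_glue_catalog_table.events.name}"
      }
    }
  }
}
```

Format conversion generally needs a larger buffer than the default (at least 64 MB, as far as I know). The name of the buffer size argument depends on the provider version, so check the documentation for the version in use.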