
We use AWS DMS to dump SQL Server databases into S3 as Parquet files, with the goal of running Spark analytics over them. Once a full load completes, Spark cannot read the Parquet files because their schemas contain unsigned integer (UINT) fields; the read fails with "Parquet type not supported: INT32 (UINT_8)". We added transformation rules to override the data type of the UINT columns, but the DMS engine does not seem to pick them up. Why?

There are a number of rules like the "convert uint to int" rule below (note that uint1 is the 1-byte unsigned type among the DMS data types):

{
  "rule-type": "transformation",
  "rule-id": "7",
  "rule-name": "uintToInt",
  "rule-action": "change-data-type",
  "rule-target": "column",
  "object-locator": {
    "schema-name": "%",
    "table-name": "%",
    "column-name": "%",
    "data-type": "uint1"
  },
  "data-type": {
    "type": "int4"
  }
}

The S3 endpoint settings are DataFormat=parquet;ParquetVersion=parquet_2_0, and the DMS engine version is 3.3.2.

However, I am still getting Parquet schemas with uint columns. See below:

id: int32
name: string
value: string
status: uint8

Attempting to read such a Parquet file with Spark gives me:

org.apache.spark.sql.AnalysisException: Parquet type not supported: INT32 (UINT_8);
    at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.typeNotSupported$1(ParquetSchemaConverter.scala:100)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:136)

Why is the DMS transformation rule not triggered?

Comments:

Anton: Is it a limitation I hit? "Changes to the source table structure during full load are not supported" (docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html)

jcunhafonte: Did you figure out how to make this work? Thank you.

2 Answers


Transforming the data types directly from UINT to INT in DMS fixes this issue. Your mapping rules should look like:

{
"rules": [
    ...
    {
        "rule-type": "transformation",
        "rule-id": "2",
        "rule-name": "unit1-to-int1",
        "rule-action": "change-data-type",
        "rule-target": "column",
        "object-locator": {
            "schema-name": "acessa",
            "table-name": "%",
            "column-name": "%",
            "data-type": "uint1"
        },
        "data-type": {
            "type": "int1"
        }
    },
    {
        "rule-type": "transformation",
        "rule-id": "3",
        "rule-name": "unit2-to-int2",
        "rule-action": "change-data-type",
        "rule-target": "column",
        "object-locator": {
            "schema-name": "acessa",
            "table-name": "%",
            "column-name": "%",
            "data-type": "uint2"
        },
        "data-type": {
            "type": "int2"
        }
    },
    {
        "rule-type": "transformation",
        "rule-id": "4",
        "rule-name": "unit4-to-int4",
        "rule-action": "change-data-type",
        "rule-target": "column",
        "object-locator": {
            "schema-name": "acessa",
            "table-name": "%",
            "column-name": "%",
            "data-type": "uint4"
        },
        "data-type": {
            "type": "int4"
        }
    },
    {
        "rule-type": "transformation",
        "rule-id": "5",
        "rule-name": "unit8-to-int8",
        "rule-action": "change-data-type",
        "rule-target": "column",
        "object-locator": {
            "schema-name": "acessa",
            "table-name": "%",
            "column-name": "%",
            "data-type": "uint8"
        },
        "data-type": {
            "type": "int8"
        }
    }
]}

Documentation: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.html#CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Transformations
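Since the four rules differ only in the integer width, they can also be generated programmatically instead of copy-pasted. A small sketch (the uint_rules helper is hypothetical; "acessa" is the schema name from the rules above):

```python
import json

def uint_rules(schema_name, start_id=2):
    """Build one uintN-to-intN change-data-type rule per DMS uint width."""
    rules = []
    for i, n in enumerate([1, 2, 4, 8]):  # DMS uint1/uint2/uint4/uint8
        rules.append({
            "rule-type": "transformation",
            "rule-id": str(start_id + i),
            "rule-name": f"uint{n}-to-int{n}",
            "rule-action": "change-data-type",
            "rule-target": "column",
            "object-locator": {
                "schema-name": schema_name,
                "table-name": "%",
                "column-name": "%",
                "data-type": f"uint{n}",
            },
            "data-type": {"type": f"int{n}"},
        })
    return rules

print(json.dumps({"rules": uint_rules("acessa")}, indent=4))
```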


The only way I was able to get the transformation working on the Parquet files was by specifying the exact column to transform. For example:

{
   "rules": [
   ...
   {
    "rule-type": "transformation",
    "rule-id": "2",
    "rule-name": "unit1-to-int1",
    "rule-action": "change-data-type",
    "rule-target": "column",
    "object-locator": {
        "schema-name": "acessa",
        "table-name": "<table_name>",
        "column-name": "<column_name>"
    },
    "data-type": {
        "type": "int1"
    }
   }
  ]
}

Using the wildcard % for the column name in the object locator simply wasn't working.
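If many columns need explicit rules, a helper can emit one rule per (table, column) pair. A hedged sketch, where the per_column_rules helper and the example (table, column, target-type) tuples are hypothetical and note that, as in the rule above, the object locator carries no data-type key:

```python
import json

def per_column_rules(schema_name, columns, start_id=2):
    """columns: list of (table_name, column_name, target_type) tuples."""
    return [
        {
            "rule-type": "transformation",
            "rule-id": str(start_id + i),
            "rule-name": f"{table}-{column}-to-{target}",
            "rule-action": "change-data-type",
            "rule-target": "column",
            "object-locator": {
                "schema-name": schema_name,
                "table-name": table,
                "column-name": column,
            },
            "data-type": {"type": target},
        }
        for i, (table, column, target) in enumerate(columns)
    ]

# Hypothetical columns to convert in the "acessa" schema.
rules = per_column_rules("acessa", [("orders", "status", "int1"),
                                    ("orders", "flags", "int2")])
print(json.dumps({"rules": rules}, indent=4))
```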