
We are running the Kafka HDFS sink connector (version 5.2.1) and need the HDFS data to be partitioned by multiple nested fields. The data in the topics is stored as Avro and has nested elements. However, Connect cannot recognize the nested fields and throws an error that the field cannot be found. Below is the connector configuration we are using. Doesn't the HDFS sink connector support partitioning by nested fields? I can partition by non-nested fields.

{
  "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
  "topics.dir": "/projects/test/kafka/logdata/coss",
  "avro.codec": "snappy",
  "flush.size": "200",
  "connect.hdfs.principal": "[email protected]",
  "rotate.interval.ms": "500000",
  "logs.dir": "/projects/test/kafka/tmp/wal/coss4",
  "hdfs.namenode.principal": "hdfs/[email protected]",
  "hadoop.conf.dir": "/etc/hdfs",
  "topics": "test1",
  "connect.hdfs.keytab": "/etc/hdfs-qa/test.keytab",
  "hdfs.url": "hdfs://nameservice1:8020",
  "hdfs.authentication.kerberos": "true",
  "name": "hdfs_connector_v1",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "value.converter.schema.registry.url": "http://myschema:8081",
  "partition.field.name": "meta.ID,meta.source,meta.HH",
  "partitioner.class": "io.confluent.connect.storage.partitioner.FieldPartitioner"
}
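One possible workaround, until the FieldPartitioner gains nested-field support, is to flatten the nested struct with Kafka Connect's built-in Flatten SMT so the partitioner only ever sees top-level field names. This is a sketch, not verified against version 5.2.1; the flattened field names (`meta_ID`, etc.) assume the `_` delimiter configured below:

```
{
  "transforms": "flatten",
  "transforms.flatten.type": "org.apache.kafka.connect.transforms.Flatten$Value",
  "transforms.flatten.delimiter": "_",
  "partition.field.name": "meta_ID,meta_source,meta_HH",
  "partitioner.class": "io.confluent.connect.storage.partitioner.FieldPartitioner"
}
```

Note that flattening changes the record value written to HDFS as well, not just the partition path, so this trade-off may or may not be acceptable for your downstream consumers.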

1 Answer


I added nested field support for the TimestampPartitioner, but the FieldPartitioner still has an outstanding PR:

https://github.com/confluentinc/kafka-connect-storage-common/pull/67