
I have manually installed the Confluent Kafka Connect S3 connector using the standalone method, not through Confluent's installation process or as part of the whole platform.

I can successfully launch the connector from the command line with the command:

./kafka_2.11-2.1.0/bin/connect-standalone.sh connect.properties s3-sink.properties

I can see offsets for the CDC topics from AWS MSK being consumed, and no errors are thrown. However, in AWS S3 no folder structure is created for new data and no JSON data is stored.
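One quick way to confirm that nothing is landing in the bucket (assuming the AWS CLI is configured with the same credentials the connector uses) is a recursive listing, which comes back empty:

aws s3 ls s3://databasekafka/ --recursive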

Questions

  1. Should the connector dynamically create the folder structure when it sees the first JSON record for a topic?
  2. Other than configuring awscli credentials, connect.properties, and s3-sink.properties, are there any other settings that need to be set to connect properly to the S3 bucket?
  3. Are there recommendations for installation documentation more comprehensive than the standalone docs on the Confluent website (linked above)?

connect.properties

bootstrap.servers=redacted:9092,redacted:9092,redacted:9092

plugin.path=/plugins/kafka-connect-s3
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets

s3-sink.properties

name=s3-sink
connector.class=io.confluent.connect.s3.S3SinkConnector
tasks.max=1
topics=database_schema_topic1,database_schema_topic2,database_schema_topic3
s3.region=us-east-2
s3.bucket.name=databasekafka
s3.part.size=5242880
flush.size=1
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.json.JsonFormat
schema.generator.class=io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator
partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
schema.compatibility=NONE


1 Answer


Should the connector dynamically create the folder structure when it sees the first JSON record for a topic?

Yes. The connector creates the folder structure as it writes the first records for a topic, and you can control that path (directory structure) with the topics.dir and path.format parameters.
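As a rough illustration (not from the original answer): with the DefaultPartitioner configured above, objects are written under keys of the form

topics/database_schema_topic1/partition=0/database_schema_topic1+0+0000000000.json

If you want a time-based layout instead, note that path.format only takes effect with the TimeBasedPartitioner. A sketch, with illustrative values:

topics.dir=topics
partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner
path.format='year'=YYYY/'month'=MM/'day'=dd
partition.duration.ms=3600000
locale=en-US
timezone=UTC
timestamp.extractor=Record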

Other than configuring awscli credentials, connect.properties, and s3-sink.properties, are there any other settings that need to be set to connect properly to the S3 bucket?

By default, the S3 connector picks up AWS credentials (access key ID and secret access key) from environment variables or the credentials file. You can change this by setting the s3.credentials.provider.class parameter; its default value is DefaultAWSCredentialsProviderChain.
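For example, assuming the default provider chain, exporting the standard AWS SDK environment variables in the shell that launches the worker is sufficient (the values are placeholders):

export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
./kafka_2.11-2.1.0/bin/connect-standalone.sh connect.properties s3-sink.properties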

Are there recommendations for installation documentation more comprehensive than the standalone docs on the Confluent website (linked above)?

I recommend going with distributed mode, as it provides high availability for your Connect cluster and the connectors running on it. You can go through the documentation below to configure a Connect cluster in distributed mode:
https://docs.confluent.io/current/connect/userguide.html#connect-userguide-dist-worker-config
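As a minimal sketch (the group.id, internal topic names, and replication factors below are placeholder choices, not taken from the linked docs), a distributed worker config reuses the converters from connect.properties above and adds the cluster-coordination settings:

bootstrap.servers=redacted:9092,redacted:9092,redacted:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status
offset.storage.replication.factor=3
config.storage.replication.factor=3
status.storage.replication.factor=3
plugin.path=/plugins/kafka-connect-s3

The worker is then started with connect-distributed.sh instead of connect-standalone.sh, and connectors are submitted as JSON through the Connect REST API rather than as a properties file on the command line.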