
From a server, I was able to connect to a remote Kafka topic that has SSL configured and read data out of it.

From GCP, how can I connect to a remote Kafka server using a Google Dataflow pipeline, passing the SSL truststore and keystore certificate locations and the Google service account JSON?

I am using the Eclipse plugin with the Dataflow runner option.

When I point the certificates to a Google Cloud Storage bucket, it throws an error:


Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: org.apache.kafka.common.KafkaException: Failed to construct kafka consumer

Caused by: org.apache.kafka.common.KafkaException: java.io.FileNotFoundException: gs:/bucket/folder/truststore-client.jks (No such file or directory)

I followed: Truststore and Google Cloud Dataflow

I updated the code to point the SSL truststore and keystore locations to certificates in the local machine's /tmp directory, in case KafkaIO needs to read from a file path. That did not throw a FileNotFoundException.

Running the server's Java client code from the GCP account, and also the Dataflow / Beam Java pipeline, I get the following error.


    ssl.truststore.location = <LOCAL MACHINE CERTIFICATE FILE PATH>
    ssl.truststore.password = [hidden]
    ssl.truststore.type = JKS
    value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer

org.apache.kafka.common.utils.AppInfoParser$AppInfo <init>
INFO: Kafka version : 1.0.0
org.apache.kafka.common.utils.AppInfoParser$AppInfo <init>
INFO: Kafka commitId : aaa7af6d4a11b29d
org.apache.kafka.common.network.SslTransportLayer close
WARNING: Failed to send SSL Close message 
java.io.IOException: Broken pipe

org.apache.beam.runners.direct.RootProviderRegistry.getInitialInputs(RootProviderRegistry.java:81)
    at org.apache.beam.runners.direct.ExecutorServiceParallelExecutor.start(ExecutorServiceParallelExecutor.java:153)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:205)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:66)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
    at 

org.apache.kafka.common.utils.LogContext$KafkaLogger warn
WARNING: [Consumer clientId=consumer-1, groupId=test-group] Connection to node -2 terminated during authentication. This may indicate that authentication failed due to invalid credentials.

Any suggestions or examples appreciated.


1 Answer


Git clone or upload the Java Maven project from your local machine to the GCP Cloud Shell home directory, then compile and run it with the Dataflow runner command in the Cloud Shell terminal:

mvn -Pdataflow-runner compile exec:java \
      -Dexec.mainClass=com.packagename.JavaClass \
      -Dexec.args="--project=PROJECT_ID \
      --stagingLocation=gs://BUCKET/PATH/ \
      --tempLocation=gs://BUCKET/temp/ \
      --output=gs://BUCKET/PATH/output \
      --runner=DataflowRunner"

Make sure the runner is set to DataflowRunner.class and that you can see the job in the Dataflow console when running it on the cloud; DirectRunner executions will not show up in the Cloud Dataflow console.
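
If the job still runs with the DirectRunner, the runner can also be pinned in code. A minimal sketch, assuming the standard Beam/Dataflow option classes (variable names are illustrative):

import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

// Parse --project, --stagingLocation, etc. from the command line and force the Dataflow runner
DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
        .withValidation()
        .as(DataflowPipelineOptions.class);
options.setRunner(DataflowRunner.class);
Pipeline p = Pipeline.create(options);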

Place certificates in the resources folder within the Maven project and read files using ClassLoader.

// Read a certificate bundled under src/main/resources and record its absolute path
ClassLoader classLoader = getClass().getClassLoader();
File file = new File(classLoader.getResource("keystore.jks").getFile());
// resourcePath is a Map<String, String> of resource name -> absolute path
resourcePath.put("keystore.jks", file.getAbsoluteFile().getPath());
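
Note that classLoader.getResource(...).getFile() only yields a readable path while the resource sits on the plain filesystem (for example when running locally); once the project is packaged into a jar for Dataflow, the resource has to be copied out to a real file, which is what the consumer factory below does.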

Write a ConsumerFactoryFn() that copies the certificates into Dataflow's "/tmp/" directory, as described in https://stackoverflow.com/a/53549757/4250322.
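
A minimal sketch of such a factory, assuming the JKS files are bundled under src/main/resources as above and that KafkaIO's consumer factory takes a Map<String, Object> and returns a Consumer<byte[], byte[]> (the helper method and messages are illustrative); it stages the certificates on each worker's local disk before constructing the KafkaConsumer:

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.KafkaConsumer;

static class ConsumerFactoryFn
        implements SerializableFunction<Map<String, Object>, Consumer<byte[], byte[]>> {

    @Override
    public Consumer<byte[], byte[]> apply(Map<String, Object> config) {
        try {
            // Copy the certificates bundled in the jar onto the worker's local disk
            copyToTmp("truststore.jks");
            copyToTmp("keystore.jks");
        } catch (IOException e) {
            throw new RuntimeException("Failed to copy SSL certificates to /tmp", e);
        }
        // KafkaIO consumes raw bytes; the String deserializers are applied by the pipeline
        return new KafkaConsumer<>(config);
    }

    private static void copyToTmp(String resourceName) throws IOException {
        try (InputStream in = ConsumerFactoryFn.class.getClassLoader()
                .getResourceAsStream(resourceName)) {
            Files.copy(in, Paths.get("/tmp", resourceName), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}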

Then use KafkaIO with consumer properties pointing at the /tmp certificate paths.

// updateConsumerProperties expects a Map<String, Object>
Map<String, Object> props = new HashMap<>();
props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, "/tmp/truststore.jks");
props.put(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG, "/tmp/keystore.jks");
props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, PASSWORD);
props.put(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, PASSWORD);
props.put(SslConfigs.SSL_KEY_PASSWORD_CONFIG, PASSWORD);

//other properties
...

PCollection<String> collection = p.apply(KafkaIO.<String, String>read()
                .withBootstrapServers(BOOTSTRAP_SERVERS)
                .withTopic(TOPIC)                                
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)                
                .updateConsumerProperties(props)
                .withConsumerFactoryFn(new ConsumerFactoryFn())
                .withMaxNumRecords(50)
                .withoutMetadata()
        ).apply(Values.<String>create());

// Apply Beam transformations and write to output.
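
As an illustration only (the transform name is arbitrary and the output path is the placeholder used in the --output argument above), the pipeline could finish by writing the strings to GCS and then be run:

collection.apply("WriteToGcs", TextIO.write().to("gs://BUCKET/PATH/output"));

p.run().waitUntilFinish();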