
We are trying to connect to Datastore from a Dataflow job written in Java, but the Datastore SDK throws an error as soon as the client is created.

We are running the job with the DirectRunner on a local machine, from Eclipse.

Code:

import java.net.SocketTimeoutException;

import org.apache.beam.sdk.options.PipelineOptions;

import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;

public class StarterPipeline {

    public interface StarterPipelineOption extends PipelineOptions {

    }

    @SuppressWarnings("serial")
    public static void main(String[] args) throws SocketTimeoutException {

        Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

    }
}

Error:

Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal character in path at index 45: https://datastore.googleapis.com/v1/projects/<!DOCTYPE html>
at com.google.datastore.v1.client.DatastoreFactory.validateUrl(DatastoreFactory.java:122)
at com.google.datastore.v1.client.DatastoreFactory.buildProjectEndpoint(DatastoreFactory.java:108)
at com.google.datastore.v1.client.DatastoreFactory.newRemoteRpc(DatastoreFactory.java:115)
at com.google.datastore.v1.client.DatastoreFactory.create(DatastoreFactory.java:65)
at com.google.cloud.datastore.spi.v1.HttpDatastoreRpc.<init>(HttpDatastoreRpc.java:71)
at com.google.cloud.datastore.DatastoreOptions$DefaultDatastoreRpcFactory.create(DatastoreOptions.java:61)
at com.google.cloud.datastore.DatastoreOptions$DefaultDatastoreRpcFactory.create(DatastoreOptions.java:55)
at com.google.cloud.ServiceOptions.getRpc(ServiceOptions.java:512)
at com.google.cloud.datastore.DatastoreOptions.getDatastoreRpcV1(DatastoreOptions.java:179)
at com.google.cloud.datastore.DatastoreImpl.<init>(DatastoreImpl.java:56)
at com.google.cloud.datastore.DatastoreOptions$DefaultDatastoreFactory.create(DatastoreOptions.java:51)
at com.google.cloud.datastore.DatastoreOptions$DefaultDatastoreFactory.create(DatastoreOptions.java:45)
at com.google.cloud.ServiceOptions.getService(ServiceOptions.java:499)
at purplle.datapipeline.StarterPipeline.main(StarterPipeline.java:234)
Caused by: java.net.URISyntaxException: Illegal character in path at index 45: https://datastore.googleapis.com/v1/projects/<!DOCTYPE html>
at java.net.URI$Parser.fail(Unknown Source)
at java.net.URI$Parser.checkChars(Unknown Source)
at java.net.URI$Parser.parseHierarchical(Unknown Source)
at java.net.URI$Parser.parse(Unknown Source)
at java.net.URI.<init>(Unknown Source)
at com.google.datastore.v1.client.DatastoreFactory.validateUrl(DatastoreFactory.java:120)
... 13 more

We are using the SDK versions below, which I believe are up to date.

<!-- https://mvnrepository.com/artifact/com.google.cloud/google-cloud-storage -->
<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-storage</artifactId>
    <version>1.37.1</version>
</dependency>

<!-- https://mvnrepository.com/artifact/com.google.cloud/google-cloud-datastore -->
<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-datastore</artifactId>
    <version>1.37.1</version>
</dependency>

<dependency>
    <groupId>com.google.cloud.dataflow</groupId>
    <artifactId>google-cloud-dataflow-java-sdk-all</artifactId>
    <version>2.5.0</version>
</dependency>

While searching for a solution we found the thread below, which states that this issue was fixed in February, but I am still facing it.

https://github.com/GoogleCloudPlatform/google-cloud-java/issues/2440

I included the same dependencies but am not getting any error. Did you include any other dependencies in the project? (SANN3)

1 Answer


I contacted Google Cloud support, and they said that to run the pipeline locally during development we have to provide the project ID manually through an environment variable.

Below is the response from Google support.

The error you are getting is from [1] and is thrown when "the server or credentials weren't provided". In your case, you didn't specify credentials when constructing the client:

Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

The client library tried to look for credentials via the environment variable GOOGLE_APPLICATION_CREDENTIALS, but it failed as the job is not running on Compute Engine, Kubernetes Engine, App Engine, or Cloud Functions (see [3] and [4, Run in Compute/App Engine]). I believe your code should work with DataflowRunner. Please confirm if that is the case.

To run the code locally, you can create and obtain service account credentials manually [5]. For a more detailed example, please check [4] or [6].


So I downloaded the service account credentials and used them when connecting to the Datastore API, and it worked!
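
As a rough illustration of the advice above, here is a minimal pre-flight check that could run before the Datastore client is created in a local DirectRunner session. It only inspects environment variables and uses nothing beyond the JDK. GOOGLE_APPLICATION_CREDENTIALS is the variable named in the support response; GOOGLE_CLOUD_PROJECT is my assumption for the project ID variable, so check the name against the library version you use. The class and method names are made up for this example. The `<!DOCTYPE html>` in the failing URL suggests an HTML page ended up where the project ID was expected, which is why the sketch checks the project ID as well as the credentials.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for this example; not part of any Google library.
final class LocalDatastoreEnvCheck {

    // Variable named in the support response; points at a service account key file.
    static final String CREDENTIALS_VAR = "GOOGLE_APPLICATION_CREDENTIALS";
    // Assumed name of the project ID variable; confirm it for your library version.
    static final String PROJECT_VAR = "GOOGLE_CLOUD_PROJECT";

    /** Returns the names of the required variables that are unset or blank in env. */
    static List<String> missingVariables(Map<String, String> env) {
        List<String> missing = new ArrayList<>();
        for (String name : new String[] {PROJECT_VAR, CREDENTIALS_VAR}) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingVariables(System.getenv());
        if (!missing.isEmpty()) {
            // Fail early with a readable message instead of a URISyntaxException.
            System.err.println("Set these before running locally: " + missing);
            System.exit(1);
        }
        System.out.println("Environment looks complete; create the Datastore client next.");
    }
}
```

In Eclipse the variables can be set on the Environment tab of the run configuration. Passing the project ID and credentials explicitly through the `DatastoreOptions` builder instead of the environment should work as well, but I have not shown that here.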