I am trying to run my Beam code on Spark for a POC, using Google Cloud Dataproc for testing. It is a very simple test: read from a Pub/Sub subscription and write each message to a bucket on Google Cloud Storage. The Dataproc cluster has the right Spark version and is enabled to access the other GCP APIs.
I tried FileIO as well, but that did not work either. Publishing to another Pub/Sub topic instead of writing to GCS did work, but that is not my use case. Printing the messages before the TextIO write confirmed that I can read from Pub/Sub.
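The FileIO variant I tried was roughly the following (reconstructed from memory, applied to the same windowed `messages` collection shown below; the bucket path is elided):

```java
// Sketch of the FileIO attempt; assumes `messages` is the windowed
// PCollection<String> from the pipeline below.
messages.apply(FileIO.<String>write()
    .via(TextIO.sink())   // write each element as a line of text
    .to("gs://...")       // same bucket prefix as the TextIO version
    .withNumShards(1));
```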
Here is the pipeline:
    PCollection<String> messages = pipeline
        .apply(PubsubIO.readStrings().fromSubscription(sub))
        .apply(Window.into(FixedWindows.of(Duration.standardSeconds(1))));

    messages.apply(TextIO.write().to("gs://...").withNumShards(1).withWindowedWrites());

    pipeline.run();
I don't see any logs in the Dataproc job output: no errors or anything else at all. There is no file in the bucket either.