I need to run a Dataflow pipeline on a regular basis. The FAQ for Dataflow states the following:
You can automate pipeline execution through Google App Engine or custom (CRON) job processes on GCE. Future releases of the SDK will support command line options for finer grained control over job management.
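For context, my understanding of the cron-on-GCE alternative the FAQ mentions is something like the crontab entry below. The jar name, path, and main-class behavior are hypothetical; the `--project` and `--runner` flags are the standard ones parsed by `PipelineOptionsFactory.fromArgs`:

```
# Hypothetical crontab entry on a GCE instance: launch the pipeline daily at 02:00.
# my-pipeline-bundled.jar is an assumed self-contained jar whose main() builds the
# pipeline and runs it with BlockingDataflowPipelineRunner so cron sees the exit code.
0 2 * * * java -jar /opt/pipelines/my-pipeline-bundled.jar --project=redacted --runner=BlockingDataflowPipelineRunner
```

I'd prefer to trigger the pipeline from my existing app rather than maintain a separate instance just for cron.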
I've tried to run a very simple pipeline from my Java app, using this code:
import java.io.IOException;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner;

public class MyAnalyticsServlet extends HttpServlet {
    @Override
    public void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        resp.setContentType("text/plain");
        if (req.getRequestURI().equals("/dataflow/test")) {
            // Build options programmatically and submit to the Dataflow service.
            DataflowPipelineOptions options =
                PipelineOptionsFactory.create().as(DataflowPipelineOptions.class);
            options.setProject("redacted");
            options.setRunner(DataflowPipelineRunner.class);

            Pipeline p = Pipeline.create(options);
            p.apply(TextIO.Read.named("TestInput").from("gs://redacted/test/in.txt"))
             .apply(new TestTransform())
             .apply(TextIO.Write.named("TestOutput")
                 .to("gs://redacted/test")
                 .withNumShards(0));
            p.run();
        } else {
            resp.setStatus(404);
            resp.getWriter().println("Not Found");
            return;
        }
        resp.getWriter().println("OK");
    }
}
When I hit the servlet, the options validation fails with the following error:
java.lang.IllegalArgumentException: Methods [setRunner(Class), getRunner()] on [com.google.cloud.dataflow.sdk.options.PipelineOptions] do not conform to being bean properties.
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:145)
at com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory.validateClass(PipelineOptionsFactory.java:1059)
...
Any ideas what's causing this validation error, or how a Dataflow pipeline is meant to be launched from a servlet?