4
votes

I need to run a Dataflow pipeline on a regular basis. The FAQ for Dataflow states the following:

You can automate pipeline execution through Google App Engine or custom (CRON) job processes on GCE. Future releases of the SDK will support command line options for finer grained control over job management.

I've tried to run a very simple pipeline from my Java app, using this code:

public class MyAnalyticsServlet extends HttpServlet {
    @Override
    public void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        resp.setContentType("text/plain");
        if (req.getRequestURI().equals("/dataflow/test")) {
            DataflowPipelineOptions options = PipelineOptionsFactory.create().as(DataflowPipelineOptions.class);
            options.setProject("redacted");
            options.setRunner(DataflowPipelineRunner.class);
            Pipeline p = Pipeline.create(options);
            p.apply(TextIO.Read.named("TestInput").from("gs://redacted/test/in.txt"))
                    .apply(new TestTransform())
                    .apply(TextIO.Write.named("TestOutput")
                            .to("gs://redacted/test")
                            .withNumShards(0));
            p.run();
        } else {
            resp.setStatus(404);
            resp.getWriter().println("Not Found");
            return;
        }
        resp.getWriter().println("OK");
    }
}

I get the following error:

java.lang.IllegalArgumentException: Methods [setRunner(Class), getRunner()] on [com.google.cloud.dataflow.sdk.options.PipelineOptions] do not conform to being bean properties.
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:145)
    at com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory.validateClass(PipelineOptionsFactory.java:1059)
    ...

Any ideas?

1
I have a simple pipeline that runs wordcount within AppEngine, I using the latest Dataflow SDK (search.maven.org/…). Are you building from Github or using the latest version from Maven? - Lukasz Cwik
what version of the SDK are you using? - Graham Polley
Can anyone else confirm being able to launch a Dataflow from within AppEngine? - Lukasz Cwik
This answer goes into detail about running Dataflow on Google AppEngine. - Sam McVeety

1 Answers

1
votes

I know you're using Java; however, this example, which walks through how to do this from a GAE Python Flex app, might be helpful: http://amygdala.github.io/dataflow/app_engine/2017/04/14/gae_dataflow.html