2
votes

I want to use Spring with Apache Beam that will run on Google Cloud Data flow Runner. Dataflow job should be able to use Spring Runtime application context while executing the Pipeline steps. I want to use Spring feature in my Apache Beam pipeline for DI and other stuff. After browsing hours on google, I couldn't find any post or documentation which shows Spring integration in Apache Beam. So, if anyone has tried spring with Apache beam, please let me know.

In main class i have initialised the spring application context but it is not available while execution of pipeline steps. I get null pointer exception for autowired beans. I guess the problem is, at runtime context is not available to worker threads.

 public static void main(String[] args) {
    initSpringApplicationContext();

    GcmOptions options = PipelineOptionsFactory.fromArgs(args)
        .withValidation()
        .as(GcmOptions.class);
    Pipeline pipeline = Pipeline.create(options);
    // pipeline definition
}

I want to inject the spring application context to each of the ParDo functions.

1

1 Answers

1
votes

The problem here is that the ApplicationContext is not available on any worker, as the main method is only called when constructing the job and not on any worker machine. Therefore, initSpringApplicationContext is never called on any worker.

I've never tried to use Spring within Apache Beam, but I guess moving initSpringApplicationContext in a static initializer block will lead to your expected result.

public class ApplicationContextHolder {

    private static final ApplicationContext CTX;

    static {
        CTX = initApplicationContext();
    }

    public static ApplicationContext getContext() {
        return CTX;
    }
}

Please be aware that this alone shouldn't be considered as a best practice of using Spring within Apache Beam since it doesn't integrate well in the lifecycle of Apache Beam. For example, when an error happens during the initialization of the application context, it will appear in the first place where the ApplicationContextHolder is used. Therefore, I'd recommend to extract initApplicationContext out of the static initializer block and call it explicitly with regards to Apache Beam's Lifecycle. The setup phase would be a good place for this.