
Based on what I've seen, creating a stream in Spring Cloud Dataflow (SCDF) will deploy the underlying applications, bind the communication service (like RabbitMQ), set the Spring Cloud Stream environment variables, and start the applications. This could all easily be done manually using cf push and a few other CF CLI commands.
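
For context, a minimal sketch of what that manual sequence might look like (the app name, jar path, service name, and destinations below are hypothetical):

```
# Push one stream app without starting it (names/paths are placeholders)
cf push my-processor -p my-processor.jar -m 1G --no-start

# Bind the messaging middleware (e.g., a RabbitMQ service instance)
cf bind-service my-processor my-rabbit

# Wire the Spring Cloud Stream destinations by hand
cf set-env my-processor SPRING_CLOUD_STREAM_BINDINGS_INPUT_DESTINATION upstream-topic
cf set-env my-processor SPRING_CLOUD_STREAM_BINDINGS_INPUT_GROUP my-stream
cf set-env my-processor SPRING_CLOUD_STREAM_BINDINGS_OUTPUT_DESTINATION downstream-topic

cf start my-processor
```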

Meanwhile, I've been running into some drawbacks with Spring Cloud Dataflow:

  • SCDF Server is a memory hog on PCF (I have a stream with only 6 applications, and yet I'm needing about 10GB for the server)
  • No flexibility on application naming, memory, instances, etc. (All the things that you would typically set in the manifest.yml)
  • Integration with build tools (like Bamboo) is going to require extra work because we have to use the SCDF CLI rather than just the PCF CLI
  • Existing streams cannot be modified. To do a blue-green deployment, you have to deploy the application manually (binding the services and setting the environment variables manually). And then once a blue-green deployment is done, SCDF shows the stream as Failed, because it doesn't know that one of the underlying applications has changed.
  • Various errors I've run into, like MySQL Primary Key Constraint errors when trying to redeploy a failed stream

So what am I missing? Why would using Spring Cloud Dataflow be preferable to just manually deploying the applications?


1 Answer


Based on what I've seen, creating a stream in Spring Cloud Dataflow (SCDF) will deploy the underlying applications, bind the communication service (like RabbitMQ), set the Spring Cloud Stream environment variables, and start the applications. This could all easily be done manually using cf push and a few other CF CLI commands.

Yes - you can orchestrate the stream applications individually, and there are benefits to that. However, when you hand-wire each stream application with its channel name, destination, and binding-specific properties, you take on a lot of extra bookkeeping. All of that becomes a behind-the-scenes chore handled by Spring Cloud Data Flow's (SCDF) orchestration layer.
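
For comparison, a minimal sketch of the SCDF side (the stream name and app names are illustrative): a single DSL definition takes care of the destination wiring and binding properties shown in the manual sequence above.

```
# SCDF shell: one definition wires the apps' destinations and bindings for you
dataflow:> stream create --name mystream --definition "http | transform | log" --deploy
```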

Especially when you have "scaling" or "partitions" involved in your streaming pipeline, you have to pay attention to instanceCount, instanceIndex and the related properties. These are automated in SCDF through the DSL semantics, too.
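
For a sense of what that bookkeeping looks like when done by hand, here is a sketch of the Spring Cloud Stream partitioning properties involved (the values and binding names are illustrative):

```
# Producer side: partition the output across 3 downstream instances
spring.cloud.stream.bindings.output.producer.partitionKeyExpression=payload.id
spring.cloud.stream.bindings.output.producer.partitionCount=3

# Each consumer instance: mark the input as partitioned and identify itself
spring.cloud.stream.bindings.input.consumer.partitioned=true
spring.cloud.stream.instanceCount=3
spring.cloud.stream.instanceIndex=0   # must be unique per instance: 0, 1, 2
```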

SCDF Server is a memory hog on PCF (I have a stream with only 6 applications, and yet I'm needing about 10GB for the server)

Based on our experiments, this is typically observed when you're in "development" and repeatedly creating > deploying > destroying streams several times a day. Generally speaking, the server should only require 1G.
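
For example, a sketch of pushing the server with a 1G cap, per the sizing above (the app name and jar name are placeholders for whichever server artifact you use):

```
# Cap the SCDF server itself at 1G of memory
cf push dataflow-server -p spring-cloud-dataflow-server-cloudfoundry.jar -m 1G
```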

There's a general consensus that JVMs on PCF report memory that they aren't really using; this has something to do with Java's rt.jar. There are some new kernel changes around the 'memory usage reporting' functionality in PCF, so that after the JVM boots up (which uses a good deal of resources), it doesn't continue to report bad data. We are closely tracking this.

That said, we are also profiling the server to make sure there aren't any memory leaks. As-is, the server doesn't hold any in-memory state - the minimal metadata (e.g., stream definitions) the server requires is persisted in an RDBMS. Please keep an eye on #107 for developments.

No flexibility on application naming, memory, instances, etc. (All the things that you would typically set in the manifest.yml)

It is not clear what you mean by "application naming". If it refers to the server name, you can change it easily through your manifest.yml or by other means. If it refers to stream-app names, they are automatically deployed with the "stream name" as a prefix, so it is easy to identify them when you review the apps from the CF CLI or Apps Manager.

As for memory and disk usage, you can control them at each application level through the SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_MEMORY and SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_DISK tokens. More details here.
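
For instance, a sketch of setting those tokens on the Data Flow server app (the server app name and values are examples; check the deployer docs for the expected units):

```
# Set per-stream-app defaults on the SCDF server, then restage so they take effect
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_MEMORY 512
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_DISK 1024
cf restage dataflow-server
```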

Integration with build tools (like Bamboo) is going to require extra work because we have to use the SCDF CLI rather than just the PCF CLI

You'd be running the CI builds on the stream/task applications themselves, as they are part of your development workflow. SCDF simply provides the orchestration mechanics to manage these applications. We are also working on native integration with Netflix's Spinnaker tooling to provide an out-of-the-box experience in the near future.
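
As one possible approach (not the only one), a CI job can drive the SCDF shell non-interactively with a command file; the server URI, file name, and stream definition here are hypothetical:

```
# deploy-stream.txt (hypothetical command file) contains, e.g.:
#   stream create --name mystream --definition "http | transform | log" --deploy

java -jar spring-cloud-dataflow-shell.jar \
  --dataflow.uri=http://my-dataflow-server.example.com \
  --spring.shell.commandFile=deploy-stream.txt
```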

Existing streams cannot be modified. To do a blue-green deployment, you have to deploy the application manually (binding the services and setting the environment variables manually). And then once a blue-green deployment is done, SCDF shows the stream as Failed, because it doesn't know that one of the underlying applications has changed.

You can perform blue-green-style rolling upgrades on the apps individually. There's also an active work-in-progress to have SCDF adapt to changing stream/task application state. As an aside, Spinnaker integration would further simplify rolling upgrades of custom application bits, and SCDF would adapt to the dynamic changes - this is the end goal as far as this requirement goes.
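
As a rough sketch of one way to do that by hand today (app, service, and destination names are hypothetical): push the new version of a single stream app with the same destinations and consumer group, let it start consuming alongside the old copy, then retire the old one; the shared consumer group keeps the two versions from double-processing messages.

```
# v2 of a single processor, wired to the same destinations/consumer group as v1
cf push my-processor-v2 -p my-processor-2.0.jar -m 1G --no-start
cf bind-service my-processor-v2 my-rabbit
cf set-env my-processor-v2 SPRING_CLOUD_STREAM_BINDINGS_INPUT_DESTINATION upstream-topic
cf set-env my-processor-v2 SPRING_CLOUD_STREAM_BINDINGS_INPUT_GROUP my-stream
cf set-env my-processor-v2 SPRING_CLOUD_STREAM_BINDINGS_OUTPUT_DESTINATION downstream-topic
cf start my-processor-v2

# Once v2 is healthy and consuming, retire the old instance
cf stop my-processor-v1
cf delete my-processor-v1 -f
```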

Various errors I've run into, like MySQL Primary Key Constraint errors when trying to redeploy a failed stream

We would love to hear your feedback; specifically, please consider reporting these problems in the backlog. Any help in this regard is highly appreciated.

So what am I missing? Why would using Spring Cloud Dataflow be preferable to just manually deploying the applications?

The architecture section covers the general capabilities. If you have numerous stream or task applications (as in any other microservices setup), you need central orchestration tooling to manage them in a cloud setting. SCDF provides the DSL, REST API, Dashboard, Flo, and of course the security layer that comes out of the box. Interoperability between streams and tasks is another important requirement for use cases involving closed-loop analytics - there's DSL tooling around this as well. Once Spinnaker integration becomes a first-class citizen, we foresee end-to-end continuous delivery over data pipelines. Lastly, the SCDF tile for Cloud Foundry will interoperate with Spring Cloud Services to further automate the provisioning aspects along with comprehensive security coverage.

Hope this helps.