Spring Cloud Data Flow instance count

Question

When can these properties be used: spring.cloud.stream.instanceCount and spring.cloud.stream.instanceIndex?

Consider this pipeline: rabbit source -> filter -> transform -> httpclient -> rabbit sink

I need to publish 360,000 messages, which currently takes 22 minutes.

Settings: 1) Prefetch on the rabbit source is set to 50. 2) httpclient posts each message to a local REST endpoint exposed by a single running instance of a Spring Boot microservice.

Problem: The bottleneck seems to be the httpclient -> rabbit sink part of the pipeline, as I see an incoming message rate of only 250/s on the rabbit sink queue. The earlier part of the pipeline seems very fast. It's the transfer of messages from httpclient to rabbit that takes long. So ideally I want to split the load coming out of the transform step between multiple instances of the Spring Boot microservice. That way we could reach 500 messages/s on the destination rabbit queue.

However, I don't know where to make the configuration changes in the pipeline, i.e. which app needs a higher instance count, where to set the instanceIndex, etc.

Sabby Anandan · Accepted Answer · 2018-10-01T17:06:07

You can read more about instanceCount and instanceIndex in the Spring Cloud Stream (SCSt) reference guide.

Here's a demonstration of how these properties can be used.

If you don't need a partitioned streaming scenario and are interested only in parallelizing the "compute", you merely need to scale the number of instances.

Unlike the cloud-runtime-specific implementations, SCDF's Local server has no "runtime" scaling support, but you can deploy the stream with app-specific instance counts.

For instance, in the demonstration, a stream named foo is deployed with deployer.appender.count=3. SCDF deploys 3 instances of the appender app, and by default all the instances compete for messages. Hence, the load is automatically balanced across them, which parallelizes the compute operation.
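Applied to the pipeline in the question, the same idea might look like the sketch below. It assumes the stream was created under the hypothetical name mystream, that the slow stage is the app labeled httpclient, and that 2 instances is just a starting point to measure against, not a recommendation.

```
dataflow:>stream deploy --name mystream --properties "deployer.httpclient.count=2"
```

With a deployer.<app-label>.count property, SCDF sets spring.cloud.stream.instanceCount (and the instance index, for indexed deployments) on the app for you, so there is normally no need to set those two properties by hand.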

All that said, just by scaling the consumer instances, you still may not reach the desired throughput. You may have to tweak the prefetch, concurrency, and other Rabbit consumer configurations to maximize throughput for your environment. Alternatively, you could use Rabbit's performance benchmark tooling to measure its raw throughput and compare it with the throughput you get once the business logic is embedded as an SCSt/SCDF workload.
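As a rough sketch of what such tuning could look like at deploy time: the stream name mystream and the app label httpclient are hypothetical, the binding name input assumes the default channel name, and the values 4 and 100 are placeholders to experiment with rather than measured optima.

```
dataflow:>stream deploy --name mystream --properties "deployer.httpclient.count=2,app.httpclient.spring.cloud.stream.bindings.input.consumer.concurrency=4,app.httpclient.spring.cloud.stream.rabbit.bindings.input.consumer.prefetch=100"
```

Here concurrency controls how many consumer threads each instance runs, and prefetch controls how many unacknowledged messages the broker hands to each consumer; both interact with the instance count, so it is worth changing one at a time and watching the rate on the destination queue.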

UPDATE

Also, when you use SCDF with Skipper, you have visibility into all the automation behind the streaming property overrides provided by SCDF.

Here's an example output for the appender app.

dataflow:>stream manifest --name foo

"apiVersion": "skipper.spring.io/v1"
"kind": "SpringCloudDeployerApplication"
"metadata":
  "name": "appender"
"spec":
  "resource": "https://github.com/sabbyanandan/partitions/raw/master/jars/appender"
  "resourceMetadata": "https://github.com/sabbyanandan/partitions/raw/master/jars/appender:jar:metadata:0.0.1-SNAPSHOT"
  "version": "0.0.1-SNAPSHOT"
  "applicationProperties":
    "spring.metrics.export.triggers.application.includes": "integration**"
    "spring.cloud.dataflow.stream.app.label": "appender"
    "spring.cloud.stream.instanceCount": "3"
    "spring.cloud.stream.metrics.key": "foo.appender.${spring.cloud.application.guid}"
    "spring.cloud.stream.bindings.input.group": "foo"
    "spring.cloud.stream.metrics.properties": "spring.application.name,spring.application.index,spring.cloud.application.*,spring.cloud.dataflow.*"
    "spring.cloud.stream.bindings.output.producer.requiredGroups": "foo"
    "spring.cloud.dataflow.stream.name": "foo"
    "spring.cloud.stream.bindings.output.destination": "foo.appender"
    "spring.cloud.dataflow.stream.app.type": "processor"
    "spring.cloud.stream.bindings.input.consumer.partitioned": "true"
    "spring.cloud.stream.bindings.input.destination": "foo.fruits"
  "deploymentProperties":
    "spring.cloud.deployer.indexed": "true"
    "spring.cloud.deployer.count": "3"
    "spring.cloud.deployer.group": "foo"
