
We have SCDF deployed in Kubernetes. From the SCDF UI, we are able to create a stream using Docker-based source, processor, and sink apps. However, when we deploy the stream, the status never changes from "Deploying" and the deployment eventually fails. We tried setting the CPU limit and request properties, but that did not help.

The app logs show that Tomcat is not being initialized, so the /actuator endpoints are not exposed and, as a result, the liveness and readiness probes fail.

Any thoughts on what the issue could be and how it can be resolved?

SCDF Logs

2019-12-04 06:14:18.264  INFO 1 --- [p-nio-80-exec-7] .s.c.d.s.s.i.AppDeploymentRequestCreator : Creating resource with [docker:215135100958.dkr.ecr.eu-west-1.amazonaws.com/scdf/gce-batch-task-sink:0.0.2-SNAPSHOT] for application [tasklauncherV2]
2019-12-04 06:14:18.266  INFO 1 --- [p-nio-80-exec-7] .s.c.d.s.s.i.AppDeploymentRequestCreator : Creating resource with [docker:215135100958.dkr.ecr.eu-west-1.amazonaws.com/scdf/gce-batch-task-processor:0.0.3-SNAPSHOT] for application [taskprocessorV3]
2019-12-04 06:14:18.268  INFO 1 --- [p-nio-80-exec-7] o.s.c.d.s.stream.SkipperStreamDeployer   : Deploying Stream batchstreamV1 using skipper.
2019-12-04 06:14:18.274  INFO 1 --- [p-nio-80-exec-7] o.s.c.d.s.stream.SkipperStreamDeployer   : Using platform 'default'

Skipper Logs

2019-12-04 06:14:18.307  INFO 1 --- [nio-7577-exec-2] o.s.s.s.DefaultStateMachineService       : Acquiring machine with id batchstreamV1
2019-12-04 06:14:18.307  INFO 1 --- [nio-7577-exec-2] o.s.s.s.DefaultStateMachineService       : Getting new machine from factory with id batchstreamV1
2019-12-04 06:14:18.311  INFO 1 --- [nio-7577-exec-2] o.s.s.support.LifecycleObjectSupport     : started org.springframework.statemachine.trigger.TimerTrigger@76ca974a
2019-12-04 06:14:18.311  INFO 1 --- [nio-7577-exec-2] o.s.s.support.LifecycleObjectSupport     : started org.springframework.statemachine.support.DefaultStateMachineExecutor@1a44aa85
2019-12-04 06:14:18.311  INFO 1 --- [nio-7577-exec-2] o.s.s.support.LifecycleObjectSupport     : started INITIAL UPGRADE UPGRADE_DEPLOY_TARGET_APPS_SUCCEED UPGRADE_DEPLOY_TARGET_APPS_FAILED UPGRADE_DEPLOY_TARGET_APPS UPGRADE_START UPGRADE_DELETE_SOURCE_APPS UPGRADE_CHECK_TARGET_APPS UPGRADE_WAIT_TARGET_APPS UPGRADE_CANCEL UPGRADE_EXIT UPGRADE_CHECK_CHOICE DELETE DELETE_DELETE DELETE_EXIT ROLLBACK ROLLBACK_START ROLLBACK_EXIT_UPGRADE ROLLBACK_CHOICE ROLLBACK_EXIT ROLLBACK_EXIT_INSTALL INSTALL INSTALL_INSTALL INSTALL_EXIT ERROR ERROR_JUNCTION  /  / uuid=581f43a4-02bc-4275-b76e-afc7822c45df / id=batchstreamV1
2019-12-04 06:14:18.329  INFO 1 --- [eTaskExecutor-3] o.s.c.s.s.s.StateMachineConfiguration    : Entering state ObjectState [getIds()=[INITIAL], getClass()=class org.springframework.statemachine.state.ObjectState, hashCode()=1676784192, toString()=AbstractState [id=INITIAL, pseudoState=org.springframework.statemachine.state.DefaultPseudoState@2332ab0f, deferred=[], entryActions=[], exitActions=[org.springframework.cloud.skipper.server.statemachine.ResetVariablesAction@1dbd580], stateActions=[], regions=[], submachine=null]]
2019-12-04 06:14:18.350  INFO 1 --- [eTaskExecutor-4] o.s.c.s.s.s.StateMachineConfiguration    : Entering state StateMachineState [getIds()=[INSTALL], toString()=AbstractState [id=INSTALL, pseudoState=null, deferred=[], entryActions=[], exitActions=[], stateActions=[], regions=[], submachine=INSTALL_INSTALL INSTALL_EXIT  /  / uuid=263e446f-15f4-4913-8ee0-037f17c49ad3 / id=batchstreamV1], getClass()=class org.springframework.statemachine.state.StateMachineState]
2019-12-04 06:14:18.367  INFO 1 --- [eTaskExecutor-4] o.s.c.s.s.s.StateMachineConfiguration    : Entering state ObjectState [getIds()=[INSTALL_INSTALL], getClass()=class org.springframework.statemachine.state.ObjectState, hashCode()=1390837147, toString()=AbstractState [id=INSTALL_INSTALL, pseudoState=org.springframework.statemachine.state.DefaultPseudoState@799fbe4c, deferred=[], entryActions=[org.springframework.cloud.skipper.server.statemachine.InstallInstallAction@6732726], exitActions=[], stateActions=[], regions=[], submachine=null]]
2019-12-04 06:14:18.415  INFO 1 --- [eTaskExecutor-4] o.s.c.d.s.k.KubernetesAppDeployer        : Preparing to run a container from  Docker Resource [docker:215135100958.dkr.ecr.eu-west-1.amazonaws.com/scdf/gce-batch-task-sink:0.0.2-SNAPSHOT]. This may take some time if the image must be downloaded from a remote container registry.
2019-12-04 06:14:18.431  INFO 1 --- [eTaskExecutor-4] o.s.c.d.s.k.DefaultContainerFactory      : Using Docker image: 215135100958.dkr.ecr.eu-west-1.amazonaws.com/scdf/gce-batch-task-sink:0.0.2-SNAPSHOT
2019-12-04 06:14:18.431  INFO 1 --- [eTaskExecutor-4] o.s.c.d.s.k.DefaultContainerFactory      : Using Docker entry point style: exec
2019-12-04 06:14:18.457  INFO 1 --- [eTaskExecutor-4] o.s.c.d.s.k.KubernetesAppDeployer        : Preparing to run a container from  Docker Resource [docker:215135100958.dkr.ecr.eu-west-1.amazonaws.com/scdf/gce-batch-task-processor:0.0.3-SNAPSHOT]. This may take some time if the image must be downloaded from a remote container registry.
2019-12-04 06:14:18.473  INFO 1 --- [eTaskExecutor-4] o.s.c.d.s.k.DefaultContainerFactory      : Using Docker image: 215135100958.dkr.ecr.eu-west-1.amazonaws.com/scdf/gce-batch-task-processor:0.0.3-SNAPSHOT
2019-12-04 06:14:18.473  INFO 1 --- [eTaskExecutor-4] o.s.c.d.s.k.DefaultContainerFactory      : Using Docker entry point style: exec
2019-12-04 06:14:18.579  INFO 1 --- [eTaskExecutor-4] o.s.s.support.LifecycleObjectSupport     : stopped org.springframework.statemachine.support.DefaultStateMachineExecutor@149d3f32
2019-12-04 06:14:18.579  INFO 1 --- [eTaskExecutor-4] o.s.s.support.LifecycleObjectSupport     : stopped INSTALL_INSTALL INSTALL_EXIT  /  / uuid=263e446f-15f4-4913-8ee0-037f17c49ad3 / id=batchstreamV1
2019-12-04 06:14:18.579  INFO 1 --- [eTaskExecutor-4] o.s.c.s.s.s.StateMachineConfiguration    : Entering state ObjectState [getIds()=[INITIAL], getClass()=class org.springframework.statemachine.state.ObjectState, hashCode()=1676784192, toString()=AbstractState [id=INITIAL, pseudoState=org.springframework.statemachine.state.DefaultPseudoState@2332ab0f, deferred=[], entryActions=[], exitActions=[org.springframework.cloud.skipper.server.statemachine.ResetVariablesAction@1dbd580], stateActions=[], regions=[], submachine=null]]
2019-12-04 06:14:18.579  INFO 1 --- [eTaskExecutor-4] o.s.c.s.s.s.SkipperStateMachineService   : setting future value org.springframework.cloud.skipper.domain.Release@4348eec0
2019-12-04 06:14:18.579  INFO 1 --- [eTaskExecutor-4] o.s.s.support.LifecycleObjectSupport     : started org.springframework.statemachine.support.DefaultStateMachineExecutor@149d3f32
2019-12-04 06:14:18.579  INFO 1 --- [eTaskExecutor-4] o.s.s.support.LifecycleObjectSupport     : started INSTALL_INSTALL INSTALL_EXIT  /  / uuid=263e446f-15f4-4913-8ee0-037f17c49ad3 / id=batchstreamV1

1 Answer


The SCDF/Skipper logs you posted don't include much information, and they don't report any failure either.

Here are a few standard things to check.

1) Whether you're running SCDF in Minikube or in a real K8s cluster, make sure the cluster has enough resource capacity available. You can confirm whether enough CPU and memory are available either by describing the K8s nodes or by using a tool like Octant.

2) Note that the more services and streaming/task apps you deploy via SCDF, the more resources your K8s cluster needs. For instance, if you're provisioning Prometheus + Grafana to monitor streaming/task apps in SCDF, the two together need at least 3 GB of memory. Once again, any resource limit errors will show up on the cluster nodes.

3) Review the streaming/task pod logs. Also describe the streaming/task pods in K8s to see why the readiness/liveness probes are failing. The errors show up towards the end of the output.
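
If reading the describe output by eye gets tedious, the same checks can be scripted. The following is only a rough sketch, not something SCDF provides: it assumes you have saved a pod with `kubectl get pod <pod-name> -o json` and loaded it into a dict, and the function name `summarize_pod` is my own. It reads only standard Pod status fields (`conditions`, `containerStatuses`) and reports a pod that could not be scheduled (typically a capacity problem, points 1 and 2) as well as containers that are waiting, were killed, or are not ready (point 3).

```python
from typing import List


def summarize_pod(pod: dict) -> List[str]:
    """Return human-readable findings from a Pod object loaded as a dict.

    Hypothetical helper for illustration; it only inspects pod["status"].
    """
    findings = []
    status = pod.get("status", {})

    # PodScheduled=False usually means no node had enough CPU or memory.
    for cond in status.get("conditions", []):
        if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
            findings.append(
                f"not scheduled: {cond.get('reason')}: {cond.get('message')}"
            )

    for cs in status.get("containerStatuses", []):
        name = cs.get("name")

        # Current waiting reason, e.g. CrashLoopBackOff or ImagePullBackOff.
        waiting = cs.get("state", {}).get("waiting")
        if waiting:
            findings.append(f"{name}: waiting ({waiting.get('reason')})")

        # Why the previous container instance ended, e.g. OOMKilled.
        terminated = cs.get("lastState", {}).get("terminated")
        if terminated:
            findings.append(
                f"{name}: last exit {terminated.get('reason')} "
                f"code={terminated.get('exitCode')}"
            )

        # Not ready means the readiness probe has not passed.
        if not cs.get("ready", False):
            findings.append(
                f"{name}: not ready, restarts={cs.get('restartCount', 0)}"
            )

    return findings
```

To use it, load the saved JSON with `json.load` and print each returned line. A last exit reason of `OOMKilled` would suggest raising the memory limit for that app, while a "not scheduled" finding points back at cluster capacity. Probe failure messages themselves are recorded as events, so you still need the describe output for those.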