I was doing some system checks over the weekend and saw that Cygnus had shut itself down, but there were no error messages in the log file.

Could you please share your ideas with us, Francisco?

Many thanks

Starting an ordered shutdown of Cygnus
Stopping sources
Starting an ordered shutdown of Cygnus
Stopping sources
Stopping http-source (lyfecycle state=START)
16/05/29 02:58:02 INFO lifecycle.LifecycleSupervisor: Stopping component: EventDrivenSourceRunner: { source:org.apache.flume.source.http.HTTPSource{name:http-source,state:START} }
16/05/29 02:58:02 INFO mortbay.log: Stopped SocketConnector@0.0.0.0:5050
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: http-source stopped
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: http-source. source.start.time == 1464330902578
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: http-source. source.stop.time == 1464490683015
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: http-source. src.append-batch.accepted == 43990
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: http-source. src.append-batch.received == 43990
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: http-source. src.append.accepted == 0
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: http-source. src.append.received == 0
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: http-source. src.events.accepted == 43990
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: http-source. src.events.received == 43990
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: http-source. src.open-connection.count == 0
16/05/29 02:58:03 INFO http.HTTPSource: Http source http-source stopped. Metrics: SOURCE:http-source{src.events.accepted=43990, src.events.received=43990, src.append.accepted=0, src.append-batch.accepted=43990, src.open-connection.count=0, src.append-batch.received=43990, src.append.received=0}
All the channels are empty
Stopping channels
Stopping ckan-channel (lyfecycle state=START)
16/05/29 02:58:03 INFO lifecycle.LifecycleSupervisor: Stopping component: org.apache.flume.channel.MemoryChannel{name: ckan-channel}
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: ckan-channel stopped
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: ckan-channel. channel.start.time == 1464330902110
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: ckan-channel. channel.stop.time == 1464490683353
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: ckan-channel. channel.capacity == 1000
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: ckan-channel. channel.current.size == 0
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: ckan-channel. channel.event.put.attempt == 43990
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: ckan-channel. channel.event.put.success == 43990
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: ckan-channel. channel.event.take.attempt == 74296
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: ckan-channel. channel.event.take.success == 43990
Stopping hdfs-channel (lyfecycle state=START)
16/05/29 02:58:03 INFO lifecycle.LifecycleSupervisor: Stopping component: org.apache.flume.channel.MemoryChannel{name: hdfs-channel}
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: hdfs-channel stopped
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: hdfs-channel. channel.start.time == 1464330902110
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: hdfs-channel. channel.stop.time == 1464490683353
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: hdfs-channel. channel.capacity == 1000
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: hdfs-channel. channel.current.size == 0
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: hdfs-channel. channel.event.put.attempt == 43990
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: hdfs-channel. channel.event.put.success == 43990
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: hdfs-channel. channel.event.take.attempt == 67985
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: hdfs-channel. channel.event.take.success == 43990
Stopping sinks
Stopping ckan-sink (lyfecycle state=START)
16/05/29 02:58:03 INFO lifecycle.LifecycleSupervisor: Stopping component: SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@2c5d7ace counterGroup:{ name:null counters:{runner.backoffs.consecutive=1, runner.backoffs=30324} } }
Stopping hdfs-sink (lyfecycle state=START)
16/05/29 02:58:03 INFO lifecycle.LifecycleSupervisor: Stopping component: SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@2d298123 counterGroup:{ name:null counters:{runner.backoffs.consecutive=1, runner.backoffs=24009} } }

1 Answer

Cygnus performs an internal check for abnormal thread termination, which also catches a Ctrl+C key combination. When that occurs, it shuts itself down in an orderly way, which is why the log shows only INFO messages and no errors. You can see the related code here.

It would probably be useful to have a flag for enabling/disabling this feature, but for the time being no such flag exists (I'll add it for the next version ;)). Alternatively, you can set up a monit process to detect Cygnus shutdowns and restart it automatically.
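To illustrate the restart idea, here is a minimal Python sketch of a watchdog loop. It is not a monit configuration and not part of Cygnus; the function name `supervise`, its parameters, and the placeholder start command are all assumptions made for this example.

```python
import subprocess
import time
from typing import List


def supervise(command: List[str], max_restarts: int, poll_interval: float = 1.0) -> int:
    """Run `command` and start it again each time it exits.

    Stops after `max_restarts` restarts and returns how many were performed.
    Illustrative only: a real monitor such as monit also handles logging,
    alerting and back-off between restarts.
    """
    restarts = 0
    process = subprocess.Popen(command)
    while True:
        # poll() returns None while the child process is still running.
        if process.poll() is None:
            time.sleep(poll_interval)
            continue
        # The process has exited (for Cygnus, e.g. after an ordered shutdown).
        if restarts >= max_restarts:
            return restarts
        restarts += 1
        process = subprocess.Popen(command)


# Hypothetical usage; replace the placeholder with your own Cygnus start command:
# supervise(["/path/to/your/cygnus-start-script"], max_restarts=10, poll_interval=5.0)
```

Capping the number of restarts is a deliberate choice in this sketch: if Cygnus exits immediately on every start, an unbounded loop would hide the underlying problem.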

Such a monit setup can be combined with a High Availability (HA) architecture, using specialized software (e.g. Pacemaker; a load balancer may also be required), in order to have an active/passive pair of Cygnus instances. The active Cygnus works as usual, and the passive one only starts working if a problem is detected in the active one. The specialized software then redirects all the traffic to the passive Cygnus while the active one is restarted (via monit).
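The active/passive selection can be sketched in a few lines of Python. This is only an illustration of the idea, not how Pacemaker or any load balancer works internally; `port_is_open`, `pick_target` and the host names are hypothetical, and port 5050 is taken from the HTTP source shown in the log above.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

Endpoint = Tuple[str, int]  # (host, port)


def port_is_open(endpoint: Endpoint, timeout: float = 2.0) -> bool:
    """Health check (assumption): the instance is up if its port accepts TCP connections."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        return False


def pick_target(
    endpoints: Sequence[Endpoint],
    is_healthy: Callable[[Endpoint], bool] = port_is_open,
) -> Optional[Endpoint]:
    """Return the first healthy endpoint in priority order (active first, then passive)."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None  # neither instance is reachable


# Hypothetical host names; traffic goes to the passive one only if the active one is down:
# target = pick_target([("cygnus-active.example", 5050), ("cygnus-passive.example", 5050)])
```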