How to fix unexpected hazelcast client shutdown

Question

I'm using hazelcast jet to perform aggregations on stream data. Problem is, that hazelcast cliend shutsdown unexpectedly.

I've implemented simple pipeline with remote map source and then the result is simply sinked.

// init pipeline
Pipeline p = Pipeline.create();

// configure source
BatchSource remoteBatchMap = Sources.remoteMap(<my remote map>, <my config>);

// add source and sink to pipeline
p.drawFrom(remoteBatchMap).drainTo(Sinks.map(SINK_MAP_NAME));

On a client side, output is as expected for the first cca. 30 seconds. Then shutdown happens, and further on, those printed values freezes. Ok, that is logical, as it has been shut down. But, how to prevent shutdown?

2019-07-25 14:22:18,214 INFO com.betex.service.FixtureOddTotalSummaryImpl [SockJS-2] Number of sink elements vs original (BCK): 254/41254
2019-07-25 14:22:19,359 INFO com.betex.service.FixtureOddTotalSummaryImpl [SockJS-2] Number of sink elements vs original (BCK): 262/41254
2019-07-25 14:22:20,496 INFO com.betex.service.FixtureOddTotalSummaryImpl [SockJS-2] Number of sink elements vs original (BCK): 269/41259
2019-07-25 14:22:20,786 INFO com.hazelcast.logging.StandardLoggerFactory$StandardLogger [hz._hzInstance_1_jet.async.thread-8] betex0.7899090253375379 [app] [3.1] [3.12.1] HazelcastClient 3.12.1 (20190611 - 0a0ee66) is SHUTTING_DOWN
2019-07-25 14:22:20,791 INFO com.hazelcast.logging.StandardLoggerFactory$StandardLogger [hz._hzInstance_1_jet.async.thread-8] betex0.7899090253375379 [app] [3.1] [3.12.1] Removed connection to endpoint: [192.168.41.3]:5701, connection: ClientConnection{alive=false, connectionId=1, channel=NioChannel{/192.168.26.78:64217->/192.168.41.3:5701}, remoteEndpoint=[192.168.41.3]:5701, lastReadTime=2019-07-25 14:22:19.980, lastWriteTime=2019-07-25 14:22:19.855, closedTime=2019-07-25 14:22:20.789, connected server version=3.12.1}
2019-07-25 14:22:20,794 INFO com.hazelcast.logging.StandardLoggerFactory$StandardLogger [hz._hzInstance_1_jet.async.thread-8] betex0.7899090253375379 [app] [3.1] [3.12.1] Removed connection to endpoint: [192.168.41.4]:5701, connection: ClientConnection{alive=false, connectionId=2, channel=NioChannel{/192.168.26.78:64218->/192.168.41.4:5701}, remoteEndpoint=[192.168.41.4]:5701, lastReadTime=2019-07-25 14:22:20.525, lastWriteTime=2019-07-25 14:22:20.376, closedTime=2019-07-25 14:22:20.793, connected server version=3.12.1}
2019-07-25 14:22:20,797 INFO com.hazelcast.logging.StandardLoggerFactory$StandardLogger [hz._hzInstance_1_jet.async.thread-8] betex0.7899090253375379 [app] [3.1] [3.12.1] HazelcastClient 3.12.1 (20190611 - 0a0ee66) is SHUTDOWN
2019-07-25 14:22:20,802 INFO com.hazelcast.logging.StandardLoggerFactory$StandardLogger [hz._hzInstance_1_jet.async.thread-8] [192.168.1.66]:5701 [jet] [3.1] Execution of job '8dc4-d1e2-df66-a444', execution 9622-ba74-b907-150c completed in 42,335 ms
2019-07-25 14:22:21,635 INFO com.betex.service.FixtureOddTotalSummaryImpl [SockJS-2] Number of sink elements vs original (BCK): 41246/41259
2019-07-25 14:22:22,771 INFO com.betex.service.FixtureOddTotalSummaryImpl [SockJS-2] Number of sink elements vs original (BCK): 41246/41259
2019-07-25 14:22:23,909 INFO com.betex.service.FixtureOddTotalSummaryImpl [SockJS-2] Number of sink elements vs original (BCK): 41246/41259

On server side it says that connection is closed by the other side - so, my client side:

2019-07-25 14:22:21.909  INFO 21375 --- [hz.betex.IO.thread-in-2] com.hazelcast.nio.tcp.TcpIpConnection    : [192.168.41.3]:5701 [app] [3.1] Connection[id=159, /192.168.41.3:5701->192.168.26.78/192.168.26.78:64217, qualifier=null, endpoint=[192.168.26.78]:64217, alive=false, type=JAVA_CLIENT] closed. Reason: Connection closed by the other side
2019-07-25 14:22:21.910  INFO 21375 --- [hz.betex.event-14] c.h.client.impl.ClientEndpointManager    : [192.168.41.3]:5701 [app] [3.1] Destroying ClientEndpoint{connection=Connection[id=159, /192.168.41.3:5701->192.168.26.78/192.168.26.78:64217, qualifier=null, endpoint=[192.168.26.78]:64217, alive=false, type=JAVA_CLIENT], principal='ClientPrincipal{uuid='c5286586-cbe2-4c84-8e74-4c2f1f59310a', ownerUuid='ebce22c4-ed31-4ccf-9808-b19005dc55f8'}, ownerConnection=true, authenticated=true, clientVersion=3.12.1, creationTime=1564057300564, latest statistics=null}

I'd be very happy to get some orientation and ideas, where to look for a problem.

The client is shutdown when job is finished, so this is normal. Are you updating the map during the test? Jet map source only works consistently if the map is not being modified during operation. — Can Gencer
See the javadoc as it explains what can happen if map is being updated at the same time. If your map is being modified, it's better to use the mapJournal source. — Can Gencer
I see ... I mixed up batch processing with windowing in stream processing. — Aliman

Mike Yawn Mike Yawn · Accepted Answer · 2019-07-25T13:11:20

If you haven't already wrapped the code in a try/catch, I'd try that. I remember running into something similar but can't recall the root cause; it may have been a ClassCastException or something serialization-related. There wasn't any clue in the output but once I added the try/catch and dumped a stack trace the issue was obvious.

How to fix unexpected hazelcast client shutdown

2 Answers