
I have an existing application that uses Hazelcast for tracking cluster membership and for distributed task execution. I'm thinking that Jet could be useful for adding analytics on top of the existing application, and I'm trying to figure out how best to layer Jet on top of what we already have.

So my first question is: how should I run Jet on top of our existing Hazelcast configuration? Do I have to run Jet separately, or can I replace our existing Hazelcast configuration with Jet (since Jet does expose the HazelcastInstance)?

My second question: I see lots of examples using IMap and IList, but I'm not seeing anything that uses topics as a source (I also don't see this as an option in the Sources builder). My initial thought on using Jet was to emit events (I/O perf data, HTTP request data) from our existing code to a topic, have Jet process that topic and generate analytics from the data, and then push the results to an IMap. Is this the wrong approach? Should I be using some other structure to push these events into Jet? I saw that I can write my own custom Source to do this, but since the library doesn't already provide one for this specific purpose, I felt I must be going down the wrong path.


1 Answer


You can upgrade your current Hazelcast IMDG cluster to a Jet cluster and run your legacy application alongside Jet jobs. This setup is simpler to deploy and operate. Alternatively, you can start a separate cluster for Jet, which is also perfectly fine. Its advantage is isolation (cluster lifecycle, failures, etc.). Just be aware that you can't combine IMDG 3.x with Jet 4.x clusters.

Use an IMap with the event journal enabled to connect two jobs or to ingest data into the cluster. It's the simplest fault-tolerant option that works out of the box. For fault tolerance, Jet's data source must be replayable: if the job fails, it goes back to the last state snapshot and rewinds the data source offset accordingly.

A topic can be used (via the Source Builder), but it won't be fault-tolerant (some messages might get lost). Jet achieves fault tolerance by snapshotting the job regularly. In case of a failure, the latest snapshot is restored and the data following the snapshot is replayed. Unlike the journal, a topic consumer can't replay the data using an offset.
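To make the difference concrete, here is a toy model in plain Java. It is not Jet code and does not use any Hazelcast API. The class `ReplayDemo`, its methods, and the snapshot interval are hypothetical names invented for illustration. It only sketches the idea described above: a consumer that sums events, snapshots its state every two events, crashes, and then recovers either from a replayable log (like the journal) or from a fire-and-forget stream (like a topic).

```java
import java.util.Arrays;
import java.util.List;

class ReplayDemo {
    // Hypothetical snapshot interval, in events (Jet's real interval is time-based).
    static final int SNAPSHOT_EVERY = 2;

    /** Processes events [0, crashAtOffset) and returns {snapshotOffset, snapshotSum}. */
    static long[] runUntilCrash(List<Integer> events, int crashAtOffset) {
        long sum = 0;
        long snapOffset = 0;
        long snapSum = 0;
        for (int offset = 0; offset < crashAtOffset; offset++) {
            sum += events.get(offset);
            if ((offset + 1) % SNAPSHOT_EVERY == 0) {
                // Snapshot = state plus the source offset it corresponds to.
                snapOffset = offset + 1;
                snapSum = sum;
            }
        }
        // The crash discards the in-memory sum; only the snapshot survives.
        return new long[] {snapOffset, snapSum};
    }

    /** Replayable source: restore the snapshot and re-read from its offset. */
    static long sumWithJournal(List<Integer> journal, int crashAtOffset) {
        long[] snap = runUntilCrash(journal, crashAtOffset);
        long sum = snap[1];
        for (int offset = (int) snap[0]; offset < journal.size(); offset++) {
            sum += journal.get(offset);
        }
        return sum;
    }

    /**
     * Non-replayable source: the snapshot is restored, but everything published
     * before the consumer resubscribes (resumeAtOffset) is gone for good.
     */
    static long sumWithTopic(List<Integer> published, int crashAtOffset, int resumeAtOffset) {
        long[] snap = runUntilCrash(published, crashAtOffset);
        long sum = snap[1];
        for (int i = resumeAtOffset; i < published.size(); i++) {
            sum += published.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> events = Arrays.asList(1, 2, 3, 4, 5, 6);
        // Crash after three events; the last snapshot covers the first two.
        System.out.println("journal: " + sumWithJournal(events, 3));  // journal: 21
        // Same crash, but the topic consumer only sees events from index 4 onward.
        System.out.println("topic:   " + sumWithTopic(events, 3, 4)); // topic:   14
    }
}
```

With the journal-style source the recovered total matches the true sum of all six events (21). With the topic-style source, events 3 and 4 are lost (14): event 3 was processed after the last snapshot, and event 4 was published while the consumer was down.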