1
votes

I really like how jazelcast jet works with java util streams but when I run those streams I am getting confused how is this running in a distributed way.

    public class IstreamHazelcastDemo {

    public static void main( String[] args ) {

        JetInstance jet = Jet.newJetInstance();
        Jet.newJetInstance();

        IListJet<String> list = jet.getList("list");


        for(int i = 0; i < 50; i++) {
            list.add("test" + i);
        }

         DistributedStream.fromList(list)
                .map(word -> {
                    System.out.println("word: " + word);
                    return word.toUpperCase();
                })
                .collect(DistributedCollectors.toIList("sink"))
                 .forEach(System.out::println);


    }
}

This is a simple example where I create a jet instance first bu running another main program and then run this code so it forms a cluster of 2 nodes. So when I run the above code I was expecting to see the print statement inside map function to be printed in both the nodes since I thought its distributed and will send to multiple nodes. But it always executed the whole flow only in one node. I am trying to think how is this distributed or is it me who is lacking the understanding of hazelcast Jet.

Thanks

1

1 Answers

2
votes

Try this change and you should see a difference

        IMapJet<String, String> map = jet.getMap("map");

        for(int i = 0; i < 50; i++) {
            map.put("test" + i, "test" + i);
        }

         DistributedStream.fromMap(map)
            .map(entry -> {
                System.out.println("word: " + entry.getKey());
                return entry.getKey().toUpperCase();
            })
            .collect(DistributedCollectors.toIList("sink"))
            .forEach(System.out::println);

The difference here is around distribution and partitioning.

A list is distributed, meaning sent to the grid for hosting, but it is still a single object. One grid member holds it, so you'll see a single stream of sysout from the mapper.

A map is distributed, but is also partitioned, meaning the hosting is split across the grid members. If there are two grid members they'll have roughly half of the map content each. So you'll see multiple streams of sysout.