1
votes

I would like to implement a variation on the "Map of Sets" collection that will be constantly accessed by multiple threads. I am wondering whether the synchronization I am doing is sufficient to guarantee that no issues will manifest.

So given the following code, where Map, HashMap, and Set are the Java implementations, and Key and Value are some arbitrary Objects:

public class MapOfSets {
    private Map<Key, Set<Value>> map;

    public MapOfLists() {
        map = Collections.synchronizedMap(new HashMap<Key, Set<Value>());
    }

    //adds value to the set mapped to key
    public void add(Key key, Value value) {
        Set<Value> old = map.get(key);

        //if no previous set exists on this key, create it and add value to it
        if(old == null) {
            old = new Set<Value>();
            old.add(value);
            map.put(old);
        }
        //otherwise simply insert the value to the existing set
        else {
            old.add(value);
        }
    }

    //similar to add
    public void remove(Key key, Value value) {...}

    //perform some operation on all elements in the set mapped to key
    public void foo(Key key) {
        Set<Value> set = map.get(key);

        for(Value v : set)
            v.bar();
    }
}

The idea here is that because I've synchronized the Map itself, the get() and put() method should be atomic right? So there should be no need to do additional synchronization on the Map or the Sets contained in it. So will this work?

Alternatively, would the above code be advantageous over another possible synchronization solution:

public class MapOfSets {
    private Map<Key, Set<Value>> map;

    public MapOfLists() {
        map = new HashMap<Key, Set<Value>();
    }

    public synchronized void add(Key key, Value value) {
        Set<Value> old = map.get(key);

        //if no previous set exists on this key, create it and add value to it
        if(old == null) {
            old = new Set<Value>();
            old.add(value);
            map.put(old);
        }
        //otherwise simply insert the value to the existing set
        else {
            old.add(value);
        }
    }

    //similar to add
    public synchronized void remove(Key key, Value value) {...}

    //perform some operation on all elements in the set mapped to key
    public synchronized void foo(Key key) {
        Set<Value> set = map.get(key);

        for(Value v : set)
            v.bar();
    }
}

Where I leave the data structures unsynchronized but synchronize all the possible public methods instead. So which ones will work, and which one is better?

4
old = new Set<Value>(); - Set is an interface, what class are you using? There might be problems if you use non-concurrent classamit
Does it matter? I was considering using the CopyOnWriteArraySet, which is guaranteed thread-safe, but doesn't synchronization of either the map or accessors already guarantee that only one thread will be looking at a Set at a time?donnyton
Why does the first implementation guarantees only one thread will be looking at Set at a time? I don't think this statement is true, and you might have concurrent issues when 2 threads try to add/del element from the set concurrently, if the set is not thread-safe.amit
I think I see the issues with the first implementation. What about the second?donnyton

4 Answers

3
votes

The first implementation you posted is not thread safe. Consider what happens when the add method is accessed by two concurrent threads with the same key:

  1. thread A executes line 1 of the method, and gets a null reference because no item with the given key is present
  2. thread B executes line 1 of the method, and gets a null reference because no item with the given key is present — this will happen after A returns from the first call, as the map is synchronized
  3. thread A evaluates the if condition to false
  4. thread B evaluates the if condition to false

From that point on, the two threads will carry on with execution of the true branch of the if statement, and you will lose one of the two value objects.

The second variant of the method you posted looks safer.


However, if you can use third party libraries, I would suggest you to check out Google Guava, as they offer concurrent multimaps (docs).

3
votes

The second one is correct, but the first one isn't.

Think about it a minute, and suppose two threads are calling add() in parallel. Here's what could occur:

  • Thread 1 calls add("foo", bar");
  • Thread 2 calls add("foo", baz");
  • Thread 1 gets the set for "foo" : null
  • Thread 2 gets the set for "foo" : null
  • Thread 1 creates a new set and adds "bar" in it
  • Thread 2 creates a new set and adds "baz" in it
  • Thread 1 puts its set in the map
  • Thread 2 puts its set in the map

At the end of the story, the map contains one value for "foo" instead of two.

Synchronizing the map makes sure that its internal state is coherent, and that each method you call on the map is thread-safe. but it doesn't make the get-then-put operation atomic.

Consider using one of Guava's SetMultiMap implementations, which does everything for you. Wrap it into a call to Multimaps.synchronizedSetMultimap(SetMultimap) to make it thread-safe.

3
votes

Your second implementation will work, but it holds locks for longer than it needs to (an inevitable problem with using synchronized methods rather than synchronized blocks), which will reduce concurrency. If you find that the limit on concurrency here is a bottleneck, you could shrink the locked regions a bit.

Alternatively, you could use some of the lock-free collections providded by java.util.concurrent. Here's my attempt at that; this isn't tested, and it requires Key to be comparable, but it should not perform any locking ever:

public class MapOfSets {
    private final ConcurrentMap<Key, Set<Value>> map;

    public MapOfSets() {
        map = new ConcurrentSkipListMap<Key, Set<Value>>();
    }

    private static ThreadLocal<Set<Value>> freshSets = new ThreadLocal<Set<Value>>() {
        @Override
        protected Set<Value> initialValue() {
            return new ConcurrentSkipListSet<Value>();
        }
    };

    public void add(Key key, Value value) {
        Set<Value> freshSet = freshSets.get();
        Set<Value> set = map.putIfAbsent(key, freshSet);
        if (set == null) {
            set = freshSet;
            freshSets.remove();
        }
        set.add(value);
    }

    public void remove(Key key, Value value) {
        Set<Value> set = map.get(key);
        if (set != null) {
            set.remove(value);
        }
    }

    //perform some operation on all elements in the set mapped to key
    public void foo(Key key) {
        Set<Value> set = map.get(key);
        if (set != null) {
            for (Value v: set) {
                v.bar();
            }
        }
    }
}
0
votes

For your Map implementation you could just use a ConcurrentHashMap - You wouldn't have to worry about ensuring thread safety for access, whether it's input or retrieval, as the implemenation takes care of that for you.

And if you really want to use a Set, you could call

Collections.newSetFromMap(new ConcurrentHashMap<Object,Boolean>()) 

on your ConcurrentHashMap.