2 votes

Background

I have a class whose instances are used to collect and publish data (uses Guava's HashMultimap):

public class DataCollector {

    private final SetMultimap<String, String> valueSetsByLabel
            = HashMultimap.create();

    public void addLabelValue(String label, String value) {
        valueSetsByLabel.put(label, value);
    }

    public Set<String> getLabels() {
        return valueSetsByLabel.keySet();
    }

    public Set<String> getLabelValues(String label) {
        return valueSetsByLabel.get(label);
    }
}

Instances of this class will now be passed between threads, so I need to modify it for thread-safety. Since Guava's Multimap implementations aren't thread-safe, I used a LoadingCache that lazily creates concurrent hash sets instead (see the CacheBuilder and MapMaker javadocs for details):

public class ThreadSafeDataCollector {

    private final LoadingCache<String, Set<String>> valueSetsByLabel
            = CacheBuilder.newBuilder()
            .concurrencyLevel(1)
            .build(new CacheLoader<String, Set<String>>() {
                @Override
                public Set<String> load(String label) {
                    // make and return a concurrent hash set
                    final ConcurrentMap<String, Boolean> map = new MapMaker()
                            .concurrencyLevel(1)
                            .makeMap();
                    return Collections.newSetFromMap(map);
                }
            });

    public void addLabelValue(String label, String value) {
        valueSetsByLabel.getUnchecked(label).add(value);
    }

    public Set<String> getLabels() {
        return valueSetsByLabel.asMap().keySet();
    }

    public Set<String> getLabelValues(String label) {
        return valueSetsByLabel.getUnchecked(label);
    }
}

You'll notice I'm setting the concurrency level of both the loading cache and the nested concurrent hash sets to 1 (meaning each one reads from and writes to a single underlying table rather than several segments). This is because I only expect one thread at a time to read from or write to these objects.

(To quote the concurrencyLevel javadoc, "A value of one permits only one thread to modify the map at a time, but since read operations can proceed concurrently, this still yields higher concurrency than full synchronization.")

Problem

Because I can assume there will only be a single reader/writer at a time, I feel that using many concurrent hash maps per object is heavy-handed. Such structures are meant to handle concurrent reads and writes, and guarantee atomicity of concurrent writes. But in my case atomicity is unimportant - I only need to make sure each thread sees the last thread's changes.

In my search for a more optimal solution I came across this answer by erickson, which says:

Any data that is shared between threads needs a "memory barrier" to ensure its visibility.

[...]

Changes to any member that is declared volatile are visible to all threads. In effect, the write is "flushed" from any cache to main memory, where it can be seen by any thread that accesses main memory.

Now it gets a bit trickier. Any writes made by a thread before that thread writes to a volatile variable are also flushed. Likewise, when a thread reads a volatile variable, its cache is cleared, and subsequent reads may repopulate it from main memory.

[...]

One way to make this work is to have the thread that is populating your shared data structure assign the result to a volatile variable. [...] When other threads access that variable, not only are they guaranteed to get the most recent value for that variable, but also any changes made to the data structure by the thread before it assigned the value to the variable.

(See this InfoQ article for a further explanation of memory barriers.)
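
In code, the pattern erickson describes looks roughly like this (just a sketch for illustration; the class and method names here are made up):

import java.util.HashMap;
import java.util.Map;

class VolatilePublisher {

    private volatile Map<String, String> published; // null until first publish

    void publish(Map<String, String> data) {
        Map<String, String> copy = new HashMap<>(data); // populate fully first...
        published = copy;                               // ...then a single volatile write
    }

    String read(String key) {
        Map<String, String> snapshot = published;       // volatile read
        return snapshot == null ? null : snapshot.get(key);
    }
}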

The problem erickson is addressing is slightly different in that the data structure in question is fully populated and then assigned to a variable that he suggests be made volatile, whereas my structures are assigned to final variables and gradually populated across multiple threads. But his answer suggests I could use a volatile dummy variable to manually trigger memory barriers:

public class ThreadVisibleDataCollector {

    private final SetMultimap<String, String> valueSetsByLabel
            = HashMultimap.create();

    private volatile boolean dummy;

    private void readMainMemory() {
        // volatile read: makes writes published by the last volatile write visible
        if (dummy) { }
    }

    private void writeMainMemory() {
        // volatile write: publishes this thread's preceding writes
        dummy = false;
    }

    public void addLabelValue(String label, String value) {
        readMainMemory();
        valueSetsByLabel.put(label, value);
        writeMainMemory();
    }

    public Set<String> getLabels() {
        readMainMemory();
        return valueSetsByLabel.keySet();
    }

    public Set<String> getLabelValues(String label) {
        readMainMemory();
        return valueSetsByLabel.get(label);
    }
}

Theoretically, I could take this a step further and leave it to the calling code to trigger memory barriers, in order to avoid unnecessary volatile reads and writes between calls on the same thread (potentially by using Unsafe.loadFence and Unsafe.storeFence, which were added in Java 8). But that seems too extreme and hard to maintain.

Question

Have I drawn the correct conclusions from my reading of erickson's answer (and the JMM) and implemented ThreadVisibleDataCollector correctly? I wasn't able to find examples of using a volatile dummy variable to trigger memory barriers, so I want to verify that this code will behave as expected across architectures.

It seems ThreadVisibleDataCollector would be flushing/reading the entire data structure to and from main memory, rather than just the label values being accessed. – S.D.
@S.D. Wouldn't this also be the case with concurrent hash maps? They use explicit locking, but Lock guarantees the same memory synchronization. – Paul Bellora
True. What I was wondering about was whether volatile could be used for each { String label, Set<String> values } object, so that when you access a "label", only that "label" and its set of "values" are synced with main memory, and not all the other labels. This could be true with locking as well, i.e. more granular, "label"-level locking. – S.D.
@S.D. Interesting, that reminds me of Guava's Striped. I'll consider whether it could be used, though in my particular use case most labels are written by a single thread, and all labels are read by another single thread. – Paul Bellora

3 Answers

1 vote

What you are trying to do is called “premature optimization”. You don't have a real performance problem, yet you are making your program more complicated and possibly error-prone, without any gain.

The reason why you will never experience any (notable) gain lies in the way a lock works. You can learn a lot about it by studying the documentation of the class AbstractQueuedSynchronizer.

A Lock is formed around a simple int value with volatile semantics and atomic updates. In the simplest form, i.e. without contention, locking and unlocking each consist of a single atomic update of this int variable. Since you claim you can be sure that only one thread will access the data at a given time, there will be no contention, and the lock-state update has performance characteristics similar to your volatile boolean attempt, with the difference that the Lock code works reliably and is heavily tested.
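
To make the point concrete, here is a toy sketch of that idea (illustration only, not how AbstractQueuedSynchronizer is actually implemented): a lock whose entire state is one atomically updated int.

import java.util.concurrent.atomic.AtomicInteger;

final class ToySpinLock {

    private final AtomicInteger state = new AtomicInteger(0); // 0 = free, 1 = held

    void lock() {
        // uncontended case: a single successful CAS and we are done
        while (!state.compareAndSet(0, 1)) {
            Thread.yield(); // contended case: a real Lock would park the thread instead
        }
    }

    void unlock() {
        state.set(0); // volatile write: publishes everything done while holding the lock
    }
}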

The ConcurrentMap approach goes a step further and allows a lock-free read that has the potential to be even more efficient than your volatile read (depending on the actual implementation).

So you are creating a potentially slower and possibly error-prone program just because you “feel that using many concurrent hash maps per object is heavy-handed”. The only answer can be: don't feel. Measure. Or just leave it as is, as long as there is no real performance problem.

1 vote

A write of a value to a volatile variable happens-before any subsequent read of that value from it. As a consequence, the visibility guarantees you want will be achieved by reading and writing it, so the answer is yes, this solves the visibility issue.

Besides the problems mentioned by Darren Gilroy in his answer, I'd like to point out that in Java 8 there are explicit memory barrier methods in the Unsafe class:

/**
 * Ensures lack of reordering of loads before the fence
 * with loads or stores after the fence.
 */
void loadFence();

/**
 * Ensures lack of reordering of stores before the fence
 * with loads or stores after the fence.
 */
void storeFence();

/**
 * Ensures lack of reordering of loads or stores before the fence
 * with loads or stores after the fence.
 */
void fullFence();

Although Unsafe is not a public API, I still recommend at least considering it if you're using Java 8.
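
For reference, here is a sketch of how Unsafe is typically obtained reflectively (the theUnsafe field name is a HotSpot implementation detail, so this is not guaranteed to work on every JVM):

import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class Fences {

    private static final Unsafe UNSAFE;

    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static void acquireFence() { UNSAFE.loadFence(); }  // call before reading shared state
    static void releaseFence() { UNSAFE.storeFence(); } // call after writing shared state
}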

One more solution comes to mind. You have set your concurrencyLevel to 1, which means that only one thread at a time can modify the collection. IMO plain Java synchronized or ReentrantLock (for cases of high contention) will also fit your task and provide the visibility guarantees. If you want a one-writer, many-readers access pattern, consider using ReentrantReadWriteLock.
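
For example, a synchronized variant of the original collector might look like the sketch below (the class name is made up, and the getters return defensive copies so callers never touch the multimap outside the lock):

import com.google.common.collect.HashMultimap;
import com.google.common.collect.ImmutableSet;
import com.google.common.collect.SetMultimap;
import java.util.Set;

public class SynchronizedDataCollector {

    private final SetMultimap<String, String> valueSetsByLabel
            = HashMultimap.create();

    public synchronized void addLabelValue(String label, String value) {
        valueSetsByLabel.put(label, value);
    }

    public synchronized Set<String> getLabels() {
        return ImmutableSet.copyOf(valueSetsByLabel.keySet());
    }

    public synchronized Set<String> getLabelValues(String label) {
        return ImmutableSet.copyOf(valueSetsByLabel.get(label));
    }
}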

0 votes

Well, that's still not particularly safe, because it depends a lot on the underlying implementation of HashMultimap.

You might take a look at the following blog post for a discussion: http://mailinator.blogspot.com/2009/06/beautiful-race-condition.html

For this type of thing, a common pattern is to load a "most recent version" into a volatile variable and have your readers read immutable versions through that. This is how CopyOnWriteArrayList is implemented.

Something like ...

class Collector {
   private volatile HashMultimap<String, String> values = HashMultimap.create();

   public void add(String k, String v) {
      HashMultimap<String, String> t = HashMultimap.create(values); // copy the current map
      t.put(k, v);
      this.values = t;  // this volatile write acts as the memory barrier
   }

   public Set<String> get(String k) {
      return values.get(k); // this volatile read acts as the memory barrier
   }
}

However, both your solution and mine still have a bit of a problem: we are both returning mutable views of the underlying data structure. I might change the HashMultimap to an ImmutableMultimap to fix the mutability issue. Be aware also that, because the returned Set is a view, callers end up retaining a reference to the full internal map, not just to the returned Set.
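
A sketch of that immutable variant, using ImmutableSetMultimap (the class name is made up; a fresh immutable copy is built and republished on every write):

import com.google.common.collect.ImmutableSetMultimap;
import java.util.Set;

class ImmutableCollector {

    private volatile ImmutableSetMultimap<String, String> values
            = ImmutableSetMultimap.of();

    public void add(String k, String v) {
        // rebuild an immutable copy containing the new entry,
        // then publish it with a single volatile write
        values = ImmutableSetMultimap.<String, String>builder()
                .putAll(values)
                .put(k, v)
                .build();
    }

    public Set<String> get(String k) {
        return values.get(k); // an immutable snapshot, safe to hand to callers
    }
}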

Creating a new copy can seem somewhat wasteful, but I suspect that if you have only one thread writing, you have a good sense of the rate of change and can decide whether that's reasonable. For example, if you wanted to return Set<String> instances that update dynamically as things change, then the MapMaker-based solution doesn't seem heavy-handed.