Background
I have a class whose instances are used to collect and publish data (uses Guava's HashMultimap
):
public class DataCollector {
private final SetMultimap<String, String> valueSetsByLabel
= HashMultimap.create();
public void addLabelValue(String label, String value) {
valueSetsByLabel.put(label, value);
}
public Set<String> getLabels() {
return valueSetsByLabel.keySet();
}
public Set<String> getLabelValues(String label) {
return valueSetsByLabel.get(label);
}
}
Instances of this class will now be passed between threads, so I need to modify it for thread-safety. Since Guava's Multimap
implementations aren't thread-safe, I used a LoadingCache
that lazily creates concurrent hash sets instead (see the CacheBuilder
and MapMaker
javadocs for details):
public class ThreadSafeDataCollector {
private final LoadingCache<String, Set<String>> valueSetsByLabel
= CacheBuilder.newBuilder()
.concurrencyLevel(1)
.build(new CacheLoader<String, Set<String>>() {
@Override
public Set<String> load(String label) {
// make and return a concurrent hash set
final ConcurrentMap<String, Boolean> map = new MapMaker()
.concurrencyLevel(1)
.makeMap();
return Collections.newSetFromMap(map);
}
});
public void addLabelValue(String label, String value) {
valueSetsByLabel.getUnchecked(label).add(value);
}
public Set<String> getLabels() {
return valueSetsByLabel.asMap().keySet();
}
public Set<String> getLabelValues(String label) {
return valueSetsByLabel.getUnchecked(label);
}
}
You'll notice I'm setting the concurrency level for both the loading cache and nested concurrent hash sets to 1
(meaning they each only read from and write to one underlying table). This is because I only expect one thread at a time to read from and write to these objects.
(To quote the concurrencyLevel
javadoc, "A value of one permits only one thread to modify the map at a time, but since read operations can proceed concurrently, this still yields higher concurrency than full synchronization.")
Problem
Because I can assume there will only be a single reader/writer at a time, I feel that using many concurrent hash maps per object is heavy-handed. Such structures are meant to handle concurrent reads and writes, and guarantee atomicity of concurrent writes. But in my case atomicity is unimportant - I only need to make sure each thread sees the last thread's changes.
In my search for a more optimal solution I came across this answer by erickson, which says:
Any data that is shared between thread needs a "memory barrier" to ensure its visibility.
[...]
Changes to any member that is declared
volatile
are visible to all threads. In effect, the write is "flushed" from any cache to main memory, where it can be seen by any thread that accesses main memory.Now it gets a bit trickier. Any writes made by a thread before that thread writes to a volatile variable are also flushed. Likewise, when a thread reads a volatile variable, its cache is cleared, and subsequent reads may repopulate it from main memory.
[...]
One way to make this work is to have the thread that is populating your shared data structure assign the result to a
volatile
variable. [...] When other threads access that variable, not only are they guaranteed to get the most recent value for that variable, but also any changes made to the data structure by the thread before it assigned the value to the variable.
(See this InfoQ article for a further explanation of memory barriers.)
The problem erickson is addressing is slightly different in that the data structure in question is fully populated and then assigned to a variable that he suggests be made volatile
, whereas my structures are assigned to final
variables and gradually populated across multiple threads. But his answer suggests I could use a volatile
dummy variable to manually trigger memory barriers:
public class ThreadVisibleDataCollector {
private final SetMultimap<String, String> valueSetsByLabel
= HashMultimap.create();
private volatile boolean dummy;
private void readMainMemory() {
if (dummy) { }
}
private void writeMainMemory() {
dummy = false;
}
public void addLabelValue(String label, String value) {
readMainMemory();
valueSetsByLabel.put(label, value);
writeMainMemory();
}
public Set<String> getLabels() {
readMainMemory();
return valueSetsByLabel.keySet();
}
public Set<String> getLabelValues(String label) {
readMainMemory();
return valueSetsByLabel.get(label);
}
}
Theoretically, I could take this a step further and leave it to the calling code to trigger memory barriers, in order to avoid unnecessary volatile reads and writes between calls on the same thread (potentially by using Unsafe.loadFence
and Unsafe.storeFence
, which were added in Java 8). But that seems too extreme and hard to maintain.
Question
Have I drawn the correct conclusions from my reading of erickson's answer (and the JMM) and implemented ThreadVisibleDataCollector
correctly? I wasn't able to find examples of using a volatile
dummy variable to trigger memory barriers, so I want to verify that this code will behave as expected across architectures.
ThreadVisibleDataCollector
would be dumping/reading entire data to memory, rather than just the label-values that are being accessed. – S.D.Lock
guarantees the same memory synchronization. – Paul Belloravolatile
could be used for each { String label, Set<String> values } object. So, when you accessed a "label", then only that "label" and its set of "values" be synced with main memory, and not all other labels. This could be true with locking as well, i.e. more granular, "label" level locking. – S.D.Striped
. I'll consider whether it could be used, though in my particular use case most labels are written by a single thread, and all labels are read by another single thread. – Paul Bellora