150
votes

There is a case where a map will be constructed, and once it is initialized, it will never be modified again. It will, however, be accessed (via get(key) only) from multiple threads. Is it safe to use a java.util.HashMap in this way?

(Currently, I'm happily using a java.util.concurrent.ConcurrentHashMap, and have no measured need to improve performance, but am simply curious if a simple HashMap would suffice. Hence, this question is not "Which one should I use?" nor is it a performance question. Rather, the question is "Would it be safe?")

12
Many answers here are correct regarding mutual exclusion from running threads, but incorrect regarding memory updates. I've voted up/down accordingly, but there are still many incorrect answers with positive votes. - Heath Borders
@Heath Borders, if the instance was a statically initialized unmodifiable HashMap, it should be safe for concurrent read (as other threads couldn't have missed updates as there were no updates), right? - kaqqao
If it's statically initialized and never modified outside of the static block, then it might be ok because all static initialization is synchronized by the ClassLoader. That's worth a separate question on its own. I'd still explicitly synchronize it and profile to verify that it was causing real performance issues. - Heath Borders
@HeathBorders - what do you mean by "memory updates"? The JVM is a formal model which defines things like visibility, atomicity, happens-before relationships, but doesn't use terms like "memory updates". You should clarify, preferably using terminology from the JLS. - BeeOnRope
@Dave - I assume you aren't still looking for answer after 8 years, but for the record, the key confusion in nearly all the answers is that they focus on the actions you take on the map object. You've already explained that you never modify the object, so that is all irrelevant. The only potential "gotcha" then is how you publish the reference to the Map, which you didn't explain. If you don't do it safely, it is not safe. If you do it safely, it is. Details in my answer. - BeeOnRope

12 Answers

61
votes

Your idiom is safe if and only if the reference to the HashMap is safely published. Rather than anything relating to the internals of HashMap itself, safe publication deals with how the constructing thread makes the reference to the map visible to other threads.

Basically, the only possible race here is between the construction of the HashMap and any reading threads that may access it before it is fully constructed. Most of the discussion is about what happens to the state of the map object, but this is irrelevant since you never modify it - so the only interesting part is how the HashMap reference is published.

For example, imagine you publish the map like this:

class SomeClass {
   public static HashMap<Object, Object> MAP;

   public synchronized static void setMap(HashMap<Object, Object> m) {
     MAP = m;
   }
}

... and at some point setMap() is called with a map, and other threads are using SomeClass.MAP to access the map, and check for null like this:

HashMap<Object,Object> map = SomeClass.MAP;
if (map != null) {
  // ... use the map
} else {
  // ... some default behavior
}

This is not safe even though it probably appears as though it is. The problem is that there is no happens-before relationship between the write to SomeClass.MAP and the subsequent read on another thread (the setter is synchronized, but the read is not), so the reading thread is free to see a partially constructed map. This can pretty much do anything, and even in practice it does things like put the reading thread into an infinite loop.

To safely publish the map, you need to establish a happens-before relationship between the writing of the reference to the HashMap (i.e., the publication) and the subsequent readers of that reference (i.e., the consumption). Conveniently, there are only a few easy-to-remember ways to accomplish that[1]:

  1. Exchange the reference through a properly locked field (JLS 17.4.5)
  2. Use static initializer to do the initializing stores (JLS 12.4)
  3. Exchange the reference via a volatile field (JLS 17.4.5), or as the consequence of this rule, via the AtomicX classes
  4. Initialize the value into a final field (JLS 17.5).
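As an illustration of option (1), here is a minimal sketch in which both the write and the read of the reference go through the same lock. The class name LockedPublication and its accessors are hypothetical and not part of the original code; the point is only that the reader must synchronize too, not just the setter.

```java
import java.util.HashMap;
import java.util.Map;

class LockedPublication {
    // Plain field; every access is guarded by the LockedPublication.class monitor.
    private static Map<String, String> map;

    // Releasing the monitor here happens-before a later acquire in getMap().
    static synchronized void setMap(Map<String, String> m) {
        map = m;
    }

    // Readers take the same lock, so they see the fully populated map (or null).
    static synchronized Map<String, String> getMap() {
        return map;
    }
}
```

The cost is that every reader acquires the lock once to fetch the reference; after that, get() calls on the returned map need no further locking as long as the map is never modified.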

The ones most interesting for your scenario are (2), (3) and (4). In particular, (3) applies directly to the code I have above: if you transform the declaration of MAP to:

public static volatile HashMap<Object, Object> MAP;

then everything is kosher: readers who see a non-null value necessarily have a happens-before relationship with the store to MAP and hence see all the stores associated with the map initialization.
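A minimal sketch of the volatile variant, assuming the map is fully populated before the reference is stored and never modified afterwards. The names VolatilePublication, publish and lookup are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class VolatilePublication {
    // volatile: a read that observes the reference also observes all earlier stores.
    private static volatile Map<String, String> map;

    static void publish() {
        Map<String, String> m = new HashMap<>();
        m.put("alaska", "AK");
        m.put("wyoming", "WY");
        // Publish last: the puts above happen-before this volatile write.
        map = m;
    }

    static String lookup(String key, String fallback) {
        // Read the volatile field once into a local so the null check and use agree.
        Map<String, String> m = map;
        return (m != null) ? m.getOrDefault(key, fallback) : fallback;
    }
}
```

Note that the map must not be touched after the store to the volatile field; the guarantee covers only writes made before publication.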

The other methods change the semantics of your method, since both (2) (using the static initializer) and (4) (using final) imply that you cannot set MAP dynamically at runtime. If you don't need to do that, then just declare MAP as a static final HashMap<> and you are guaranteed safe publication.

In practice, the rules are simple for safe access to "never-modified objects":

If you are publishing an object which is not inherently immutable (as in all fields declared final) and:

  • You already can create the object that will be assigned at the moment of declaration[a]: just use a final field (including static final for static members).
  • You want to assign the object later, after the reference is already visible: use a volatile field[b].

That's it!

In practice, it is very efficient. The use of a static final field, for example, allows the JVM to assume the value is unchanged for the life of the program and optimize it heavily. The use of a final member field allows most architectures to read the field in a way equivalent to a normal field read and doesn't inhibit further optimizations[c].

Finally, the use of volatile does have some impact: no hardware barrier is needed on many architectures (such as x86, specifically those that don't allow reads to pass reads), but some optimization and reordering may not occur at compile time - but this effect is generally small. In exchange, you actually get more than what you asked for - not only can you safely publish one HashMap, you can store as many more not-modified HashMaps as you want to the same reference and be assured that all readers will see a safely published map.

For more gory details, refer to Shipilev or this FAQ by Manson and Goetz.


[1] Directly quoting from Shipilev.


[a] That sounds complicated, but what I mean is that you can assign the reference at construction time - either at the declaration point or in the constructor (member fields) or static initializer (static fields).

[b] Optionally, you can use a synchronized method to get/set, or an AtomicReference or something, but we're talking about the minimum work you can do.

[c] Some architectures with very weak memory models (I'm looking at you, Alpha) may require some type of read barrier before a final read - but these are very rare today.

70
votes

Jeremy Manson, the god when it comes to the Java Memory Model, has a three part blog on this topic - because in essence you are asking the question "Is it safe to access an immutable HashMap" - the answer to that is yes. But you must answer the predicate to that question which is - "Is my HashMap immutable". The answer might surprise you - Java has a relatively complicated set of rules to determine immutability.

For more info on the topic, read Jeremy's blog posts:

Part 1 on Immutability in Java: http://jeremymanson.blogspot.com/2008/04/immutability-in-java.html

Part 2 on Immutability in Java: http://jeremymanson.blogspot.com/2008/07/immutability-in-java-part-2.html

Part 3 on Immutability in Java: http://jeremymanson.blogspot.com/2008/07/immutability-in-java-part-3.html

38
votes

The reads are safe from a synchronization standpoint but not a memory standpoint. This is something that is widely misunderstood among Java developers, including here on Stack Overflow. (Observe the rating of this answer for proof.)

If you have other threads running, they may not see an updated copy of the HashMap if there is no memory write out of the current thread. Memory writes occur through the use of the synchronized or volatile keywords, or through uses of some java concurrency constructs.

See Brian Goetz's article on the new Java Memory Model for details.

11
votes

After a bit more looking, I found this in the Javadoc (emphasis mine):

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.)

This seems to imply that it will be safe, assuming the converse of the statement there is true.

9
votes

One note is that under some circumstances, a get() from an unsynchronized HashMap can cause an infinite loop. This can occur if a concurrent put() causes a rehash of the Map.

http://lightbody.net/blog/2005/07/hashmapget_can_cause_an_infini.html

8
votes

There is an important twist though. It's safe to access the map, but in general it's not guaranteed that all threads will see exactly the same state (and thus values) of the HashMap. This might happen on multiprocessor systems where the modifications to the HashMap done by one thread (e.g., the one that populated it) can sit in that CPU's cache and won't be seen by threads running on other CPUs, until a memory fence operation is performed ensuring cache coherence. The Java Language Specification is explicit on this one: the solution is to acquire a lock (synchronized (...)) which emits a memory fence operation. So, if you are sure that after populating the HashMap each of the threads acquires ANY lock, then it's OK from that point on to access the HashMap from any thread until the HashMap is modified again.

5
votes

According to the "Initialization safety" section of http://www.ibm.com/developerworks/java/library/j-jtp03304/, you can make your HashMap a final field, and after the constructor finishes it will be safely published.

... Under the new memory model, there is something similar to a happens-before relationship between the write of a final field in a constructor and the initial load of a shared reference to that object in another thread. ...

3
votes

So the scenario you described is that you need to put a bunch of data into a Map, then when you're done populating it you treat it as immutable. One approach that is "safe" (meaning you're enforcing that it really is treated as immutable) is to replace the reference with Collections.unmodifiableMap(originalMap) when you're ready to make it immutable.
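A minimal sketch of that approach, with a hypothetical holder class FrozenMapHolder. The wrapper only prevents modification through the returned view; it does not by itself publish the reference safely, so this sketch also stores it in a final field and never lets the underlying HashMap escape.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class FrozenMapHolder {
    // final field: safely published once the constructor completes.
    private final Map<String, String> states;

    FrozenMapHolder() {
        Map<String, String> m = new HashMap<>();
        m.put("alaska", "AK");
        m.put("wyoming", "WY");
        // After this point nothing holds a reference to the mutable map.
        states = Collections.unmodifiableMap(m);
    }

    Map<String, String> states() {
        return states;
    }
}
```

Any attempt to call put() or remove() on the returned view throws UnsupportedOperationException, which turns an accidental later modification into an immediate failure instead of a silent data race.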

For an example of how badly maps can fail if used concurrently, and the suggested workaround I mentioned, check out this bug parade entry: bug_id=6423457

3
votes

This question is addressed in Brian Goetz's "Java Concurrency in Practice" book (Listing 16.8, page 350):

@ThreadSafe
public class SafeStates {
    private final Map<String, String> states;

    public SafeStates() {
        states = new HashMap<String, String>();
        states.put("alaska", "AK");
        states.put("alabama", "AL");
        ...
        states.put("wyoming", "WY");
    }

    public String getAbbreviation(String s) {
        return states.get(s);
    }
}

Since states is declared as final and its initialization is accomplished within the owner's class constructor, any thread that later reads this map is guaranteed to see it as of the time the constructor finishes, provided no other thread will try to modify the contents of the map.

1
votes

Be warned that even in single-threaded code, replacing a ConcurrentHashMap with a HashMap may not be safe. ConcurrentHashMap forbids null as a key or value. HashMap does not forbid them (don't ask).

So in the unlikely situation that your existing code might add a null to the collection during setup (presumably in a failure case of some kind), replacing the collection as described will change the functional behaviour.
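The API difference can be seen in a small sketch. The helper acceptsNullValue is hypothetical and exists only to show which map type rejects a null value on put().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NullHandlingDemo {
    // Returns true if the map accepts a null value, false if put() throws.
    static boolean acceptsNullValue(Map<String, String> map) {
        try {
            map.put("key", null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullValue(new HashMap<>()));           // true
        System.out.println(acceptsNullValue(new ConcurrentHashMap<>())); // false
    }
}
```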

That said, provided you do nothing else, concurrent reads from a HashMap are safe.

[Edit: by "concurrent reads", I mean that there are not also concurrent modifications.

Other answers explain how to ensure this. One way is to make the map immutable, but it's not necessary. For example, the JSR133 memory model explicitly defines starting a thread to be a synchronised action, meaning that changes made in thread A before it starts thread B are visible in thread B.

My intent is not to contradict those more detailed answers about the Java Memory Model. This answer is intended to point out that even aside from concurrency issues, there is at least one API difference between ConcurrentHashMap and HashMap, which could scupper even a single-threaded program which replaced one with the other.]
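The Thread.start() guarantee mentioned in the edit above can be sketched as follows. The class StartPublication and its method are illustrative only; the field is deliberately neither final nor volatile, and the sketch relies solely on the map being populated before the reader thread is started.

```java
import java.util.HashMap;
import java.util.Map;

class StartPublication {
    // Plain field: no final, no volatile, no lock.
    private static Map<String, String> map;

    static String readFromNewThread(String key) {
        Map<String, String> m = new HashMap<>();
        m.put("alaska", "AK");
        map = m; // written before start()

        String[] result = new String[1];
        Thread reader = new Thread(() -> result[0] = map.get(key));
        reader.start(); // start() happens-before every action in the reader
        try {
            reader.join(); // the reader's actions happen-before join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }
}
```

This only covers threads started after the map is populated. A thread that was already running when the map was assigned gets no such guarantee and needs one of the publication mechanisms described in the other answers.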

0
votes

http://www.docjar.com/html/api/java/util/HashMap.java.html

Here is the source for HashMap. As you can tell, there is absolutely no locking / mutex code there.

This means that while it's okay to read from a HashMap in a multithreaded situation, I'd definitely use a ConcurrentHashMap if there were multiple writes.

What's interesting is that both the .NET HashTable and Dictionary<K,V> have built-in synchronization code.

0
votes

If the initialization and every put is synchronized, you are safe.

The following code is safe because the classloader will take care of the synchronization:

public static final HashMap<String, String> map = new HashMap<>();
static {
  map.put("A","A");
}

The following code is safe because the write to the volatile field will take care of the synchronization.

class Foo {
  volatile HashMap<String, String> map;
  public void init() {
    final HashMap<String, String> tmp = new HashMap<>();
    tmp.put("A","A");
    // writing to volatile has to be after the modification of the map
    this.map = tmp;
  }
}

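This will also work if the member variable is final and the method is a constructor. A final field is not volatile, but it has its own guarantee (JLS 17.5): once the constructor finishes, any thread that reads the field sees the map as it was populated in the constructor.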