6
votes

While reading the book Java Concurrency in Practice by Brian Goetz, I came across the terms data race and race condition.

Data races

A program is said to have a data race, and therefore not be a "properly synchronized" program, when there is a variable that is read by more than one thread, written by at least one thread, and the write and the reads are not ordered by a happens-before relationship.

Race condition

A race condition occurs when the correctness of a computation depends on the relative timing or interleaving of multiple threads by the runtime; in other words, when getting the correct answer relies on lucky timing. The most common type of race condition is check-then-act, where a potentially stale observation is used to make a decision on what to do next.

As I understand it, a data race can be avoided by making sure that at least one of the above conditions does not hold, i.e., by making shared variables immutable or by properly synchronizing access to them.

My question is about the SingletonFactory example that is usually given to illustrate a race condition.

e.g.:

public class SingletonFactory {

    private Singleton singleton = null;

    private SingletonFactory() {}

    public Singleton getInstance() {
        if(this.singleton == null) {
            this.singleton = new Singleton();
        }
        return this.singleton;
    }
}

Can this code also be considered to cause a data race?

I understand that one way to make the above program "completely thread safe" would be to use double-checked locking and also make the field volatile.

But if I only declare the singleton field volatile and fail to synchronize the code block that initializes it, can the code be considered safe at least with respect to data races, but still unsafe with respect to race conditions? In general, I am still looking for a good, realistic example where there is no data race but there is still a potential race condition.

(A blog post that is usually cited to explain the difference between a data race and a race condition did not help me understand this.)

2
The above definition for data race sucks. Because it would also consider 2 concurrent writes to the same address being a data race. Which isn't the case because they are synchronizing operations and always ordered. It should explicitly exclude synchronizing operations from the definition. - pveentjer

2 Answers

5
votes

There are three common flavors of a lazy singleton. The first has no synchronization at all, like your example. The second, as you mentioned, is double-checked locking (DCL) without volatile, and the third is DCL with volatile. Of the two variants shown below, one contains a race condition (even though the shared field is volatile, because the check-then-act is not atomic) and the other contains a data race.

public class Singleton {
    // volatile, to illustrate the race condition without a data race
    private static volatile Singleton INSTANCE;

    private Singleton() {}

    public static Singleton getInstance() {
        // check-then-act is not atomic: two threads can both see null here
        if (INSTANCE == null) {
            INSTANCE = new Singleton();
        }
        return INSTANCE;
    }
}

In this case there is no data race, because every access to INSTANCE is a volatile read or write, but there is a race condition: two or more threads can each see null and each create a Singleton instance.
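To make that interleaving reproducible, here is a minimal sketch. The class name VolatileLazyInitDemo, the CREATED counter and the BOTH_CHECKED barrier are all hypothetical additions for illustration; the barrier only forces the unlucky timing that would otherwise happen occasionally.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

final class VolatileLazyInitDemo {
    // Counts how many times the "singleton" was created (illustration only).
    static final AtomicInteger CREATED = new AtomicInteger();
    // volatile: reads and writes are ordered, so there is no data race.
    static volatile Object instance;
    // Hypothetical hook that holds each thread between the check and the act.
    static final CyclicBarrier BOTH_CHECKED = new CyclicBarrier(2);

    static Object getInstance() {
        if (instance == null) {          // check
            awaitOtherThread();          // both threads have now seen null
            CREATED.incrementAndGet();
            instance = new Object();     // act on a stale observation
        }
        return instance;
    }

    static void awaitOtherThread() {
        try {
            BOTH_CHECKED.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs two threads through getInstance() and reports how many objects were created.
    static int run() {
        instance = null;
        CREATED.set(0);
        Thread a = new Thread(VolatileLazyInitDemo::getInstance);
        Thread b = new Thread(VolatileLazyInitDemo::getInstance);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return CREATED.get();
    }

    public static void main(String[] args) {
        System.out.println("instances created: " + run()); // prints "instances created: 2"
    }
}
```

Without the barrier the same outcome is still possible, just rare, which is what makes this kind of bug hard to spot in testing.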

Now for a double-checked locking example:

public class Singleton {
    // not volatile, to illustrate the data race
    private static Singleton INSTANCE;

    private Singleton() {}

    public static Singleton getInstance() {
        if (INSTANCE == null) {                  // unsynchronized read: the data race
            synchronized (Singleton.class) {
                if (INSTANCE == null) {          // second check, under the lock
                    INSTANCE = new Singleton();
                }
            }
        }
        return INSTANCE;
    }
}

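In this case there is a data race but no race condition. The first read of INSTANCE happens outside the synchronized block, so it is not ordered by a happens-before relationship with the write inside it. A thread may therefore see a non-null INSTANCE without seeing the writes made while constructing the object.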
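For completeness, the third flavor (DCL with volatile) could look roughly like the sketch below. The class name LazySingleton, the CONSTRUCTED counter and the run helper are hypothetical and exist only so the behaviour can be checked; they are not part of the idiom itself.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class LazySingleton {
    // Counts constructor calls (illustration only).
    static final AtomicInteger CONSTRUCTED = new AtomicInteger();
    // volatile orders the write of the reference with every later read of it.
    private static volatile LazySingleton instance;

    private LazySingleton() {
        CONSTRUCTED.incrementAndGet();
    }

    static LazySingleton getInstance() {
        LazySingleton local = instance;          // first check, no lock
        if (local == null) {
            synchronized (LazySingleton.class) {
                local = instance;                // second check, under the lock
                if (local == null) {
                    local = new LazySingleton();
                    instance = local;
                }
            }
        }
        return local;
    }

    // Calls getInstance() from several threads and returns the constructor count.
    static int run(int threadCount) {
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(LazySingleton::getInstance);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return CONSTRUCTED.get();
    }

    public static void main(String[] args) {
        System.out.println("constructed: " + run(8)); // prints "constructed: 1"
    }
}
```

The volatile field removes the data race, and the re-check under the lock removes the race condition, so this variant has neither.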

So, to answer your question:

My question is whether this code can also be considered to cause a Data race?

Your example contains both a data race and a race condition: access to the shared mutable field is not synchronized, and the check-then-set sequence is not atomic.
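As for a more realistic case with no data race but a possible race condition, one sketch is a balance held in an AtomicInteger. The class AtomicAccount and its methods are hypothetical names chosen for illustration. Every individual access is atomic and ordered, yet withdrawRacy can still overdraw because another thread may act between its check and its update; withdraw folds the check and the act into one compare-and-set.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class AtomicAccount {
    private final AtomicInteger balance;

    AtomicAccount(int initial) {
        balance = new AtomicInteger(initial);
    }

    // No data race, but a race condition: the balance can change
    // between get() and addAndGet(), so it may go negative.
    boolean withdrawRacy(int amount) {
        if (balance.get() >= amount) {
            balance.addAndGet(-amount);
            return true;
        }
        return false;
    }

    // Check and act as one atomic step, retried if another thread got in first.
    boolean withdraw(int amount) {
        while (true) {
            int current = balance.get();
            if (current < amount) {
                return false;
            }
            if (balance.compareAndSet(current, current - amount)) {
                return true;
            }
        }
    }

    int balance() {
        return balance.get();
    }

    // Ten threads each try once to withdraw 30 from 100; returns {successes, final balance}.
    static int[] run() {
        AtomicAccount account = new AtomicAccount(100);
        AtomicInteger successes = new AtomicInteger();
        Thread[] threads = new Thread[10];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                if (account.withdraw(30)) {
                    successes.incrementAndGet();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return new int[] { successes.get(), account.balance() };
    }
}
```

With the compare-and-set version, exactly three withdrawals succeed and the balance ends at 10, whatever the interleaving. With withdrawRacy, more than three could succeed under unlucky timing.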

0
votes

Well, no, it will not be safe, because two callers can still end up with two different instances of the "singleton".

A data race is one kind of race condition, but not all race conditions are data races.

It can also happen that two parallel execution paths compete for an outcome. Consider signals: send SIGHUP and SIGTERM to the same process. What will happen, and in what order? The behaviour will (often) be non-deterministic, even if the executions share no data explicitly.