3
votes

Let's assume this:

  1. There is single ArrayList
  2. List is accessed by multiple threads. Thread can add element and iterate over all elements.
  3. All access is externally synchronized. So it's not possible that two threads access list at the same time.

Looking at ArrayList source code we can see that size and elementData fields aren't volatile:

    transient Object[] elementData; // non-private to simplify nested class access

    private int size;

Also, let's look at add method:

    /**
     * This helper method split out from add(E) to keep method
     * bytecode size under 35 (the -XX:MaxInlineSize default value),
     * which helps when add(E) is called in a C1-compiled loop.
     */
    private void add(E e, Object[] elementData, int s) {
        if (s == elementData.length)
            elementData = grow();
        elementData[s] = e;
        size = s + 1;
    }

    /**
     * Appends the specified element to the end of this list.
     *
     * @param e element to be appended to this list
     * @return {@code true} (as specified by {@link Collection#add})
     */
    public boolean add(E e) {
        modCount++;
        add(e, elementData, size);
        return true;
    }

Does something like that can happen?

  1. Let's assume that list has 4 elements.
  2. Thread A adds new element. Size is updated to 5.
  3. Thread B adds new element. Size is cached and thread B sees its old value (4). So instead of adding new element, last element is overwritten.

Could similar situation happen with elementData?

2
Without synchronization, yes. That's why external synchronization is required.Andy Turner
@AndyTurner he is synchronizing the list externally. The question is about caching within the JVM, not synchronicity.William Rosenbloom
@WilliamRosenbloom: All actions done by a thread while it holds a lock are guaranteed to be visible to any other threads that later acquire the lock. Any caches that need to be flushed to ensure that, will be.supercat
@WilliamRosenbloom JLS 17.4.5. Exiting a synchronized block in one thread happens before threads later entering a block on that same monitor (because they synchronize with each other). Anything that happens before the unlock in the first thread to execute therefore happens before anything in the other thread after the lock. As such, anything done within the first block is visible to the second.HTNW
@WilliamRosenbloom: One of the main purposes of synchronization primitives is to provide a means of strongly ordering accesses to "ordinary" objects on different threads; such guarantees would be useless if they didn't do whatever had to be done with caches to ensure such behavior.supercat

2 Answers

5
votes

TL;DR: the problems you describe are impossible, with correct synchronization, because synchronization ensures atomicity and visibility of the operations.


The way a JVM executes Java code is rather complicated. It is free to reorder the instructions corresponding to the expressions and statements in the Java code, in order to execute them more efficiently, provided that you can't tell that a thread has reordered its operations.

In essence, it's like a boss who says "I don't care how you get the work done, just have [the work] done by [some time]".

The difficulty of this is that whilst it says that you mustn't be able to see the reordering within a thread, it doesn't say that different threads can't see each other doing things in a different order.

This is head-spinning stuff. The simplifying concept is the idea of happens-before. There are certain things you can do in the two threads to ensure that things done by one thread appear to have happened already by the time the other thread tries to use the result of them. Literally, things in one thread have "happened before" things in the other. (Continuing with the work analogy, this is like having to hand over your completed work to a colleague for them to be able to do theirs: they can take what you've completed and do their work, irrespective of how you completed it).

There are a number of well-known things which create happens-before relationships. The relevant ones in terms of this question are:

  • A write of a volatile variable happens before a read of the same variable. This is often described in terms of "the data is always written to and read from main memory, instead of being cached by the thread".
  • Exiting a synchronized block happens before entering a synchronized block with the same monitor. In other words, writes which happen inside a synchronized block by one thread are visible to other threads executing code synchronized in the same thing

So, volatile and synchronized are both ways of creating the happens-before that is required to guarantee that [something] done by one thread is seen by the other.

But there is a difference between the two:

  • Volatile gives you visibility: it ensures that a write is visible.
  • Synchronization gives you visibility and atomicity: it ensures that the write is visible, but it additionally ensures that nobody else is doing something at the same time, for as long as a particular monitor is held.

In the case of adding to an ArrayList, atomicity is required, because you are doing more than one thing: increasing the size and assigning the new array element.

Making the variables volatile as well would serve no purpose in terms of the correctness, but it would make the code slower in the modal case, which is where ArrayList is only ever accessed from a single thread.

So, provided your code is correctly synchronized - that is, all accesses to the list are synchronized on the same thing, for example on the list itself - the situation you describe cannot happen, because of the atomicity and visibility properties of synchronization.

-3
votes

I added another answer earlier based on a misunderstanding of your question. Sorry about that.

Regarding the volatility of the size field in an ArrayList, your fears are justified. The error you describe is possible. Here is an article describing exactly the same issue.

In case of volatile reference object, it is ensured that the reference itself will be visible to other threads in a timely manner but the same is not true for its member variables. There is no guarantee that data contained within the object will be visible consistently if accessed individually.

There's nothing you can do about the standard ArrayList from java.util, but you could solve this problem pretty easily by making your own class that closely mirrors ArrayList, but sets the volatility of instance variables properly. This could also give you a simpler (or more easily readable) way to synchronize the list, rather than synchronizing it externally.