Herb Sutter Atomic Weapons "Why Standalone Fences are Suboptimal"

Question

Towards the end of his talk about memory barriers (fences), he gave the following example (Note: global is not of atomic type):

// thread 1                                     // thread 2
widget *temp = new widget();
global = temp;
                                                global->do_something();
                                                global->do_something_else();

Later he said we should have full fences as follows:

   // thread 1                                     // thread 2
   widget *temp = new widget();
XX mb();   XXXXXXXXXXXXXXXXXXXXX
   global = temp;
                                                   temp2 = global;
                                                XX mb(); XXXXXXXXXXXXXXXX  
                                                   temp2->do_something();
                                                   temp2 = global;
                                                XX mb(); XXXXXXXXXXXXXXXX
                                                   temp2->do_something_else();

I wonder why in thread 1 you need a barrier? global depends on temp and compiler wouldn't move global = temp; above the construction of temp anyway. The one reason I can think of for the memory barrier is that the statement global = temp; could somehow be executed in the middle of the construction of new widget(), so that thread 2 would see a partially constructed object through global. Is it possible for the assignment to global to be scheduled in the middle of constructing the new widget? Or is the memory barrier there for some other reason?

Also, in thread 2, don't you need another barrier after temp2->do_something();? In the currently transformed form, the following statements can still be reordered during execution:

                                                   temp2 = global;
                                                XX mb(); XXXXXXXXXXXXXXXX 

                                                   temp2 = global;
                                                   temp2->do_something();

                                                XX mb(); XXXXXXXXXXXXXXXX
                                                   temp2->do_something_else();

And this is not what you intended since now you only invoke member functions on one temp2 instead of reading a new value from global before executing do_something_else().

If no such reordering is possible in thread 2, then why do we need the barriers in thread 2 in the first place, if temp2->do_something(); can't be reordered before temp2 = global;?

SergeyA · Accepted Answer · 2016-04-05T18:39:48

I wonder why in thread 1 you need a barrier? global depends on temp and compiler wouldn't move global = temp; above the construction of temp anyway.

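It has to do with the initialization of the members of the object that temp points to. The compiler never has to move global = temp; above the construction for things to go wrong: because the CPU may reorder or delay stores (and the compiler may reorder stores to unrelated locations), without a memory barrier here thread 2 might see the address of the new widget in global while the memory it points to still appears uninitialized.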
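To make that concrete, here is a minimal sketch of the same publish pattern in standard C++. It is not the code from the talk: mb() is replaced by std::atomic_thread_fence with release/acquire ordering, global is made a std::atomic<widget*> (unlike the slide) so the program has no data race, and widget, producer, consumer and run_demo are hypothetical names made up for the illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the talk's widget type.
struct widget {
    int value = 0;
    widget() : value(42) {}                 // the "inner members" being initialized
    int do_something() const { return value; }
};

// Assumption: atomic here, unlike the slide, so the sketch is race-free.
std::atomic<widget*> global{nullptr};

// Thread 1: construct, fence, then publish the pointer.
void producer() {
    widget* temp = new widget();
    // Release fence: the constructor's writes may not become visible
    // after the store to global that follows.
    std::atomic_thread_fence(std::memory_order_release);
    global.store(temp, std::memory_order_relaxed);
}

// Thread 2: read the pointer, fence, then use the object.
int consumer() {
    widget* temp2 = nullptr;
    while ((temp2 = global.load(std::memory_order_relaxed)) == nullptr) {
        std::this_thread::yield();          // wait until thread 1 has published
    }
    // Acquire fence: pairs with the release fence in producer(), so the
    // reads inside do_something() see the fully constructed widget.
    std::atomic_thread_fence(std::memory_order_acquire);
    return temp2->do_something();
}

// Runs both threads once and returns what thread 2 observed.
int run_demo() {
    int seen = 0;
    std::thread t2([&] { seen = consumer(); });
    std::thread t1(producer);
    t1.join();
    t2.join();
    delete global.exchange(nullptr);        // clean up the published widget
    return seen;
}
```

Calling run_demo() from main should give the value written by the constructor. Without the two fences (and with only relaxed operations), the standard would no longer guarantee that thread 2 sees value initialized, which is the situation the barrier in thread 1 is there to prevent.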

As for the second question, obviously

temp2->do_something();
temp2 = global;

cannot be reordered as

temp2 = global;
temp2->do_something();

as that would call do_something() on a completely different object!
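A small sketch of that point, under some assumptions: it attaches acquire/release ordering to the loads and stores of a std::atomic pointer instead of using standalone fences, it runs on a single thread so the outcome is deterministic, and the store in the middle merely stands in for another thread republishing global. The names widget, thread2_body and run_thread2_demo are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical widget that remembers which object it is.
struct widget {
    int id;
    explicit widget(int i) : id(i) {}
    int do_something() const { return id; }
    int do_something_else() const { return id * 10; }
};

std::atomic<widget*> global{nullptr};

// Mirrors thread 2 from the question: each call goes through the value
// that temp2 holds at that point in program order.
int thread2_body(widget* replacement) {
    widget* temp2 = global.load(std::memory_order_acquire);
    int first = temp2->do_something();        // uses the object from the first load

    // Stand-in for another thread publishing a new widget in between.
    global.store(replacement, std::memory_order_release);

    temp2 = global.load(std::memory_order_acquire);
    int second = temp2->do_something_else();  // uses the object from the second load
    return first + second;
}

// Publishes widget 1, lets thread2_body switch to widget 2, returns the sum.
int run_thread2_demo() {
    widget first_widget(1);
    widget second_widget(2);
    global.store(&first_widget, std::memory_order_release);
    int result = thread2_body(&second_widget);
    global.store(nullptr, std::memory_order_relaxed);  // do not leave a dangling pointer
    return result;
}
```

Here run_thread2_demo() returns 1 + 20 = 21: the first call runs on widget 1 and the second on widget 2. Swapping a call with the assignment to temp2 next to it would change which object the call runs on, so neither the compiler nor the CPU may make that swap visible.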
