0 votes

I am wondering why it would be incorrect to implement this sort of queue the naive way:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>   /* for usleep() */
#include <pthread.h>
#include <sys/time.h>


void *print_message_function( void *ptr );

void *reader( void *ptr );
void *writer( void *ptr );



int queue[500000];


int main(int argc, char **argv) 
{
   pthread_t thread1, thread2;

   char *message1 = "Thread 1";
   char *message2 = "Thread 2";
   int  iret1, iret2; 


   iret1 = pthread_create( &thread1, NULL, writer, (void*) message1);
   iret2 = pthread_create( &thread2, NULL, reader, (void*) message2);

   usleep(2000);

   /* Let the threads run for the sleep above, then terminate the
      whole process (and with it both worker threads). */
   exit(0);

}



void *writer( void *ptr )
{
  // make local copy of queue head
  register int *pos = queue; 

  //   struct thread_param *tp = arg;
  int counter = 0;

  while(1)
  {
    //Write to head of queue
    *pos = 5;

    pos++;

    print_message_function( ptr );
  }
}


void *reader( void *ptr )
{
  int counter = 0;

  // make local copy of queue head
  register int *pos = queue; 

  while(1)
  {

    // Read from tail of queue - loop when nothing
    if ( *pos == 5 ) 
    { 
      print_message_function( ptr ); 
      pos++; 
    }
  }
}



void *print_message_function( void *ptr )
{
      char *message;
      message = (char *) ptr;
      printf("%s \n", message);
      return NULL;
}

I do intend to cache-align the queue.

I do not believe memory reordering is a problem, since a copy of the queue head is made at start-up and there is a single reader and a single writer.

My reason for wanting this is that it should be faster than either mutex locks or CAS operations.

Is this intended to be portable C code or is it platform-specific? If so, what platform is it supposed to work on? (David Schwartz)

It's platform-independent Linux; I can code custom assembly if need be. (BAR)

That's kind of an oxymoron. Is it Linux-specific? Or is it platform-independent? (David Schwartz)

Sorry, it's platform-specific. (BAR)

Get your system working to a functional level with conventional blocking queues. If you then have a performance issue, you MAY find that you need a different queue class. You're only going to be queueing pointers, yes? You have to try quite hard to get contention when the queue is only locked for the push/pop of one pointer. Overall, your app's performance will probably be blown by the CPU spin loops, as others have warned. Likewise the CPU-affinity bodges. Would your time not be better spent on 'real' app code? (Martin James)

2 Answers

2 votes

With POSIX threads, you only get data coherence between threads if you use mutexes, locks, etc. And that coherence has no well-defined interface with your compiler (and volatile definitely isn't it). Don't do it like that; all sorts of things can happen, such as updates to variables being optimized out (here volatile could help) or partial reads and writes.
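
For illustration, here is a minimal sketch of the conventional approach, with the shared queue state guarded by a pthread mutex; the names (head, tail, qlock) are my own, not from the question's code:

#include <pthread.h>

static int queue[500000];
static int head, tail;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;

void enqueue(int value)
{
    pthread_mutex_lock(&qlock);
    queue[head++] = value;          /* bounds checking omitted for brevity */
    pthread_mutex_unlock(&qlock);
}

int dequeue(int *value)             /* returns 1 on success, 0 if empty */
{
    int ok = 0;
    pthread_mutex_lock(&qlock);
    if (tail < head) {
        *value = queue[tail++];
        ok = 1;
    }
    pthread_mutex_unlock(&qlock);
    return ok;
}

The lock is held only for the handful of instructions around the index update, so contention stays low in practice.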

C11, the new C standard, has a threading model that includes a data-coherence model, thread-creation functions, and atomic operations. It seems no compiler implements all of this completely yet, but gcc or clang on top of POSIX threads implement the features that you need. If you'd like to try this out and be future-proof, P99 implements wrappers for these platforms that let you use the new C11 interfaces.

C11's _Atomic types and operations would be the correct tools to implement lock-free queues that operate between threads.
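
As a rough illustration of the shape this takes, here is a minimal sketch of a single-producer/single-consumer ring buffer using <stdatomic.h>; the names, sizes, and release/acquire pairing are illustrative assumptions, not code from the question:

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define QSIZE 1024                      /* power of two keeps wrap-around cheap */

static int buf[QSIZE];
static atomic_size_t q_head;            /* next slot the writer fills  */
static atomic_size_t q_tail;            /* next slot the reader drains */

/* Called from the single writer thread only. */
bool spsc_push(int value)
{
    size_t h = atomic_load_explicit(&q_head, memory_order_relaxed);
    size_t t = atomic_load_explicit(&q_tail, memory_order_acquire);
    if (h - t == QSIZE)
        return false;                   /* queue full */
    buf[h % QSIZE] = value;
    /* release: the data store above becomes visible before the new head */
    atomic_store_explicit(&q_head, h + 1, memory_order_release);
    return true;
}

/* Called from the single reader thread only. */
bool spsc_pop(int *value)
{
    size_t t = atomic_load_explicit(&q_tail, memory_order_relaxed);
    size_t h = atomic_load_explicit(&q_head, memory_order_acquire);
    if (t == h)
        return false;                   /* queue empty */
    *value = buf[t % QSIZE];
    atomic_store_explicit(&q_tail, t + 1, memory_order_release);
    return true;
}

The release store on one side paired with the acquire load on the other is what guarantees the reader never sees an index advance before the data it covers.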

1 vote

In C, the volatile keyword has no defined semantics for a variable that is accessed concurrently from multiple threads (and pthreads doesn't add any). So the only way to know whether it's safe is to look at the effects volatile has on particular platforms and compilers, figure out every possible way it could go wrong on those specific hardware platforms, and rule them out.

It's a really bad idea to do this if you have a choice. Portable code tends to be much more reliable. The two big problems are:

  1. New platforms do come out, and fragile code can break when a new CPU, compiler, or library is released.

  2. It's very hard to think of every way this could go wrong because you don't really know what you're working with. Mutexes, atomic operations, and the like have precisely-defined semantics for multiple threads, so you know exactly what guarantees you have -- on any platform, with any compiler, with any hardware.

Your reader code is terrible, by the way. On hyper-threaded CPUs, for example, spinning that tightly will starve the other virtual core. Worse, you could wind up spinning at FSB speed, starving other physical cores. And when you exit the spin loop, at exactly the moment performance is most critical, you basically force a mispredicted branch! (The exact effects depend on the specifics of the CPU, which is another reason using this kind of code is bad. You need at least a rep nop.)
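
For instance, here is a minimal sketch of that wait loop with the x86 PAUSE hint in place (encoded as rep; nop), assuming GCC-style inline assembly; the function names are illustrative:

/* Spin until the slot pointed to by pos holds 5, relaxing the CPU
   between probes. */
static inline void cpu_relax(void)
{
    __asm__ __volatile__("rep; nop" ::: "memory");   /* x86 PAUSE */
}

static void spin_until_ready(int *pos)
{
    while (*pos != 5)       /* the "memory" clobber forces a reload each pass */
        cpu_relax();
}

PAUSE tells the core it is in a spin-wait, which frees execution resources for the sibling hyper-thread and avoids hammering the bus.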