5
votes

$ uname -a

Linux crowsnest 2.6.32-28-generic #55-Ubuntu SMP Mon Jan 10 23:42:43 UTC 2011 x86_64 GNU/Linux

$ man readdir:

DESCRIPTION

The readdir() function returns a pointer to a dirent structure representing the next directory entry in the directory stream pointed to by dirp...

...[snip]...

The readdir_r() function is a reentrant version of readdir()...

...[snip]...

RETURN VALUE

On success, readdir() returns a pointer to a dirent structure. (This structure may be statically allocated; do not attempt to free(3) it.) If the end of the directory stream is reached, NULL is returned and errno is not changed. If an error occurs, NULL is returned and errno is set appropriately.

The readdir_r() function returns 0 on success. On error, it returns a positive error number. If the end of the directory stream is reached, readdir_r() returns 0, and returns NULL in *result.

I'm confused about what this means. My application of this function is to collect a dynamically allocated array of pointers to structs with data about the directory entries, and I'm wondering if I can dynamically allocate dirent structs and set the pointers to them. But this line seems to say that the result should never be passed to free, so I'm wondering if I should allocate a separate dirent struct, which will be part of the list, and memcpy the returned result into it.

I'm also confused by the word "may" in the above man page. Does this mean that sometimes it's statically allocated and sometimes it's not?

I'm (vaguely) familiar with what static variables mean in C, but not sure about all the rules and possible gotchas around them. Because I want to pass the dirent structs for a directory around, I would rather they be dynamically allocated. Is this what readdir_r is for? Or will the double pointer be set to point to another statically allocated dirent struct?

And I'm not entirely sure what reentrant means in this context for readdir_r. My understanding of reentrancy comes only from Scheme coroutines, and I'm not sure how that would apply to reading Unix directories.

3
It should be noted that the documentation claiming readdir_r is reentrant is simply wrong. The author of the man page does not understand the difference between reentrant and thread-safe. readdir_r certainly uses buffered reading through the DIR struct (requesting one entry at a time from the kernel would be very slow) and thus has to hold a lock on this structure, which makes it non-reentrant. - R.. GitHub STOP HELPING ICE
@R..: but as Posix says over and over, "a function that is not required to be reentrant is not required to be thread-safe". Therefore, a function that is required to be thread-safe is required to be reentrant, by whatever definition Posix uses for reentrant. Other people may use other definitions of reentrant: I guess that since Posix forbids any code that might re-enter readdir_r in a single thread, and it does require thread-safety, it considers it trivially reentrant in some sense. - Steve Jessop
@Steve: This bug was fixed in POSIX 2008 which no longer has the nonsensical wording. Instead, it now reads simply: "The readdir() function need not be thread-safe." - R.. GitHub STOP HELPING ICE
@R.. ah, well, if Posix changed the definition of reentrant in 2008, it's not necessarily surprising if a man page hasn't caught up. - Steve Jessop
POSIX 2008 didn't really change the definition; it simply removed the erroneous use of the word "reentrant". - R.. GitHub STOP HELPING ICE

3 Answers

7
votes

The structure might be statically-allocated, it might be thread-local, it might be dynamically allocated. That's up to the implementation. But no matter what, it's not yours to free, which is why you must not free it.

readdir_r doesn't allocate anything for you: you give it a dirent, allocated however you like, and it fills it in. It therefore saves you a little effort compared with calling readdir and copying the data out. That's not the main purpose of readdir_r, though. What it's actually for is the ability to make calls from different threads at the same time, which you can't do with readdir.
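As a rough sketch of the "you give it a dirent and it fills it in" calling pattern, the helper below counts the entries in a directory with readdir_r. The name count_entries_r is made up for illustration, and the union is one common way to make sure the caller's buffer has room for d_name up to NAME_MAX bytes; it assumes NAME_MAX is defined on your platform. Newer glibc versions mark readdir_r as deprecated, so the pragma only silences that warning for this demo.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Newer glibc deprecates readdir_r(); silence the warning for this demo. */
#if defined(__GNUC__)
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
#endif

/* Hypothetical helper: counts the entries in `path` using readdir_r().
 * Returns 0 on success, or a positive error number on failure. */
int count_entries_r(const char *path, size_t *count)
{
    /* Caller-owned storage. The union pads the struct so that d_name
     * has room for NAME_MAX characters plus the terminating NUL. */
    union {
        struct dirent d;
        char pad[offsetof(struct dirent, d_name) + NAME_MAX + 1];
    } entry;
    struct dirent *result = NULL;
    int err;

    assert(path != NULL && count != NULL);
    *count = 0;

    DIR *dirp = opendir(path);
    if (dirp == NULL)
        return errno;

    /* readdir_r() returns 0 on success; at end of stream it returns 0
     * and stores NULL in result. */
    while ((err = readdir_r(dirp, &entry.d, &result)) == 0 && result != NULL)
        (*count)++;

    closedir(dirp);
    return err;
}
```

Calling count_entries_r(".", &n) leaves the number of entries in n. Nothing here is ever passed to free, because the dirent lives in the caller's own storage.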

What "reentrant" actually means is that the function can be called again before a previous call to it has returned. In general, this might mean from a different thread (which is what most people mean by "thread-safe"), from a handler for a signal that occurred during the first call, or due to recursion. But the C standard has no concept of threads, so it mentions "reentrant" meaning only the latter two. Posix defines "thread-safe" to require this form of reentrancy and, in addition, the thing that most people mean by thread-safe.

In Posix, every function required to be thread-safe is required to be reentrant, and readdir_r is required to be thread-safe. I think reentrancy in the weaker sense is irrelevant to readdir_r, since it doesn't call any user code that could result in recursion, and it's not async-signal-safe so it must not be called from a signal handler either.

Beware, because when some people (Java programmers) say "thread-safe", they mean that the function can be called by different threads on the same arguments at the same time, and will use locks to work correctly. Posix APIs do not mean this by thread-safe; they only mean that the function can be called on different data at the same time. Any global data that the function uses is protected by locks or otherwise, but the arguments need not be.

6
votes

The rule here is really simple -- you're free to make a copy of the data readdir() returns; however, you don't own the buffer it puts that data in, so you cannot take actions that suggest you do. (I.e., copy the data out to your own buffer; don't store a pointer into the readdir-owned buffer.)

"so I'm wondering if I should allocate a separate dirent struct which will be part of the list and memcpy it over the returned result" - that's exactly what you should do.
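A minimal sketch of that copy-out approach follows. The struct entry_info type and the collect_entries and free_entries functions are names invented for this example. It copies the fields it needs (d_ino and d_name) into its own heap records instead of memcpy-ing the whole struct, on the assumption that sizeof(struct dirent) may not match the real size of an entry on every system. Setting errno to 0 before each readdir call is how the loop tells end-of-directory from an error, as the man page's RETURN VALUE section describes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical record type: the fields we choose to keep per entry. */
struct entry_info {
    ino_t ino;
    char *name;   /* heap copy of d_name, owned by the list */
};

/* Frees a list built by collect_entries(). */
void free_entries(struct entry_info **list, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        free(list[i]->name);
        free(list[i]);
    }
    free(list);
}

/* Copies every entry of `path` into heap records we own.
 * Returns 0 on success, -1 on failure (errno says why). */
int collect_entries(const char *path, struct entry_info ***out, size_t *out_n)
{
    assert(path != NULL && out != NULL && out_n != NULL);
    struct entry_info **list = NULL;
    size_t n = 0, cap = 0;
    *out = NULL;
    *out_n = 0;

    DIR *dirp = opendir(path);
    if (dirp == NULL)
        return -1;

    for (;;) {
        errno = 0;  /* so NULL + errno == 0 means end of stream */
        struct dirent *de = readdir(dirp);
        if (de == NULL)
            break;
        if (n == cap) {  /* grow the pointer array */
            size_t ncap = cap ? cap * 2 : 16;
            struct entry_info **tmp = realloc(list, ncap * sizeof *tmp);
            if (tmp == NULL) { errno = ENOMEM; break; }
            list = tmp;
            cap = ncap;
        }
        struct entry_info *e = malloc(sizeof *e);
        if (e == NULL) { errno = ENOMEM; break; }
        e->ino = de->d_ino;
        e->name = strdup(de->d_name);  /* our copy; de stays readdir's */
        if (e->name == NULL) { free(e); errno = ENOMEM; break; }
        list[n++] = e;
    }

    int saved = errno;  /* closedir() may overwrite errno */
    closedir(dirp);
    if (saved != 0) {
        free_entries(list, n);
        errno = saved;
        return -1;
    }
    *out = list;
    *out_n = n;
    return 0;
}
```

After a successful call, everything reachable from the list was allocated by this code, so free_entries can release it. The pointer returned by readdir is only read from and never freed.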

"I'm also confused by the terminology of 'may' in the above man page. Does this mean that sometimes it's statically allocated, and sometimes it's not?" - It means you cannot count on how it will be managed, but it will be managed for you. The details could vary from one system to the next.

As the man page uses the word, reentrant means thread-safe. readdir() may use a static entry, so it is not safe for multiple threads to call it as if each one controlled the sequence of calls. readdir_r() uses space provided by the caller, letting multiple threads act independently.

6
votes

First question

It means readdir could have something like this:

struct dirent *
readdir(DIR *dirp)
{
    /* One shared instance, reused by every call. */
    static struct dirent dirent;
    /* Do stuff. */

    return &dirent;
}

Clearly it would be illegal to free it (since you didn't obtain it via malloc).

The standard doesn't force anyone to do it like this. An implementation could use its own mechanism (perhaps malloc and free later on its own).

Second question

"Reentrant" means that while we are inside readdir_r, the function can be safely called again (for example from a signal handler). For instance, readdir isn't reentrant. Suppose this happens:

  • You call readdir(dir); and it starts modifying dirent
  • BEFORE it is done, it is interrupted and someone else calls it (from an async context)
  • Its version modifies dirent, returns and the async context goes on its way
  • Your version returns. What does dirent contain?

Reentrant functions are a godsend: they are always safe to call.
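The interrupted-call scenario above is hard to reproduce on demand, but the underlying hazard, one static buffer shared by every call, shows up even with two ordinary calls in a row. The sketch below uses two made-up functions: label_static returns a pointer to a static buffer, in the style of the readdir sketch earlier, and label_r writes into a buffer the caller supplies, in the style of readdir_r.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical function that returns a pointer to one shared
 * static buffer. Every call overwrites the same storage. */
const char *label_static(int n)
{
    static char buf[32];
    snprintf(buf, sizeof buf, "entry-%d", n);
    return buf;
}

/* Hypothetical counterpart that writes into caller-supplied
 * storage, so separate calls do not share any state. */
const char *label_r(int n, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    snprintf(buf, size, "entry-%d", n);
    return buf;
}
```

If you keep the pointer from label_static(1) and then call label_static(2), both pointers refer to the same buffer, which now holds "entry-2". With label_r, each caller passes its own array, so the first result is still intact after the second call.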