I've just encountered a surprising buffer overflow, while trying to use the flag MSG_TRUNC in recv on a TCP socket.

And it seems to only happen with gcc (not clang) and only when compiling with optimization.

According to this link: http://man7.org/linux/man-pages/man7/tcp.7.html

Since version 2.4, Linux supports the use of MSG_TRUNC in the flags argument of recv(2) (and recvmsg(2)). This flag causes the received bytes of data to be discarded, rather than passed back in a caller-supplied buffer. Since Linux 2.4.4, MSG_PEEK also has this effect when used in conjunction with MSG_OOB to receive out-of-band data.

Does this mean that a supplied buffer will not be written to? I expected so, but was surprised. If you pass a buffer (non-null pointer) and a size bigger than the buffer size, it results in a buffer overflow when the client sends something bigger than the buffer. It doesn't actually seem to write the message to the buffer if the message is small and fits in the buffer (no overflow). Apparently if you pass a null pointer the problem goes away.

Client is a simple netcat sending a message bigger than 4 characters.

Server code is based on: http://www.linuxhowtos.org/data/6/server.c

Changed read to recv with MSG_TRUNC, and buffer size to 4 (bzero to 4 as well).

Compiled on Ubuntu 14.04. These compilations work fine (no warnings):

gcc -o server.x server.c

clang -o server.x server.c

clang -O2 -o server.x server.c

This is the buggy (?) compilation, it also gives a warning hinting about the problem:

gcc -O2 -o server.x server.c

Anyway like I mentioned changing the pointer to null fixes the problem, but is this a known issue? Or did I miss something in the man page?

UPDATE:

The buffer overflow happens also with gcc -O1. Here is the compilation warning:

In function ‘recv’,
    inlined from ‘main’ at server.c:47:14:
/usr/include/x86_64-linux-gnu/bits/socket2.h:42:2: warning: call to ‘__recv_chk_warn’ declared with attribute warning: recv called with bigger length than size of destination buffer [enabled by default]
  return __recv_chk_warn (__fd, __buf, __n, __bos0 (__buf), __flags);

Here is the buffer overflow:

./server.x 10003
*** buffer overflow detected ***: ./server.x terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7338f)[0x7fcbdc44b38f]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7fcbdc4e2c9c]
/lib/x86_64-linux-gnu/libc.so.6(+0x109b60)[0x7fcbdc4e1b60]
/lib/x86_64-linux-gnu/libc.so.6(+0x10a023)[0x7fcbdc4e2023]
./server.x[0x400a6c]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fcbdc3f9ec5]
./server.x[0x400879]
======= Memory map: ========
00400000-00401000 r-xp 00000000 08:01 17732 > /tmp/server.x
... more messages here
Aborted (core dumped)

And gcc version:

gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4

The buffer and recv call:

char buffer[4];

n = recv(newsockfd,buffer,255,MSG_TRUNC);

And this seems to fix it:

n = recv(newsockfd,NULL,255,MSG_TRUNC);

This will not generate any warnings or errors:

gcc -Wall -Wextra -pedantic -o server.x server.c

And here is the complete code:

/* A simple server in the internet domain using TCP
   The port number is passed as an argument */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h> 
#include <sys/socket.h>
#include <netinet/in.h>

void error(const char *msg)
{
    perror(msg);
    exit(1);
}

int main(int argc, char *argv[])
{
     int sockfd, newsockfd, portno;
     socklen_t clilen;
     char buffer[4];
     struct sockaddr_in serv_addr, cli_addr;
     int n;
     if (argc < 2) {
         fprintf(stderr,"ERROR, no port provided\n");
         exit(1);
     }
     sockfd = socket(AF_INET, SOCK_STREAM, 0);
     if (sockfd < 0) 
        error("ERROR opening socket");
     bzero((char *) &serv_addr, sizeof(serv_addr));
     portno = atoi(argv[1]);
     serv_addr.sin_family = AF_INET;
     serv_addr.sin_addr.s_addr = INADDR_ANY;
     serv_addr.sin_port = htons(portno);
     if (bind(sockfd, (struct sockaddr *) &serv_addr,
              sizeof(serv_addr)) < 0) 
              error("ERROR on binding");
     listen(sockfd,5);
     clilen = sizeof(cli_addr);
     newsockfd = accept(sockfd, 
                 (struct sockaddr *) &cli_addr, 
                 &clilen);
     if (newsockfd < 0) 
          error("ERROR on accept");
     bzero(buffer,4);
     n = recv(newsockfd,buffer,255,MSG_TRUNC);
     if (n < 0) error("ERROR reading from socket");
     printf("Here is the message: %s\n",buffer);
     n = write(newsockfd,"I got your message",18);
     if (n < 0) error("ERROR writing to socket");
     close(newsockfd);
     close(sockfd);
     return 0; 
}

UPDATE: Happens also on Ubuntu 16.04, with gcc version:

gcc (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609

What warnings do you get? Have you tried enabling even more warnings (with e.g. -Wall -Wextra -pedantic or similar)? And please show your actual recv call together with the definition of the buffer. - Some programmer dude

Also, since it obviously works, it's probably not a problem with the kernel, but with the compiler, so can you please tell us which version of GCC you are using? And have you tried with later versions of GCC? Earlier? Tried disabling some specific optimizations? Tested with -O1? Tested with -O1 and then enabling one specific optimization option after another until you get the problem (so you know which one is the cause)? - Some programmer dude

The problem is unlikely to be with the compiler. It may be with the C library (which is perhaps too fine a distinction), but it is more likely with the program. To meaningfully address the question, we need code with which we can reproduce the problem. If you want us to actually look at such code, then present it in the form of a minimal reproducible example. - John Bollinger

And also tell us how you determined that there was a buffer overflow. - kaylum

1 Answers

I think you have misunderstood.

With datagram sockets, the MSG_TRUNC flag behaves as described in the man 2 recv man page (see the Linux man pages online for the most accurate and up-to-date information).

With TCP sockets, the explanation in the man 7 tcp man page is poorly worded. I originally believed it was not a discard flag, but a truncate (or "throw away the rest") operation. However, the implementation indicates otherwise; in particular, the net/ipv4/tcp.c:tcp_recvmsg() function in the Linux kernel handles the details for both TCP/IPv4 and TCP/IPv6 sockets.

There is also a separate MSG_TRUNC output flag. recvmsg(socketfd, &msg, flags) sets it in the msg_flags field of msg on return. It indicates that the datagram that was read was longer than the buffer, so some of it was lost (truncated). This is rarely used, because it is really only relevant to datagram sockets, and there are much easier ways to detect overlength datagrams.


Datagram sockets:

With datagram sockets, the messages are separate, and not merged. When read, the unread part of each received datagram is discarded.

If you use

    nbytes = recv(socketfd, buffer, buffersize, MSG_TRUNC);

it means that the kernel will copy up to the first buffersize bytes of the next datagram, and discard the rest of the datagram if it is longer (as usual), but nbytes will reflect the true length of the datagram.

In other words, with MSG_TRUNC, nbytes may exceed buffersize, even though only up to buffersize bytes are copied to buffer.
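The datagram behavior can be checked with a small sketch. Note the substitution: to stay self-contained it uses an AF_UNIX datagram socketpair rather than UDP, and it assumes a kernel where UNIX datagram sockets also report the real length with MSG_TRUNC (Linux 3.4 or later, per the recv man page). The helper name dgram_trunc_demo() is made up for this illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends one 10-byte datagram over an AF_UNIX
 * socketpair and receives it into buf with MSG_TRUNC.
 * Returns whatever recv() returned, or -1 if the setup failed. */
ssize_t dgram_trunc_demo(char *buf, size_t bufsize)
{
    int sv[2];
    ssize_t nbytes = -1;

    assert(bufsize > 0);
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    if (send(sv[0], "0123456789", 10, 0) == 10) {
        /* The length passed is the real buffer size; only that many
         * bytes are copied, but the return value is the datagram length. */
        nbytes = recv(sv[1], buf, bufsize, MSG_TRUNC);
    }

    close(sv[0]);
    close(sv[1]);
    return nbytes;
}
```

Called with a 4-byte buffer, this should return 10 while only "0123" is copied into the buffer, under the assumptions stated above.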


TCP sockets in Linux, kernels 2.4 and later, edited:

A TCP connection is stream-like; there are no "messages" or "message boundaries", just a sequence of bytes flowing. (Although, there can be out-of-band data, but that is not pertinent here).

If you use

    nbytes = recv(socketfd, buffer, buffersize, MSG_TRUNC);

the kernel will discard up to the next buffersize bytes of whatever is already buffered (but will block until at least one byte is buffered, unless the socket is in non-blocking mode or MSG_TRUNC | MSG_DONTWAIT is used instead). The number of bytes discarded is returned in nbytes.

However, both buffer and buffersize should be valid, because a recv() or recvfrom() call goes through the kernel net/socket.c:sys_recvfrom() function, which verifies buffer and buffersize are valid, and if so, populates the internal iterator structure to match, before calling the aforementioned net/ipv4/tcp.c:tcp_recvmsg().

In other words, the recv() with a MSG_TRUNC flag does not actually try to modify buffer. However, the kernel does check if buffer and buffersize are valid, and if not, will cause the recv() syscall to fail with -EFAULT.
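A minimal sketch of the TCP case, assuming a Linux kernel that behaves as described above and a working loopback interface. The helper name tcp_trunc_demo() and its parameters are made up for this illustration; it is not part of the question's server.c.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo helper. Sends "0123456789" over a loopback TCP
 * connection, skips 4 bytes with MSG_TRUNC, then reads what is left into
 * rest (NUL-terminated). Returns the result of the MSG_TRUNC recv(), or
 * -1 if a setup step fails. *scratch_touched is set to 1 if the buffer
 * passed to the MSG_TRUNC call was modified, 0 otherwise. */
ssize_t tcp_trunc_demo(char *rest, size_t restsize, int *scratch_touched)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    char scratch[4] = { 'x', 'x', 'x', 'x' };
    ssize_t skipped = -1, got;
    int lfd, cfd = -1, sfd = -1;

    assert(restsize > 0);
    rest[0] = '\0';
    *scratch_touched = 0;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
        goto out;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto out;
    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0 || send(cfd, "0123456789", 10, 0) != 10)
        goto out;

    /* Discard up to 4 bytes; the length matches the real size of scratch. */
    skipped = recv(sfd, scratch, sizeof scratch, MSG_TRUNC);
    *scratch_touched = memcmp(scratch, "xxxx", sizeof scratch) != 0;

    /* An ordinary recv() continues after the discarded bytes. */
    got = recv(sfd, rest, restsize - 1, 0);
    rest[got > 0 ? got : 0] = '\0';

out:
    if (sfd >= 0)
        close(sfd);
    if (cfd >= 0)
        close(cfd);
    close(lfd);
    return skipped;
}
```

With a 16-byte rest buffer, the call should return 4, leave "456789" in rest, and report the scratch buffer as untouched. Because the length passed to recv() never exceeds the real buffer size, the compile-time length check from the question has nothing to complain about.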

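When buffer overflow checks are enabled, the GCC/glibc recv() wrapper does not just return -1 with errno == EFAULT; it instead halts the program, producing the backtrace shown in the question. The compiler warning quoted there (the call to __recv_chk_warn with __bos0 (__buf)) indicates that the wrapper compares the requested length with the size of the destination buffer as known to the compiler, so a length of 255 for a 4-byte array is rejected regardless of what the kernel would do with it. Some of these checks may also involve mapping the zero page (where the target of a NULL pointer resides in Linux on x86 and x86-64), in which case the access check done by the kernel (before actually trying to read or write to it) succeeds.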
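The length comparison hinted at by the warning in the question can be sketched as follows. This is an illustration only, not glibc's actual code: length_exceeds_object(), RECV_LEN_TOO_BIG() and demo_checks() are hypothetical names, and the sketch assumes __bos0 corresponds to the GCC/clang builtin __builtin_object_size(ptr, 0), which yields (size_t)-1 when the compiler cannot determine the object size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if a request for len bytes is larger than
 * an object whose size the compiler knows. */
int length_exceeds_object(size_t objsize, size_t len)
{
    /* (size_t)-1 means "size unknown", so nothing can be checked. */
    return objsize != (size_t)-1 && len > objsize;
}

/* Must be expanded where the buffer declaration is visible, so the
 * compiler can work out the size of the object. */
#define RECV_LEN_TOO_BIG(buf, len) \
    length_exceeds_object(__builtin_object_size((buf), 0), (len))

/* Mirrors the question: a 4-byte array and a requested length of 255. */
void demo_checks(void)
{
    char buffer[4];

    assert(RECV_LEN_TOO_BIG(buffer, 255));             /* the question's call */
    assert(!RECV_LEN_TOO_BIG(buffer, sizeof buffer));  /* length fits */
    assert(!length_exceeds_object((size_t)-1, 255));   /* unknown size */
}
```

Under that assumption, a pointer whose target size is unknown to the compiler cannot trip this kind of check, which would be consistent with the NULL variant in the question compiling and running without the abort.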

To avoid the GCC/glibc wrappers (so that code compiled with e.g. gcc and clang should behave the same), one can use real_recv() instead,

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>

/* Calls the recvfrom syscall directly, bypassing the glibc recv() wrapper. */
ssize_t real_recv(int fd, void *buf, size_t n, int flags)
{
    /* syscall() already returns -1 and sets errno on failure,
       so the result can be passed through unchanged. */
    return (ssize_t)syscall(SYS_recvfrom, fd, buf, n, flags, NULL, NULL);
}

which calls the syscall directly. Note that this does not include the pthreads cancellation logic; use this only in single-threaded test programs.


In summary, with the stated problem regarding MSG_TRUNC flag for recv() when using TCP sockets, there are several factors complicating the full picture:

  • recv(sockfd, data, size, flags) actually calls the recvfrom(sockfd, data, size, flags, NULL, NULL) syscall (there is no recv syscall in Linux)

  • With a TCP socket, recv(sockfd, data, size, MSG_TRUNC) acts as if it were to read up to size bytes into data, if (char *)data+0 to (char *)data+size-1 are valid; it just does not copy them into data. The number of bytes thus skipped is returned.

  • The kernel first verifies that data (from (char *)data+0 to (char *)data+size-1, inclusive) is readable. (I suspect this check is erroneous, and might be turned into a writability check sometime in the future, so do not rely on this being a readability test.)

  • Buffer overflow checks can halt the program with some kind of "buffer overflow detected" or "out of bounds" error message (with a stack trace), instead of letting recv() return -1 with errno == EFAULT

  • Buffer overflow checks may make a NULL pointer seem valid from the kernel's point of view (because the kernel test is currently for reading), in which case the kernel verification accepts the NULL pointer. (One can verify whether this is the case by recompiling without buffer overflow checks, using e.g. the above real_recv(), and seeing if a NULL pointer then causes an -EFAULT result.)

    The reason for such a mapping (one that, if allowed by the hardware and the kernel structures, merely exists and is neither readable nor writable) is that any access to it generates a SIGBUS signal, which a library- or compiler-provided signal handler can catch, and dump not just a stack trace but more details about the exact access (address, code that attempted the access, and so on).

    I do believe the kernel access check treats such mappings as readable and writable, because there needs to be a read or write attempt for the signal to be generated.

  • Buffer overflow checks are done by both the compiler and the C library, so different compilers may implement the checks, and the NULL pointer case, differently.