12
votes

I wrote this small server application in pure C that listens to incoming connections in a given port, very simple stuff.

It goes with the usual socket initialization procedure, create the socket() then bind() to the port, says its a listen(), and ifinitely loops through a select() waiting for incoming connections to accept().

All goes just fine and works like a charm, except that if I leave the thing running for a couple months, the listening port closes while the application server keeps running unaware of it, since I wrote it to trust the listening socket will not close if not told to.

So the question is: Why the hell is the port being closed without my application's concern and what can I do to prevent it from happening?

Is that expected behaviour? Should I check for some kind of exceptions or make "health check" on the listening socket to reopen it if necessary?

Code: https://gist.github.com/Havenard/e930be035a3bee75c018 (yes I realize I'm using 0 as cue for errors and it's bad pratice and stuff, but it is not relevant to the question as I explained in the comments, when I set the socket file descriptor to 0 it is to stop the loop and shut down the application).

5
It's not normal. I would suspect some sort of resource leak in your application, such as running out of file descriptors because you don't close the new socket returned by accept().rhashimoto
Select() would result in an invalid filedescriptor being reported/error'd. Maybe you closed it by dup2()ing over it ?wildplasser
This is not normal. Show your code.Nikolai Fetissov
A few (stylistic) hints on the sourcecode: 1) the main() function is way too large to be readable 2) The signalhandler is defined inside main(). C does not have nested functions. 3) you test your file descriptors against zero ( 'if (skt_accept == 0)... ) try to use -1 as invalid filedescriptor (and set them to -1 when invalid) 4) in your gobackground() function you do a printf after closing filedesctiptors 0,1,2. 5) you really should ignore some of the -1 returns from select(), such AS EAGAIN. 6) The linked lists are messy and overly complicated, IMHO. 7) ' ' is more readable than 0x20wildplasser
@Havenard Right i am not taklink about send on listening socket but send on accepted socket. One of your 30 clients might disconnect abruptly from your irc/like server while you are still talking to it. when it happens if your ever try to send() something to client socket whole program will quit with SIGPIPE.philippe lhardy

5 Answers

6
votes

I would start by cleaning it up:

  • cut it up into smaller, readable, verifyable , testable functions
  • the linked lists usage look messy; it could be simplified a lot, maybe by introducing some generic functions.
  • replace all the silly '\x20' character constants by the more readable ' ' equivalents
  • avoid manifest magic constants like here if (n_case > 0) memcpy(nick, node->nick, (n_case > 32 ? 32 : n_case)); ; sizeof is your friend.
  • don't use zero as a sentinel value for an unused file descriptor; use -1 instead.
  • use unsigned types for sizes and indexes; negative indexes will corrupt memory, fold-over unsigned types will fail fast. (failfast is your friend)

That's only a few hours of editing.

My guess is that, after cleanup/refactoring your "bug" will come to the surface magically.

Footnote: No,I won't do your work for you. Not for 100 points, not for 1000. Please clean up your own mess.

2
votes

This answer is mostly a code review of places where you call close().

Line 330: You close the socket but don't continue right away like in other places in your code. This could lead to weird behavior.

Line 928: In most places, you set the client or server socket to 0 after a call to close(). You don't after this call.

Line 1193: Same comment as line 928.

Line 1195: Same comment as line 928.

Line 1218: Same comment as line 928.

Line 1234: Same comment as line 928.

Line 1236: Same comment as line 928.

When I compiled the code with full warnings, I saw a number of places where the compiler noted functions declared to return a value, but no value is being returned.

x.c:582: warning: no return statement in function returning non-void
x.c:591: warning: no return statement in function returning non-void
x.c:598: warning: no return statement in function returning non-void
x.c:609: warning: no return statement in function returning non-void
x.c:620: warning: no return statement in function returning non-void
x.c:728: warning: no return statement in function returning non-void
x.c:779: warning: no return statement in function returning non-void

There are many other problems, as noted in other posts.

As far as debugging this issue, If I suspected the binding socket was being closed early, I would intercept the close() call with my own version that asserts that the descriptor being closed should not match the binding socket.

However, as wildplasser noted, select() would return an error about the invalid descriptor if it was closed.

1
votes

The bug is that you are using 0 as invalid file descriptor. 0 is perfectly valid and is usually stdin. Then the listener is set to 0 in the signal handler. Then you use 0 as no fd, and at some point you do close(0) on some socket, there are branches that do close(fd) without checkoing it for 0 and that effectively closes the listener. The other possible option to stop the listener from working is to overflow the backlog.

And one more problem - using unsigned int for fds. system calls return -1 on error ... and that error would not be detected with if assigned to unsigned int struct identd_node -> unsigned int handle; struct thread_node -> unsigned int skt_clnt, skt_serv;

0
votes

It looks like your code needs to have 2 consecutive errors to cause a failure.

If you get an error from select, why not print out why straight away?

On line 281, printf errno/perror to find out what the problem is?

0
votes

Though the system should not behave as described, it sometimes does that. For server systems usually you need to perform a healthcheck call, either externally (from script), or from a special thread in your code.

So, if you detect that you can't connect to server in few consecutive attempts (few is needed due to possible overload condition), you can consider socket broken and recreate it or restart the server.