I am experiencing some unexplained behavior with an app that is writing UDP data with sendto() to multiple ports (all opened with socket(PF_INET, SOCK_DGRAM, 0)) for the benefit of a set of client reading processes. These sendto()s occasionally and unpredictably trigger ECONNREFUSED errors. This is happening on a macOS Sierra (10.12) system whose sendto(2) manual page does not even list ECONNREFUSED as a possible error. Interestingly, I have a CentOS7 system (where these errors never occur) whose sendto(2) manual page references additional sendto() errors documented on the udp(7) manual page, and that CentOS7 udp(7) page says:

ECONNREFUSED

 No receiver was associated with the destination address. This
 might be caused by a previous packet sent over the socket.

(ECONNREFUSED is not mentioned anywhere on the macOS Sierra udp(4) page.) I have no idea if the CentOS7 manual pages have any relevance to macOS, but assuming for a moment that they do, the above explanation of ECONNREFUSED with respect to sendto() is confusing on a couple of points:

First, everything I have ever heard about UDP emphasizes its connectionless nature. So, why would a sendto() fail because no receiver is connected (or 'associated', as the man page says, which I take to mean the same thing)? Isn't the whole point of UDP that if you are a talker, you just jabber away and don't care if anyone else is listening or not? These CentOS7 udp(7) comments do seem to apply to my Sierra system, however, because when I have client processes running which are bound to and reading from these ports I never have problems, but if I start the UDP writer before the readers are running I will often (but not always) see these errors.

Second, can anyone explain to me why, according to the CentOS7 udp(7) documentation, a previous packet sent over the socket could cause no receiver to be associated with the destination address? This makes no sense at all to me. Are some datagrams just so toxic that they kill whoever reads them?

I should also note that, besides never seeing this problem on CentOS7 where it's actually (if vaguely) documented, I have also never experienced it on any macOS release prior to Sierra, and this code has been running well for me there for many years. I still have one El Capitan system and cannot duplicate the errors there.

Following is more information about my app -- please feel free to comment either on the above general questions about PF_INET UDP, sendto() and ECONNREFUSED or on the more specific details of my app as noted below. I have a usable workaround already (see below) but would like to better understand what is going on.

My app is reading data from various sources (serial lines and/or UDP ports), massaging it into reformatted output messages of various types and then writing those messages to multiple pre-defined consecutively numbered (e.g., 3000 through 3004) UDP ports at the same IP address to be read by a small and variable number of clients (limited to 5 but usually no more than 3 or 4). Each client scans the pre-defined list of my app's UDP output ports, binds to the first available port, and then does all its reading from that port. There is no guarantee in advance of the order in which my writer app and the multiple reader processes will be started (a central part of my problem here). My app is writing messages about once per second to each output port which are usually no more than about 80 bytes each (all ASCII text).
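For readers following along, the client-side port scan described above could look roughly like the sketch below. This is not the actual client code: the function name `bind_first_free` and its signature are made up for illustration, and it assumes a client binds to INADDR_ANY without SO_REUSEADDR or SO_REUSEPORT, so that a port already taken by another reader makes bind() fail.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try ports base..base+count-1 and bind a UDP socket
 * to the first one that is free. Returns the socket fd (storing the chosen
 * port in *port_out), or -1 if no port in the range could be bound. */
int bind_first_free(unsigned short base, int count, unsigned short *port_out)
{
    int fd = socket(PF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    for (int i = 0; i < count; i++) {
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons((unsigned short)(base + i));

        if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0) {
            *port_out = (unsigned short)(base + i);
            return fd;
        }
        /* bind() failed (typically EADDRINUSE): try the next port */
    }

    close(fd);
    return -1;
}
```

A reader would call something like `bind_first_free(3000, 5, &port)` and then recvfrom() on the returned descriptor. Until some reader has done this for a given port, nothing is bound there, which is the window in which the writer sees the errors.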

These reader clients might be running on (i) the same local host as my app, (ii) a single remote host, or (iii) different remote hosts on the local network, so my writer app accepts an arbitrary IPv4 destination address as a command argument. Assuming my writer is running on host 192.168.1.LLL (the Local host), the most commonly used destination addresses will be:

  • 127.0.0.1
  • 192.168.1.LLL (the actual external address of localhost)
  • 192.168.1.RRR (some Remote host on the same LAN)
  • 192.168.1.255 (local broadcast for readers on multiple remote hosts)

Note that I see these errors ONLY when sending output to either 127.0.0.1 or to 192.168.1.LLL, the actual external address of the localhost. The errors never occur when I write to either a specific remote host 192.168.1.RRR or to the LAN's broadcast address 192.168.1.255. Is there supposed to be a difference between what happens with local PF_INET vs. remote PF_INET UDP writes? Maybe local writes have to be handled in a specific way within some local buffer which is subject to various constraints, whereas packets sent off-host are just scattered to the winds and whatever happens is considered beyond the reporting capacity of the local sendto()? Although I never see these errors when using the broadcast address 192.168.1.255 I prefer not to use that out of network politeness unless I know that my clients really are running on multiple remote hosts -- if everything is on one system I'd rather keep things private by using either of the strictly local addresses 127.0.0.1 or 192.168.1.LLL (which are the addresses that can lead to errors).

For now I am working around this problem by just ignoring all ECONNREFUSED sendto() errors. It seems that I tend to get them within a few seconds of starting my app, although never on the first sendto() on each port and usually on only one of my 5 output ports (although the port generating the error is not always the same). And, after the initial errors, the next few minutes' worth of output (the longest I've ever watched) is error-free even though there are still no readers running. These errors are mystifying, however, and I would like to have a better understanding of them to make my code as robust as possible. I am not including my actual code in this post as the latter is already overly long and there's nothing unusual about the code as far as I can tell, but I can post it separately if that would be useful.
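A minimal sketch of that workaround, for concreteness. The wrapper name `sendto_tolerant` is hypothetical and this is not the app's actual code; it simply treats ECONNREFUSED as "nobody is listening yet" and lets every other error through.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical wrapper around sendto(). Returns the number of bytes sent,
 * 0 if the call failed with ECONNREFUSED (treated as "no reader yet"),
 * or -1 with errno set for any other error. */
ssize_t sendto_tolerant(int fd, const void *buf, size_t len,
                        const struct sockaddr_in *dest)
{
    for (;;) {
        ssize_t n = sendto(fd, buf, len, 0,
                           (const struct sockaddr *)dest, sizeof *dest);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        if (errno == ECONNREFUSED)
            return 0;           /* no receiver bound right now: not fatal */
        return -1;              /* anything else is a real error */
    }
}
```

Note that a return of 0 is ambiguous if the app ever sends zero-length datagrams; with messages of about 80 bytes that should not matter here.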

Thanks!

Roger Davis, Univ. of Hawaii

1 Answer

At the UDP layer you can indeed jabber away to any IP. However, RFC 1122, section 4.1.3.3, requires that any errors at the IP layer (those that cause ICMP errors to occur) be propagated back up to the application layer. As you can see in RFC 792 page 3, a Destination Unreachable message with code 3 means Port Unreachable.

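Hence, when an IP packet cannot be delivered to a port on 127.0.0.1, the resulting ICMP error is manifested as ECONNREFUSED at the application layer. It is reported asynchronously: the ICMP reply can only arrive after the sendto() that triggered it has returned, and you may have sent another UDP packet by then. That is why the udp(7) page says the error might be caused by a previous packet sent over the socket.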
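If you want to see the deferred error for yourself, here is a rough sketch. The function name `second_send_errno` is made up, and it deliberately uses a connect()ed UDP socket rather than plain sendto(), because as far as I know Linux only reports these ICMP errors on connected sockets (or with IP_RECVERR set), which may be why you never see them on CentOS. Whether the second send actually fails depends on the OS and on timing.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical probe: send two datagrams to 127.0.0.1:port on a connected
 * UDP socket. Returns the errno of the second send(), 0 if it succeeded,
 * or -1 if the socket could not be set up. */
int second_send_errno(unsigned short port)
{
    int fd = socket(PF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in dest;
    memset(&dest, 0, sizeof dest);
    dest.sin_family = AF_INET;
    dest.sin_port = htons(port);
    dest.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (connect(fd, (struct sockaddr *)&dest, sizeof dest) < 0) {
        close(fd);
        return -1;
    }

    /* First datagram: normally accepted, since no error is pending yet. */
    (void)send(fd, "one", 3, 0);

    /* Wait 100 ms so any ICMP Port Unreachable reply can come back. */
    poll(NULL, 0, 100);

    /* Second datagram: this call may report the error caused by the first. */
    int err = 0;
    if (send(fd, "two", 3, 0) < 0)
        err = errno;

    close(fd);
    return err;
}
```

Called with a port that nothing is bound to, this would typically return ECONNREFUSED; with a reader bound to the port it should return 0.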

Why does it happen more on local connections? The packet never actually leaves the kernel, so the ICMP error can come back before the next UDP packet is sent. For the other addresses, the packet actually has to be put on the wire. So you can still get the errors, but they will be less frequent, depending on your UDP send rate. Also, if you're sending through a gateway, the gateway might just drop the UDP packet. If there's a firewall between your host and the remote host, it might also drop the ICMP reply or limit the rate at which replies are returned.

As for handling the error: if you do get ECONNREFUSED, you know there's either no host with that IP or nothing listening on that port. Either way, it's pointless to keep sending.