RFC-recommended representation of IPv6 addresses and inet_ntop in C

Question

Following the recommendations in RFC 5952 on how best to represent IPv6 addresses as text, I am trying to implement a C++ function that converts an array of bytes into the appropriate textual representation, i.e. a std::string.

To check my code for correctness, I compare my results with what inet_ntop (#include <arpa/inet.h>) returns. Note that I am actually using the Windows-equivalent #include <ws2tcpip.h>.
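For reference, the comparison can be done with a small round-trip helper like the one below. This is only a sketch for POSIX systems; the function name system_ipv6_text is made up for this example, and on Windows the include would be <ws2tcpip.h> instead.

```cpp
#include <arpa/inet.h>   // POSIX; on Windows this would be <ws2tcpip.h>
#include <netinet/in.h>
#include <string>

// Hypothetical helper: parse the text with inet_pton, then format it
// back with inet_ntop. Returns an empty string if either call fails.
std::string system_ipv6_text(const std::string& text) {
    in6_addr addr{};
    if (inet_pton(AF_INET6, text.c_str(), &addr) != 1) {
        return {};
    }
    char buf[INET6_ADDRSTRLEN];
    if (inet_ntop(AF_INET6, &addr, buf, sizeof buf) == nullptr) {
        return {};
    }
    return buf;
}
```

Feeding it a fully written-out address such as 0:0:0:0:0:0:0:1 shows what the local library considers the canonical form, which can then be compared with the output of my own function.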

In most cases my function behaves the same way, and I fully understand the underlying rules (leave out leading zeros, compress the longest run of zero groups by replacing it with "::", and so on).
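To make those rules concrete, here is a minimal sketch of the hexadecimal-only part (it deliberately ignores the dotted decimal special cases discussed next). The name format_ipv6_hex is made up for this example, and the tie-breaking (first longest run wins, a single zero group is not compressed) follows my reading of RFC 5952.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: hex-only formatting of 16 address bytes.
std::string format_ipv6_hex(const std::array<std::uint8_t, 16>& bytes) {
    // Combine byte pairs into eight 16-bit groups (network byte order).
    std::array<unsigned, 8> groups{};
    for (int i = 0; i < 8; ++i) {
        groups[i] = (bytes[2 * i] << 8) | bytes[2 * i + 1];
    }

    // Find the first longest run of zero groups.
    int best_start = -1;
    int best_len = 0;
    for (int i = 0; i < 8;) {
        if (groups[i] != 0) { ++i; continue; }
        int j = i;
        while (j < 8 && groups[j] == 0) ++j;
        if (j - i > best_len) { best_start = i; best_len = j - i; }
        i = j;
    }
    // A single zero group is written as "0", not compressed.
    if (best_len < 2) best_start = -1;

    std::string out;
    for (int i = 0; i < 8;) {
        if (i == best_start) {
            out += "::";
            i += best_len;
            continue;
        }
        // Add a separator unless we are at the start or right after "::".
        if (!out.empty() && out.back() != ':') out += ':';
        char buf[5];
        std::snprintf(buf, sizeof buf, "%x", groups[i]);  // lowercase, no leading zeros
        out += buf;
        ++i;
    }
    return out;
}
```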

But the interesting part is the following: as far as I understand, for some special addresses (IPv4-mapped, IPv4-compatible, and IPv4-translated IPv6 addresses, see RFC 2765), it is recommended to write the last 4 bytes in the dotted decimal notation that is common for IPv4 addresses instead of the usual hexadecimal notation.

For example, ::ffff:0:168.0.0.1, ::ffff:168.0.0.1 and ::168.0.0.1 are all valid IPv6 addresses in their recommended textual representations and inet_ntop comes to that conclusion as well.

And now to my question: according to inet_ntop, the IPv6 address 0:0:0:0:0:ffff:0.0.1.1 is shortened to ::ffff:0:101, which falls back to the hexadecimal representation. What is the reason for this behavior? Because we have a special address prefix here, I would have expected the dotted decimal notation to be used regardless of the fact that the first two bytes of the last 4-byte block are zero, giving ::ffff:0.0.1.1. Am I misunderstanding the RFC recommendation, or is inet_ntop inconsistent in this regard? In all of my other test cases, inet_ntop conforms closely to the RFC.

I hope you can help me.

Edit: After some more testing, it seems that inet_ntop always chooses the hexadecimal notation over the dotted decimal one when the first two bytes of the last 4-byte block are zero, even if the IPv6 address is one of those special addresses with an embedded IPv4 address.

Actually, the IPv4-Compatible addressing (::x.x.x.x/96) has been deprecated by RFC 4291: "The "IPv4-Compatible IPv6 address" is now deprecated because the current IPv6 transition mechanisms no longer use these addresses." Only the IPv4-Mapped (::ffff:x.x.x.x/96) and IPv4-Embedded addressing are still valid, but it is very difficult to determine embedded IPv4 addressing. — Ron Maupin

ilkkachu · Accepted Answer · 2021-03-04T17:49:59

FWIW, on my Ubuntu (glibc 2.31), I get:

::ffff:0:168.0.0.1      -> ::ffff:0:a800:1      *
::ffff:168.0.0.1        -> ::ffff:168.0.0.1
::168.0.0.1             -> ::168.0.0.1
0:0:0:0:0:ffff:0.0.1.1  -> ::ffff:0.0.1.1       *

where the two marked ones seem to differ from your results, showing that this implementation doesn't recognize the ::ffff:0 prefix, but doesn't dislike the embedded 0.0.1.1 that much.

However, with the all-zeroes prefix, I get a similar result: if the top two bytes of the IPv4 address are zero, the output format changes:

::1.2.3.4           -> ::1.2.3.4
::0.2.3.4           -> ::0.2.3.4
::0.0.3.4           -> ::304
::0.0.0.4           -> ::4

That's probably because ::1 shouldn't turn into ::0.0.0.1, but it means there must be some line drawn for what IPv4 addresses are shown in mixed notation, at least with the all-zero prefix.
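To spell out where that line seems to be drawn in my tests, here is a small predicate that reproduces the outputs listed above. It is only modeled on those observations, not taken from any library's source, and the name uses_dotted_tail is made up for this sketch.

```cpp
#include <array>
#include <cstdint>

// Hypothetical predicate: would the last 4 bytes be printed in dotted
// decimal notation? Modeled only on the outputs shown above.
bool uses_dotted_tail(const std::array<std::uint8_t, 16>& b) {
    // Both prefixes considered here start with 80 zero bits.
    for (int i = 0; i < 10; ++i) {
        if (b[i] != 0) return false;
    }
    const bool mapped = (b[10] == 0xff && b[11] == 0xff);    // ::ffff:a.b.c.d
    const bool zero_prefix = (b[10] == 0 && b[11] == 0);     // ::a.b.c.d
    const bool top_half_nonzero = (b[12] != 0 || b[13] != 0);
    // Mapped addresses are always dotted in these tests. With the
    // all-zero prefix, the top two IPv4 bytes must not both be zero,
    // which keeps ::1 and ::4 in plain hexadecimal form.
    return mapped || (zero_prefix && top_half_nonzero);
}
```

With this rule, ::ffff:0.0.1.1 and ::0.2.3.4 come out dotted, while ::0.0.3.4 and ::ffff:0:168.0.0.1 do not, matching the lists above. The library described in the question apparently applies the "top two bytes are zero" check to the mapped prefix as well.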

So, could this just be a bug or quirk of the library you have? If it does show similar behaviour with both the all-zeroes prefix and other prefixes, perhaps they've just decided to go the easy way and treat all of them equally. As far as I can figure out, the whole 0/8 block is still reserved for "Local Identification", anyway, so I wonder if addresses like 0.0.x.x even come up embedded in IPv6. (I don't know, though.)

The RFC also specifies the mixed notation only as "RECOMMENDED", so we can't really say the implementation you have is wrong.
