37
votes

I'm changing some tables to store IP addresses as numbers rather than strings. This is simple with IPv4 where the 32 bit address can fit into an integer column. However, an IPv6 address is 128 bits.

The MySQL documentation only shows numeric types up to 64 bits ("bigint").

Should I stick with char/varchar for IPv6? (Ideally I'd like to use the same column for IPv4 and IPv6, so I'd prefer not to do this).

Is there anything better than using two bigint columns? I would prefer not to have to break the value into upper and lower /64 whenever using the address.

I'm using MariaDB 5.1 - if there's a better solution in a later version of MySQL then that would be nice to know, although not helpfully immediately.

[EDIT] Note that I'm after a recommendation for the best way to do this - it's obvious that there are various ways of doing this (including the existing string representation), but which is (in terms of performance) best? (i.e. if someone has done the analysis already, that would save me doing it, or if I'm missing something obvious, that would be great to know too).

3
One of the answers for stackoverflow.com/questions/3455320/… is relevant here, but the question is not at all asking the same thing (it's in fact the opposite question). stackoverflow.com/questions/420680/… however does look like the same question. Thanks for that - I did search, but didn't find that. I have no objection to this being closed as a dupe of that.Tony Meyer
I really did search first, honest! But this is a dupe of stackoverflow.com/questions/1120371/… as well.Tony Meyer
I'm doing some benchmarking to figure this out since there doesn't seem to be any clear generic answer. So far it looks like there's very little performance difference between any storage method (although I may have errors in my benchmarking...). I'll update with results once they are complete.Tony Meyer
When dealing with IPv6 I would always just store and work with the prefix, i.e. the first 64 bits, since the suffix changes often and can be changed at will.Bachsau

3 Answers

49
votes

I found myself asking this question and from all the posts I read never found any performance comparisons. So here's my attempt.

I've created the following tables, populated with 2,000,000 random ip address from 100 random networks.

CREATE TABLE ipv6_address_binary (
    id SERIAL NOT NULL AUTO_INCREMENT PRIMARY KEY,
    addr BINARY(16) NOT NULL UNIQUE
);

CREATE TABLE ipv6_address_twobigints (
    id SERIAL NOT NULL AUTO_INCREMENT PRIMARY KEY,
    haddr BIGINT UNSIGNED NOT NULL,
    laddr BIGINT UNSIGNED NOT NULL,
    UNIQUE uidx (haddr, laddr)
);

CREATE TABLE ipv6_address_decimal (
    id SERIAL NOT NULL AUTO_INCREMENT PRIMARY KEY,
    addr DECIMAL(39,0) NOT NULL UNIQUE
);

Then I SELECT all ip addresses for each network and record the response time. Average response time on the twobigints table is about 1 second while on the binary table it is about one-hundredth of a second.

Here are the queries.

Note:

X_[HIGH/LOW] is the most/least significant 64-bits of X

when NETMASK_LOW is 0 the AND condition is omitted as it always yields true. doesn't affect performance very much.

SELECT COUNT(*) FROM ipv6_address_twobigints
WHERE haddr & NETMASK_HIGH = NETWORK_HIGH
AND laddr & NETMASK_LOW = NETWORK_LOW

SELECT COUNT(*) FROM ipv6_address_binary
WHERE addr >= NETWORK
AND addr <= BROADCAST

SELECT COUNT(*) FROM ipv6_address_decimal
WHERE addr >= NETWORK
AND addr <= BROADCAST

Average response times:

Graph:

http://i.stack.imgur.com/5NJvQ.jpg

BINARY_InnoDB  0.0119529819489
BINARY_MyISAM  0.0139244818687
DECIMAL_InnoDB 0.017379629612
DECIMAL_MyISAM 0.0179929423332
BIGINT_InnoDB  0.782350552082
BIGINT_MyISAM  1.07809265852
4
votes

I have always used either a string or two 64-bit integers. The former in the case where I just want to record it, the latter in the case where I need to do calculations on whether a certain address is contained in a certain network, or even whether two networks overlap.

When storing it as integer, the only option is indeed to split it into two 64-bit numbers. As this makes comparisons more cumbersome, I wouldn't do this unless you need numerical calculations, to see whether an IP falls within a certain network.

I would not be too concerned about performance for storing IPv6 addresses in a string - depending on how many lookups you do for the data. Usually, there are very few, or there is simply very little data. Yes, the storage and lookups are less efficient than with numbers, but it's not much more painful than storing e-mail addresses, person names or usernames.

And why would you not be able to mix IPv4 and IPv6 in string fields? They are easy to distinguish when retrieving one. Their range of possible values do not overlap.

In short: use numbers for checking overlaps, use strings elsewhere. The inefficiency of strings is irrelevant compared to the ease of use.

3
votes

To quote: "Have you considered binary (64)"

Storing very large integers in MySQL