15
votes

In Oracle, what is the appropriate data type or technique for representing network addresses, which addresses may be IPv4 or IPv6?

Background: I'm converting a table recording network activity, built using the PostgreSQL inet data type to hold both v4 and v6 addresses in the same table.

No row contains both v4 and v6 addresses, however. (That is, a record is either from a machine's v4 stack, or a machine's v6 stack.)

7
Do you make aggregations/search on that table by IP address? how many rows do you expect to have in 1 year?jachguate
Aggregations and searching by IP, yes. Number of rows, possibly hundreds of millions. (Do you have small/medium/large recommendations?)pilcrow
please, use the @jachguate notation in your comments if you want me to get notified about it. For what you say, I think the better approach is the first of @Alain answer.jachguate

7 Answers

16
votes

In Oracle, what is the appropriate data type or technique for representing network addresses, which addresses may be IPv4 or IPv6

There are two approaches :

  1. storing only.
  2. storing the conventional representation

For storing only. An IPV4 address should be an integer (32bits are enough). For IP V6, 128 bits, INTEGER (which is similar to Number(38)) will do. Of course, that's storing. That approach takes the view that the representation is a matter for the application.

If one take the opposite strategy, of storing the conventional representation, one needs to make sure that IP V4 and IPV6 addresses have only one conventional (string) representation. It's well-known for ipV4. As for IPV6, there is also a standard format.

My preference goes to the first strategy. In the worst case, you can adopt an hybrid approach (non acid though) and store both the binary and the ascii representation side by side with "priority" to the binary value.

No row contains both v4 and v6 addresses, however.

The standard representation of a IPV4 address in IPV6 format is : ::ffff:192.0.2.128.

I don't know the context but I would however reserve 2 columns, one for IPV4 and the other for a distinct ipV6 address.

Update
Following a good comment by @sleepyMonad's, I'd like to point out that instead of the Number data type it is preferable to use the INTEGER data type, which will happily accommodate the highest possible value that can be expressed with a 128 bits integer 'ff...ff' (which would need 39 decimal digits). 38 is the highest power of ten ranging from 0 to 9 that can be encoded on 128 bits but one can still insert the maximum unsigned value for 2**128 - 1 (decimal 340282366920938463463374607431768211455). Here is a small test to illustrate this possibility.

create table test (
  id integer primary key,
  ipv6_address_bin INTEGER );

-- Let's enter 2**128 - 1 in the nueric field
insert into test (id, ipv6_address_bin) values ( 1, to_number ( 'FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF', 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') ) ;

-- retrieve it to make sure it's not "truncated".
select to_char ( ipv6_address_bin, 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' ) from test where id = 1 ;
-- yields 'FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF'

select to_char ( ipv6_address_bin ) from test where id = 1 ;
-- yields 340282366920938463463374607431768211455

select LOG(2, ipv6_address_bin) from test where id = 1 ;
-- yields 128

select LOG(10, ipv6_address_bin) from test where id = 1 ;
-- yields > 38
8
votes

Store it in RAW.

RAW is variable-length byte array, so....

  • just treat the IPv4 as an array of 4 bytes
  • and IPv6 as an array of 16 bytes

...and store either one of them in directly in RAW(16).


RAW can be indexed, be a PK, UNIQUE or FOREIGN KEY, so you can do anything you normally could with VARCHAR2 or INT/NUMBER/DECIMAL, but with less conversion and storage overhead.

To illustrate the storage overhead of INT over RAW, consider the following example:

CREATE TABLE IP_TABLE (
    ID INT PRIMARY KEY,
    IP_RAW RAW(16), 
    IP_INT INT
);

INSERT INTO IP_TABLE (ID, IP_RAW, IP_INT) VALUES (
    1,
    HEXTORAW('FFFFFFFF'),
    TO_NUMBER('FFFFFFFF', 'XXXXXXXX')
);

INSERT INTO IP_TABLE (ID, IP_RAW, IP_INT) VALUES (
    2,
    HEXTORAW('FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF'),
    TO_NUMBER('FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF', 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')
);

SELECT VSIZE(IP_RAW), VSIZE(IP_INT), IP_TABLE.*  FROM IP_TABLE;

The result (under Oracle 10.2):

table IP_TABLE created.
1 rows inserted.
1 rows inserted.
VSIZE(IP_RAW)          VSIZE(IP_INT)          ID                     IP_RAW                           IP_INT                 
---------------------- ---------------------- ---------------------- -------------------------------- ---------------------- 
4                      6                      1                      FFFFFFFF                         4294967295             
16                     21                     2                      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF 340282366920938463463374607431768211455 
4
votes

@Alain Pannetier (because I can't comment yet): The ANSI INTEGER datatype maps to NUMBER(38) in Oracle according to http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/sql_elements001.htm#i54335. Below the table you find the info that NUMBER only provides 126bit binary precision which is not enought for a 128bit IPv6 address. The maximum value might store fine but there will be addresses that are rouned to the next lower one.

The internal numeric format is ROUND((length(p)+s)/2))+1 (http://download.oracle.com/docs/cd/B19306_01/server.102/b14220/datatype.htm#i16209).

Update: After fiddling around with the issue again I've now found a solution that allows high performance querying of networks that contain an IPv6 address: store the IPv6 addresses and subnet masks in RAW(16) columns and compare them using UTL_RAW.BIT_AND:

SELECT name, DECODE(UTL_RAW.BIT_AND('20010DB8000000000000000000000001', ipv6_mask), ipv6_net, 1, 0)
FROM ip_net
WHERE ipv6_net IS NOT NULL;
1
votes

you could also use a custom oracle object.

SQL>set SERVEROUTPUT on
SQL>drop table test;

Table dropped.

SQL>drop type body inaddr;

Type body dropped.

SQL>drop type inaddr;

Type dropped.

SQL>create type inaddr as object
  2  ( /* TODO enter attribute and method declarations here */
  3  A number(5),
  4  B number(5),
  5  C number(5),
  6  D number(5),
  7  E number(5),
  8  F number(5),
  9  G number(5),
 10  H NUMBER(5),
 11  MAP MEMBER FUNCTION display RETURN VARCHAR2,
 12  MEMBER FUNCTION toString( SELF IN INADDR , CONTRACT BOOLEAN DEFAULT TRUE) RETURN VARCHAR2,
 13  CONSTRUCTOR FUNCTION INADDR(SELF IN OUT NOCOPY INADDR, INADDRASSTRING VARCHAR2)  RETURN SELF AS RESULT
 14  
 15  ) NOT FINAL;
 16  /

SP2-0816: Type created with compilation warnings

SQL>
SQL>
SQL>CREATE TYPE BODY INADDR AS
  2  
  3  MAP MEMBER FUNCTION display RETURN VARCHAR2
  4  IS BEGIN
  5  return tostring(FALSE);
  6  END;
  7  
  8  
  9  MEMBER FUNCTION TOSTRING( SELF IN  INADDR , CONTRACT BOOLEAN DEFAULT TRUE) RETURN VARCHAR2 IS
 10  IP4 VARCHAR2(6) := 'FM990';
 11  ip6 varchar2(6) := 'FM0XXX';
 12    BEGIN
 13  IF CONTRACT THEN
 14    ip6 := 'FMXXXX';
 15  end if;
 16  
 17  IF CONTRACT AND A =0 AND B=0 AND C = 0 AND D=0 AND E =0 AND F = 65535 THEN --ipv4
 18      RETURN  '::FFFF:'||TO_CHAR(TRUNC(G/256),'FM990.')||TO_CHAR(MOD(G,256),'FM990.')||TO_CHAR(TRUNC(H/256),'FM990.')||TO_CHAR(MOD(H,256),'FM990');
 19  ELSE
 20      RETURN
 21  TO_CHAR(A,ip6)||':'||
 22  TO_CHAR(B,IP6)||':'||
 23  TO_CHAR(C,ip6)||':'||
 24  TO_CHAR(D,ip6)||':'||
 25  TO_CHAR(E,ip6)||':'||
 26  TO_CHAR(F,ip6)||':'||
 27  TO_CHAR(G,ip6)||':'||
 28  TO_CHAR(H,ip6);
 29  end if;
 30    end;
 31  
 32      CONSTRUCTOR FUNCTION inaddr(SELF IN OUT NOCOPY inaddr, inaddrasstring VARCHAR2)
 33                                 RETURN SELF AS RESULT IS
 34      begin
 35          if instr(inaddrasstring,'.') > 0 then
 36            --ip4
 37  null;
 38              a := 0;
 39              B := 0;
 40              C := 0;
 41              D := 0;
 42              E := 0;
 43              F := TO_NUMBER('FFFF', 'XXXX');
 44              G := TO_NUMBER(TO_CHAR(TO_NUMBER(REGEXP_SUBSTR(INADDRASSTRING,'([0-9]{1,3}).',1,1,'i',1),'999'),'FM0X')
 45  ||TO_CHAR(TO_NUMBER(REGEXP_SUBSTR(INADDRASSTRING,'([0-9]{1,3}).',1,2,'i',1),'999'),'FM0X')
 46  ,'XXXX');
 47              h := TO_NUMBER(TO_CHAR(TO_NUMBER(REGEXP_SUBSTR(INADDRASSTRING,'([0-9]{1,3}).',1,3,'i',1),'999'),'FM0X')
 48  ||TO_CHAR(TO_NUMBER(REGEXP_SUBSTR(INADDRASSTRING,'([0-9]{1,3})',1,4,'i',1),'999'),'FM0X')
 49  ,'XXXX');
 50  
 51          ELSIF instr(inaddrasstring,':') > 0 then
 52              --ip6
 53              a := TO_NUMBER(REGEXP_SUBSTR(inaddrasstring,'([0-9a-fA-F]{1,4})',1,1,'i',1),'XXXX');
 54              b := TO_NUMBER(REGEXP_SUBSTR(inaddrasstring,'([0-9a-fA-F]{1,4})',1,2,'i',1),'XXXX');
 55              c := TO_NUMBER(REGEXP_SUBSTR(inaddrasstring,'([0-9a-fA-F]{1,4})',1,3,'i',1),'XXXX');
 56              d := TO_NUMBER(REGEXP_SUBSTR(inaddrasstring,'([0-9a-fA-F]{1,4})',1,4,'i',1),'XXXX');
 57              E := TO_NUMBER(REGEXP_SUBSTR(inaddrasstring,'([0-9a-fA-F]{1,4})',1,5,'i',1),'XXXX');
 58              f := TO_NUMBER(REGEXP_SUBSTR(inaddrasstring,'([0-9a-fA-F]{1,4})',1,6,'i',1),'XXXX');
 59              g := TO_NUMBER(REGEXP_SUBSTR(inaddrasstring,'([0-9a-fA-F]{1,4})',1,7,'i',1),'XXXX');
 60              H := TO_NUMBER(REGEXP_SUBSTR(inaddrasstring,'([0-9a-fA-F]{1,4})',1,8,'i',1),'XXXX');
 61          end if;
 62  
 63          RETURN;
 64      END;
 65  end;
 66  /

Type body created.

SQL>
SQL>create table test
  2  (id integer primary key,
  3  address inaddr);

Table created.

SQL>
SQL>select * from test;

no rows selected

SQL>
SQL>
SQL>insert into test values (1, INADDR('fe80:0000:0000:0000:0202:b3ff:fe1e:8329') );

1 row created.

SQL>INSERT INTO TEST VALUES (2, INADDR('192.0.2.128') );

1 row created.

SQL>insert into test values (3, INADDR('20.0.20.1') );

1 row created.

SQL>insert into test values (4, INADDR('fe80:0001:0002:0003:0202:b3ff:fe1e:8329') );

1 row created.

SQL>insert into test values (5, INADDR('fe80:0003:0002:0003:0202:b3ff:fe1e:8329') );

1 row created.

SQL>INSERT INTO TEST VALUES (6, INADDR('fe80:0003:0001:0003:0202:b3ff:fe1e:8329') );

1 row created.

SQL>INSERT INTO TEST VALUES (7, INADDR('fe80:0003:0001:0003:0202:b3ff:fe1e:8328') );

1 row created.

SQL>INSERT INTO TEST VALUES (8, INADDR('dead:beef:f00d:cafe:dea1:aced:b00b:1234') );

1 row created.

SQL>
SQL>COLUMN INET_ADDRESS_SHORT FORMAT A40
SQL>column inet_address_full format a40
SQL>
SQL>select t.address.toString() inet_address_short, t.address.display( ) inet_address_full
  2  from test T
  3  order by t.address ;

INET_ADDRESS_SHORT                       INET_ADDRESS_FULL
---------------------------------------- ----------------------------------------
::FFFF:20.0.20.1                         0000:0000:0000:0000:0000:FFFF:1400:1401
::FFFF:192.0.2.128                       0000:0000:0000:0000:0000:FFFF:C000:0280
DEAD:BEEF:F00D:CAFE:DEA1:ACED:B00B:1234  DEAD:BEEF:F00D:CAFE:DEA1:ACED:B00B:1234
FE80:0:0:0:202:B3FF:FE1E:8329            FE80:0000:0000:0000:0202:B3FF:FE1E:8329
FE80:1:2:3:202:B3FF:FE1E:8329            FE80:0001:0002:0003:0202:B3FF:FE1E:8329
FE80:3:1:3:202:B3FF:FE1E:8328            FE80:0003:0001:0003:0202:B3FF:FE1E:8328
FE80:3:1:3:202:B3FF:FE1E:8329            FE80:0003:0001:0003:0202:B3FF:FE1E:8329
FE80:3:2:3:202:B3FF:FE1E:8329            FE80:0003:0002:0003:0202:B3FF:FE1E:8329

8 rows selected.

SQL>spool off

i just put this together in the last hour (and taught myself objects at the same time) so im sure it can be improved upon. if i make updates i'll repost them here

1
votes

I would prefer store IP addresses just in string, in format, returned by SYS_CONTEXT ('USERENV', 'IP_ADDRESS')

In refference of SYS_CONTEXT in 11g are described only default return value length as 256 bytes and does not described return value size for exacly 'IP_ADDRESS' context.

In document Oracle Database and IPv6 Statement of Direction described:

Oracle Database 11g Release 2 supports the standard IPv6 address notations specified by RFC2732. A 128bit IP address is generally represented as 8 groups of 4 hex digits, with the “:” symbol as the group separator. The leading zeros in each group are removed. For example, 1080:0:0:0:8:800:200C:417A would be a valid IPv6 address. One or more consecutive zero fields can optionally be compressed with the “::” separator. For example, 1080::8:800:200C:417A.

From this notes I prefer to make column IP_ADDRESS varchar2(39) to allow store 8 group by 4 digits and 7 separators between this groups.

0
votes

The Oracle documentation does state INTEGER is an alias to NUMBER(38), but that is probably a typo, because the paragraph above it states:

NUMBER(p,s) where: p is the precision... Oracle guarantees the portability of numbers with precision of up to 20 base-100 digits, which is equivalent to 39 or 40 decimal digits depending on the position of the decimal point.

So NUMBER can store 39 to 40 digits, and INTEGER is likely an alias to NUMBER(max precision) instead of NUMBER(38). There is why the example provided works (and it works if you change INTEGER to NUMBER).

0
votes

The possibilities are:

  • Store as string, i.e. VARCHAR2 (example 1080::8:800:200c:417a)
  • Store as numeric value
    • NUMBER data type
    • INTEGER data type
  • Store as RAW value
    • One RAW value, i.e. RAW(4) or RAW(16) for IPv4 or IPv6 respectively
    • 4 x RAW(1) or 8 x RAW(2) for IPv4 or IPv6 respectively

I would recommend to use RAW values because

  • If you use strings then you have to consider different formats of IPv6.

    1080::8:800:200C:417A
    1080::8:800:200c:417a
    1080::8:800:32.12.65.122
    1080:0:0:0:8:800:200C:417A
    1080:0:0:0:0008:0800:200C:417A
    1080:0000:0000:0000:0008:0800:200C:417A
    

    are all legal representations of the same IPv6 IP-Address. Your application needs to enforce a common format for proper usage, e.g. use in WHERE condition.

  • NUMBER/INTEGER values are senseless without conversion to human-readable format. You cannot use INTEGER data type in PL/SQL

    i INTEGER := 2**128-1; -- i.e. ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff
    
    -> ORA-06502: PL/SQL: numeric or value error: number precision too large. 
    
  • In case you have to work with subnetting you cannot use function BITAND - it also supports numbers only up to 2^127

  • You can use UTL_RAW functions UTL_RAW.BIT_AND, UTL_RAW.BIT_COMPLEMENT, UTL_RAW.BIT_OR for subnet operations.

  • In case you have to deal with really big amount of data (I am talking about billions of rows) it might be beneficial to split the IP-Address into several RAW values, i.e. 4 x RAW(1) or 8 x RAW(2). Such columns would be predestinated for Bitmap-Indexes and you would save a lot of disc space and gain performance.