0
votes

Using a simple protobuf definition like so:

syntax = "proto3";

package flowprotobuf;

message FlowMessage {
    bytes SourceIP = 6;
    bytes DestIP = 7;
}

What is the proper way to encode an IPv4/IPv6 address so as to insert it into a Clickhouse table with FixedString(16) types set for both SourceIP and DestIP?

After days of riding the struggle bus I'm currently doing the following (Python 3) to dump the protobuf stream to a Kafka topic (and then consuming via Clickhouse Kafka engine & materialized view) with a "good" result:

#!/usr/bin/env python

import flow_pb2
from google.protobuf.internal.encoder import _VarintBytes
from socket import inet_pton, AF_INET, AF_INET6
from binascii import hexlify

def pack_addr(ipaddr):
    if ':' in ipaddr:
        l = int(hexlify(inet_pton(AF_INET6, ipaddr)), 16)
        return l.to_bytes(16, byteorder='big')
    else:
        l = int(hexlify(inet_pton(AF_INET, ipaddr)), 16)
        return l.to_bytes(16, byteorder='big')

fm = flow_pb2.FlowMessage()
fm.SourceIP = pack_addr(ip_src)
fm.DestIP = pack_addr(ip_dst)

size = fm.ByteSize()
fpb = _VarintBytes(size) + fm.SerializeToString()

producer.produce(kafka_producer_topic, fpb)
producer.poll(0)

I put good in quotes because according to the Clickhouse documentation for IPv6NumToString():

Accepts a FixedString(16) value containing the IPv6 address in binary format. Returns a string containing this address in text format. IPv6-mapped IPv4 addresses are output in the format ::ffff:111.222.33.44.

However my query results are not showing the ::ffff:x.x.x.x format - instead:

de33137dfc80 :) SELECT Date,TimeReceived,IPv6NumToString(SourceIP),IPv6NumToString(DestIP) FROM test LIMIT 5;

SELECT
    Date,
    TimeReceived,
    IPv6NumToString(SourceIP),
    IPv6NumToString(DestIP)
FROM test
LIMIT 5

┌───────Date─┬────────TimeReceived─┬─IPv6NumToString(SourceIP)─┬─IPv6NumToString(DestIP)─┐
│ 2020-08-05 │ 2020-08-05 06:41:27 │ ::98.158.157.211          │ ::202.122.147.98        │
│ 2020-08-05 │ 2020-08-05 06:41:27 │ ::98.158.157.211          │ ::217.118.23.125        │
│ 2020-08-05 │ 2020-08-05 06:41:27 │ ::192.34.21.69            │ ::104.34.73.41          │
│ 2020-08-05 │ 2020-08-05 06:41:27 │ ::98.158.157.211          │ ::194.28.167.103        │
│ 2020-08-05 │ 2020-08-05 06:41:27 │ ::98.158.148.89           │ ::79.170.71.49          │
└────────────┴─────────────────────┴───────────────────────────┴─────────────────────────┘

5 rows in set. Elapsed: 0.006 sec.

I know the IPv4 addresses are correct, and it's also properly showing IPv6 addresses. I just want to make sure I'm not missing anything glaring/obvious. Thanks.

Edited to add: Clickhouse server version 20.5.4 revision 54435

EDIT 2: Denis's suggestion below led me to come up with the solution:

    else:
        m = '::ffff:' + ipaddr
        l = int(hexlify(inet_pton(AF_INET6, m)), 16)
        return l.to_bytes(16, byteorder='big')
1

1 Answers

1
votes
SELECT hex(IPv6StringToNum('::98.158.157.211'))

┌─hex(IPv6StringToNum('::98.158.157.211'))─┐
│ 000000000000000000000000629E9DD3         │
└──────────────────────────────────────────┘

SELECT hex(IPv6StringToNum('::ffff:98.158.157.211'))

┌─hex(IPv6StringToNum('::ffff:98.158.157.211'))─┐
│ 00000000000000000000FFFF629E9DD3              │
└───────────────────────────────────────────────┘

https://en.wikipedia.org/wiki/IPv6_address For example, the IPv4-mapped IPv6 address ::ffff:c000:0280 is written as ::ffff:192.0.2.128, thus expressing clearly the original IPv4 address that was mapped to IPv6

https://www.ultratools.com/tools/ipv4toipv6 0:0:0:0:0:ffff:629e:9dd3

Converting IPv4 Address to a Hex IPv6 Address in Python