Using a simple protobuf definition like so:
syntax = "proto3";
package flowprotobuf;
message FlowMessage {
bytes SourceIP = 6;
bytes DestIP = 7;
}
What is the proper way to encode an IPv4/IPv6 address so as to insert it into a Clickhouse table with FixedString(16) types set for both SourceIP and DestIP?
After days of riding the struggle bus I'm currently doing the following (Python 3) to dump the protobuf stream to a Kafka topic (and then consuming via Clickhouse Kafka engine & materialized view) with a "good" result:
#!/usr/bin/env python
import flow_pb2
from google.protobuf.internal.encoder import _VarintBytes
from socket import inet_pton, AF_INET, AF_INET6
from binascii import hexlify
def pack_addr(ipaddr):
if ':' in ipaddr:
l = int(hexlify(inet_pton(AF_INET6, ipaddr)), 16)
return l.to_bytes(16, byteorder='big')
else:
l = int(hexlify(inet_pton(AF_INET, ipaddr)), 16)
return l.to_bytes(16, byteorder='big')
fm = flow_pb2.FlowMessage()
fm.SourceIP = pack_addr(ip_src)
fm.DestIP = pack_addr(ip_dst)
size = fm.ByteSize()
fpb = _VarintBytes(size) + fm.SerializeToString()
producer.produce(kafka_producer_topic, fpb)
producer.poll(0)
I put good in quotes because according to the Clickhouse documentation for IPv6NumToString():
Accepts a FixedString(16) value containing the IPv6 address in binary format. Returns a string containing this address in text format. IPv6-mapped IPv4 addresses are output in the format ::ffff:111.222.33.44.
However my query results are not showing the ::ffff:x.x.x.x format - instead:
de33137dfc80 :) SELECT Date,TimeReceived,IPv6NumToString(SourceIP),IPv6NumToString(DestIP) FROM test LIMIT 5;
SELECT
Date,
TimeReceived,
IPv6NumToString(SourceIP),
IPv6NumToString(DestIP)
FROM test
LIMIT 5
┌───────Date─┬────────TimeReceived─┬─IPv6NumToString(SourceIP)─┬─IPv6NumToString(DestIP)─┐
│ 2020-08-05 │ 2020-08-05 06:41:27 │ ::98.158.157.211 │ ::202.122.147.98 │
│ 2020-08-05 │ 2020-08-05 06:41:27 │ ::98.158.157.211 │ ::217.118.23.125 │
│ 2020-08-05 │ 2020-08-05 06:41:27 │ ::192.34.21.69 │ ::104.34.73.41 │
│ 2020-08-05 │ 2020-08-05 06:41:27 │ ::98.158.157.211 │ ::194.28.167.103 │
│ 2020-08-05 │ 2020-08-05 06:41:27 │ ::98.158.148.89 │ ::79.170.71.49 │
└────────────┴─────────────────────┴───────────────────────────┴─────────────────────────┘
5 rows in set. Elapsed: 0.006 sec.
I know the IPv4 addresses are correct, and it's also properly showing IPv6 addresses. I just want to make sure I'm not missing anything glaring/obvious. Thanks.
Edited to add: Clickhouse server version 20.5.4 revision 54435
EDIT 2: Denis's suggestion below led me to come up with the solution:
else:
m = '::ffff:' + ipaddr
l = int(hexlify(inet_pton(AF_INET6, m)), 16)
return l.to_bytes(16, byteorder='big')