I am building a data synchronizer that captures data changes from a MySQL source and exports the data to Hive.

I chose Kafka Connect to implement this, with Debezium as the source connector and the Confluent HDFS connector as the sink.

The problem is that Debezium's naming convention for Kafka topics looks like this:

serverName.databaseName.tableName

In the Confluent HDFS sink properties, I have to configure the topics with the same names that Debezium generates:

"topics": "serverName.databaseName.tableName"

The Confluent HDFS sink connector then generates a path in HDFS like:

/topics/serverName.databaseName.tableName/partition=0

I expect this to cause problems in HDFS/Hive, since the path contains dots (.). In fact, creation of the external table that the Confluent HDFS sink connector generates automatically failed, apparently because of the path:

2020-05-08T00:42:02,717 ERROR [pool-6-thread-31] metastore.RetryingHMSHandler: MetaException(message:java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://localhost:9000./null/topics/dbserver1.test_data_1.student1)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:6935)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:2050)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
    at com.sun.proxy.$Proxy26.create_table_with_environment_context(Unknown Source)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:14800)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:14784)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://localhost:9000./null/topics/dbserver1.test_data_1.student1
    at org.apache.hadoop.fs.Path.initialize(Path.java:263)
    at org.apache.hadoop.fs.Path.<init>(Path.java:254)
    at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:143)
    at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:147)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1852)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1786)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:2035)
    ... 20 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: hdfs://localhost:9000./null/topics/dbserver1.test_data_1.student1
    at java.net.URI.checkPath(URI.java:1823)
    at java.net.URI.<init>(URI.java:745)
    at org.apache.hadoop.fs.Path.initialize(Path.java:260)
    ... 26 more

So is there any way to change Debezium's default naming convention for topics, or to change the default path that the Confluent HDFS sink connector derives from the topic name?

1 Answer

The HDFS connector will replace dots (and dashes) with underscores when creating Hive tables.

HDFS itself doesn't care about dots in paths. The problem is that you cannot have a dot after the port, and you have /null in there somehow.

hdfs://localhost:9000./null


is there anyway that i can change the Debezium default naming convention for topics

The solution has nothing to do with Debezium. You can use RegexRouter, which is part of the base Apache Kafka Connect library, in the transforms config of your source or sink connector, depending on how early you want to "fix" the problem.
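As a rough sketch of what that could look like, the properties below would be added to a connector's config. The alias "route" and the underscore-joined replacement are illustrative choices, not something from the question, and the regex assumes every topic has exactly three dot-separated parts (serverName.databaseName.tableName). I have not tested this against the HDFS sink specifically, so treat it as a starting point.

```json
{
  "transforms": "route",
  "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
  "transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
  "transforms.route.replacement": "$1_$2_$3"
}
```

With these settings a topic such as dbserver1.test_data_1.student1 should be routed to dbserver1_test_data_1_student1. RegexRouter only renames topics whose full name matches the regex, so other topics pass through unchanged. If you apply it on the source connector, remember that the sink's "topics" setting then has to list the renamed topics.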

You could also write your own transform and place it in a directory listed in Connect's plugin.path setting.