
Is there any SerDe available to support Hive tables with Unicode characters? We might have files in UTF-8, UTF-16, or UTF-32. In other words, we are looking to support different languages, such as Japanese and Chinese, in Hive tables, and we should be able to load data in those languages into a Hive table.


1 Answer


Hive can only read and write UTF-8 text files. Data in any other character set has to be converted to UTF-8. One option is to declare the file's encoding on the table so the SerDe handles the conversion. The syntax is:

hive> CREATE TABLE mytable (column_name data_type, ...) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='<encoding>');
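
For a concrete illustration (the table, column names, and encoding here are only an example), a table meant to read GBK-encoded Chinese text could be declared like this, since serialization.encoding takes a charset name the JVM recognizes (e.g. GBK, SJIS, UTF-16):

hive> CREATE TABLE chinese_text (id INT, content STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='GBK');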

Alternatively, the conversion can be done up front with iconv, but iconv only supports files smaller than 16 GB. Syntax:

$ iconv -f <source_encoding> -t UTF-8 inputfile > outputfile
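
Once the file has been converted to UTF-8, it can be loaded into the table as usual; a minimal sketch, assuming a hypothetical local file path and the example table above:

hive> LOAD DATA LOCAL INPATH '/tmp/chinese_text_utf8.txt' INTO TABLE chinese_text;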