Are there any SerDes available to support Hive tables with Unicode characters? We might have files in UTF-8, UTF-16, or UTF-32. In other words, we are looking to support different languages, such as Japanese and Chinese, in Hive tables, and we should be able to load data in those languages into a Hive table.
0
votes
1 Answer
0
votes
Hive can only read and write UTF-8 text files directly.
Data in any other character set either has to be converted to UTF-8, or the table's SerDe has to be told which encoding the underlying files use.
The syntax for declaring the encoding on a table is:
hive> CREATE TABLE mytable (colname datatype, ...) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='CHARSET');
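For example, a minimal sketch of a table over Shift_JIS-encoded Japanese text (the table name, columns, and file path here are hypothetical; the charset value must be a name the JVM recognizes, such as 'SJIS' or 'GBK'):
hive> CREATE TABLE jp_users (id INT, name STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='SJIS') STORED AS TEXTFILE;
hive> LOAD DATA LOCAL INPATH '/tmp/users_sjis.txt' INTO TABLE jp_users;
With the encoding declared, Hive should decode the Shift_JIS bytes on read, so queries return the Japanese text correctly.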
The conversion itself can be done with iconv, although it only supports files smaller than 16 GB. The syntax is:
$ iconv -f <source-encoding> -t <target-encoding> inputfile > outputfile
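As a concrete example (file and table names are made up), converting a UTF-16 file to UTF-8 and then loading it into an ordinary UTF-8 Hive table might look like:
$ iconv -f UTF-16 -t UTF-8 users_utf16.txt > users_utf8.txt
hive> LOAD DATA LOCAL INPATH 'users_utf8.txt' INTO TABLE jp_users_utf8;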