Is there a SerDe available that supports Hive tables containing Unicode characters? Our files may be in UTF-8, UTF-16, or UTF-32. In other words, we need to support languages such as Japanese and Chinese in a Hive table, and we should be able to load data in those languages into it.
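By default, Hive reads and writes only UTF-8 text files.
Files in any other character set must either be converted to UTF-8 or be read through a SerDe that is told the source encoding.
The syntax for declaring the encoding on the table is: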
hive> CREATE TABLE mytable (col_name data_type) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='FORMAT');
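The file itself can also be converted with iconv, but it supports only files smaller than 16 GB. iconv writes to standard output, so redirect the result to a new file. Syntax:
$ iconv -f from_encoding -t to_encoding inputfile > outputfile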
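If a file is too large to convert in one pass, the same conversion can be streamed in fixed-size chunks so that memory use stays bounded. Below is a minimal Python sketch of that idea, not a drop-in replacement for iconv. The function name transcode_to_utf8 and the chunk_chars parameter are made up for illustration, and the source encoding has to be known in advance (for example "utf-16" or "utf-32").

```python
def transcode_to_utf8(src_path, dst_path, src_encoding, chunk_chars=1 << 20):
    """Copy src_path to dst_path, re-encoding from src_encoding to UTF-8.

    The file is read in chunks of chunk_chars characters, so the whole
    file is never held in memory at once.
    """
    # newline="" disables newline translation, so line endings are copied as-is.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)  # decoded text, at most chunk_chars characters
            if not chunk:                  # empty string means end of file
                break
            dst.write(chunk)               # written back out encoded as UTF-8
```

A call such as transcode_to_utf8("input_utf16.txt", "output_utf8.txt", "utf-16") would produce a UTF-8 copy that can then be loaded into the Hive table; both file names here are placeholders.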