0
votes

I am trying to get data using below query in hive, I have created external table using below script,

CREATE EXTERNAL TABLE Test(
  status string, 
  details string,
  timestamp string)
PARTITIONED BY ( id string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ('separatorChar'=',', 'quoteChar' = '"')
STORED AS INPUTFORMAT 
    'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
    'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
    '';

In hive location csv file has below format. two times double quote in each key value.

PASS,"{""Category"": ""Test"", ""COMMENT"": ""reporting period in dataset: ['APR-2018', 'MAY-2018', 'JUN-2018']""}",2019-09-03 13:56:08

select * from Test where id=1;

returns below output. it hsould have double quotes as present in csv file. when I try to open file in excel it shows separate column for 'APR-2018', 'MAY-2018', 'JUN-2018'

PASS,{"Category": "Test", "COMMENT": "reporting period in dataset: ['APR-2018', 'MAY-2018', 'JUN-2018']"},2019-09-03 13:56:08
1

1 Answers

0
votes

In your table creation statement, try to remove the , 'quoteChar' = '\"' and see if that helps you retain the double quotation marks in your data.

If that does not work, you could try to escape the " character in the table creation statement, by writing WITH SERDEPROPERTIES ('separatorChar'=',', 'quoteChar' = '\"') and see how that affects your table.