0
votes

I have an external table with JSON data and I am using JsonSerde to populate data into the table. I am properly getting the data populated and when I query the data I am able to see the results correctly.

But,when I use desc command on that table I am getting from deserializer text for all the column comments.

Below is the table creation ddl.

  CREATE EXTERNAL TABLE IF NOT EXISTS my_table (
    field1 string COMMENT 'This is a field1', 
    field2 int COMMENT 'This is a field2', 
    field3 string COMMENT 'This is a field3', 
    field4 double COMMENT 'This is a field4'
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
    Location '/user/uszszb6/json_test/data';

Entries in the data file.

{"field1":"data1","field2":100,"field3":"more data1","field4":123.001}
{"field1":"data2","field2":200,"field3":"more data2","field4":123.002}
{"field1":"data3","field2":300,"field3":"more data3","field4":123.003}
{"field1":"data4","field2":400,"field3":"more data4","field4":123.004}

When I use use the command desc my_table, I get the below output.

 +-----------+------------+--------------------+--+
    | col_name  | data_type  |      comment       |
    +-----------+------------+--------------------+--+
    | field1    | string     | from deserializer  |
    | field2    | int        | from deserializer  |
    | field3    | string     | from deserializer  |
    | field4    | double     | from deserializer  |
    +-----------+------------+--------------------+--+

JsonSerde is not able to capture the comments properly. I have also tried with other JSONSerde like

 org.openx.data.jsonserde.JsonSerDe
 org.apache.hive.hcatalog.data.JsonSerDe
 com.amazon.elasticmapreduce.JsonSerde

But desc command output is same. There is a JIRA ticket for this bug [https://issues.apache.org/jira/browse/HIVE-6681][1]

According to ticket it's resolved in version 0.13, I am using hive 1.2.1 but still I am facing this issue.

Could anyone share your thoughts on resolving this issue.

1

1 Answers

0
votes

Yeah, it looks like it's an hive bug that affects all the Json SerDes, but have you tried using DESCRIBE EXTENDED ?

DESCRIBE EXTENDED my_table;

hive> describe extended  json_serde_test;
OK
browser                 string                  from deserializer   
device_uuid             string                  from deserializer   
custom                  struct<customer_id:string>  from deserializer   

Detailed Table Information  
Table(tableName:json_serde_test,dbName:default, owner:rcongiu,
createTime:1448477902, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:browser, type:string, 
comment:hello), FieldSchema(name:device_uuid, type:string, comment:my 
name is elder price), FieldSchema(name:custom,   
type:struct<customer_id:string>, comment:null)], 
location:hdfs://localhost:9000/user/hive/warehouse/json_serde_test, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat,  
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.openx.data.jsonserde.JsonSerDe, parameters:
{serialization.format=1, mapping.customer_id=Customer ID}), 
bucketCols:[], sortCols:[], parameters:{}, 
skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], 
skewedColValueLocationMaps:{}), storedAsSubDirectories:false), 
partitionKeys:[], parameters:{numFiles=1, 
transient_lastDdlTime=1448477903, COLUMN_STATS_ACCURATE=true, 
totalSize=128, numRows=0, rawDataSize=0}, viewOriginalText:null,     
viewExpandedText:null, tableType:MANAGED_TABLE) 
Time taken: 0.073 seconds, Fetched: 5 row(s)

Will output a json-ish detailed description that includes comments..kind of hard to read but it is showing me the comments and may be enough for your purposes..or not.