1
votes

I am trying to create an External Table in Hive that fetches json data as a response of Web Service.

The json file data is given below based on which I created External Table Schema:

{
    "location": {
        "name": "Paris",
        "region": "Ile-de-France",
        "country": "France",
        "lat": 48.87,
        "lon": 2.33,
        "tz_id": "Europe/Paris",
        "localtime_epoch": 1486792043,
        "localtime": "2017-02-11 5:47"
    },
    "current": {
        "last_updated_epoch": 1486792043,
        "last_updated": "2017-02-11 05:47",
        "temp_c": 0.0,
        "temp_f": 32.0,
        "is_day": 0,
        "condition": {
            "text": "Mist",
            "icon": "//cdn.apixu.com/weather/64x64/night/143.png",
            "code": 1030
        },
        "wind_mph": 8.1,
        "wind_kph": 13.0,
        "wind_degree": 330,
        "wind_dir": "NNW",
        "pressure_mb": 1019.0,
        "pressure_in": 30.6,
        "precip_mm": 0.0,
        "precip_in": 0.0,
        "humidity": 74,
        "cloud": 0,
        "feelslike_c": -4.0,
        "feelslike_f": 24.8
    }
}

The Create External Table command is given below:

CREATE EXTERNAL TABLE weatherdata (
   location STRUCT<
      name:STRING, 
      region:STRING, 
      country:STRING, 
      lat:FLOAT,
      lon:FLOAT, 
      tz_id:STRING, 
      localtime:STRING>,
   current STRUCT<
      last_updated_epoch:BIGINT, 
      last_updated:STRING, 
      temp_c:FLOAT,
      temp_f:FLOAT, 
      is_day:INT,
      condition:STRUCT<text:STRING, icon:STRING, code:INT>,
      wind_mph:FLOAT, wind_kph:FLOAT, 
      wind_degree:INT, 
      wind_dir:STRING,
      pressure_mb:FLOAT, 
      pressure_in:FLOAT, 
      precip_mm:FLOAT, 
      precip_in:FLOAT,
      humidity:INT, 
      cloud:INT, 
      feelslike_c:FLOAT, 
      feelslike_f:FLOAT>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

I get the Error below:

FAILED: ParseException line 9:3 cannot recognize input near 'current' 'STRUCT' '<' in column specification

I tried creating this External Table with field 'current' as 'curr' just to check and the table got created successfully. As obvious, on loading the 'json' data given above into this table, only 'location' data loaded successfully and 'current' data showed null. Is 'current' a keyword in hive? What is the issue ?

2
You have condition:STRUCT, so why no : on location or current? - OneCricketeer
@cricket_007 ':' is only used for internal structure variables. - Vishal Sharma

2 Answers

2
votes

Yes "CURRENT" is a Reserved Keyword in Hive. You can use them as identifiers by surrounding them using back-tick (`) characters. Refer this documentation on Quoting identifiers in column names.

Here, the create statement would be

....      
       tz_id:STRING, 
          localtime:STRING>,
       `current` STRUCT<
          last_updated_epoch:BIGINT, 
          last_updated:STRING, 
          temp_c:FLOAT,
....

Also on select:

select `current` from weatherdata;
0
votes

Current is the reserved key word. Hence use DDL as below :- use back tick

CREATE EXTERNAL TABLE weatherdata (
   `location` STRUCT<
      name:STRING, 
      region:STRING, 
      country:STRING, 
      lat:FLOAT,
      lon:FLOAT, 
      tz_id:STRING, 
      localtime:STRING>,
   `current` STRUCT<
      last_updated_epoch:BIGINT, 
      last_updated:STRING, 
      temp_c:FLOAT,
      temp_f:FLOAT, 
      is_day:INT,
      condition:STRUCT<text:STRING, icon:STRING, code:INT>,
      wind_mph:FLOAT, wind_kph:FLOAT, 
      wind_degree:INT, 
      wind_dir:STRING,
      pressure_mb:FLOAT, 
      pressure_in:FLOAT, 
      precip_mm:FLOAT, 
      precip_in:FLOAT,
      humidity:INT, 
      cloud:INT, 
      feelslike_c:FLOAT, 
      feelslike_f:FLOAT>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';