
I am trying to specify the schema of a dataset that I load in Pig from a JSON file with JsonLoader.

The data has the following format:

{
  "cat_a": "some_text",
  "cat_b": {(attribute_name):(attribute_value)}
}

I am trying to describe the schema as:

LOAD 'filename' USING JsonLoader('cat_a:chararray, cat_b:(attribute_name:chararray,attribute_value:int)');

I suspect I am describing the schema for cat_b incorrectly.

Can someone help with this? Thanks in advance.

1 Answer


If your JSON is in this format:

{"recipe":"Tacos","ingredients":[{"name":"Beef"},{"name":"Lettuce"},{"name":"Cheese"}]}

Store the above JSON in test.json.
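As far as I know, Pig's built-in JsonLoader reads its input one line at a time, so each JSON object should sit on a single line (a pretty-printed, multi-line object will not parse) and keys and strings need double quotes. If you would rather generate the file than type it, here is a minimal Python sketch. It writes test.json to the current directory, so adjust the path in the LOAD statement to match.

```python
import json

# Sample record from the example above.
record = {
    "recipe": "Tacos",
    "ingredients": [{"name": "Beef"}, {"name": "Lettuce"}, {"name": "Cheese"}],
}

# One JSON object per line: json.dumps without indent keeps the whole
# record on a single line, and it always uses double quotes.
with open("test.json", "w", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```

For several records, write one json.dumps(...) line per record in the same file.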

Then run the following commands:

a = LOAD '/home/abhijit/Desktop/test.json' USING JsonLoader('recipe:chararray,ingredients: {(name:chararray)}');

dump a;

The output will be:

(Tacos,{(Beef),(Lettuce),(Cheese)})
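The output shows how the schema lines up with the JSON: the JSON array becomes a bag {...} and each object inside it becomes a tuple (...), which is why the schema is written as ingredients: {(name:chararray)}. The Python sketch below only illustrates that correspondence. It is not Pig code and not how JsonLoader works internally. The helper to_pig is hypothetical: it takes an already-parsed record and formats it roughly the way DUMP prints it, using the JSON key order in place of a Pig schema.

```python
def to_pig(value):
    """Format a parsed JSON value roughly the way Pig's DUMP prints it.

    Illustration only (hypothetical helper, not part of Pig):
    objects -> tuples "(...)", arrays -> bags "{...}",
    null -> empty string, other scalars -> str(value).
    Fields are emitted in JSON key order, not by a Pig schema.
    """
    if value is None:
        return ""
    if isinstance(value, dict):
        return "(" + ",".join(to_pig(v) for v in value.values()) + ")"
    if isinstance(value, list):
        return "{" + ",".join(to_pig(v) for v in value) + "}"
    return str(value)


record = {
    "recipe": "Tacos",
    "ingredients": [{"name": "Beef"}, {"name": "Lettuce"}, {"name": "Cheese"}],
}
print(to_pig(record))  # (Tacos,{(Beef),(Lettuce),(Cheese)})
```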

If your JSON also contains a nested object, as in this format:

{"recipe":"Tacos","ingredients":[{"name":"Beef"},{"name":"Lettuce"},{"name":"Cheese"}],"inventor":{"name":"Alex","age":25}}

a = LOAD '/home/abhijit/Desktop/test.json' USING JsonLoader('recipe:chararray,ingredients: {(name:chararray)},inventor: (name:chararray, age:int)');


dump a;

The output will be:

(Tacos,{(Beef),(Lettuce),(Cheese)},(Alex,25))
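The same mapping can be applied in reverse when writing the schema string: a JSON string becomes chararray, a whole number becomes int, an object becomes a tuple (...), and an array of objects becomes a bag {(...)}. Here is a rough Python sketch of that rule. The helpers pig_schema and fields are hypothetical, handle only the types used in this answer, and assume every array element has the same shape as the first one.

```python
def pig_schema(value):
    """Suggest a Pig type for a sample JSON value (hypothetical helper).

    Handles only: strings, whole numbers, objects (-> tuple) and
    non-empty arrays of objects (-> bag of tuples).
    """
    if isinstance(value, bool):
        # bool is a subclass of int in Python; not handled in this sketch.
        raise ValueError("booleans are not handled in this sketch")
    if isinstance(value, str):
        return "chararray"
    if isinstance(value, int):
        return "int"
    if isinstance(value, dict):
        return "(" + fields(value) + ")"
    if isinstance(value, list) and value and isinstance(value[0], dict):
        # Assume every element has the same shape as the first one.
        return "{" + pig_schema(value[0]) + "}"
    raise ValueError(f"unsupported JSON value: {value!r}")


def fields(obj):
    """Comma-separated name:type pairs for the keys of a JSON object."""
    return ",".join(f"{name}:{pig_schema(v)}" for name, v in obj.items())


record = {
    "recipe": "Tacos",
    "ingredients": [{"name": "Beef"}, {"name": "Lettuce"}, {"name": "Cheese"}],
    "inventor": {"name": "Alex", "age": 25},
}
print(fields(record))
# recipe:chararray,ingredients:{(name:chararray)},inventor:(name:chararray,age:int)
```

For this record it produces the schema used in the LOAD statement above; the hand-written version only adds spaces, which Pig ignores.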