I am working with Pig, and my data set looks like this:

a b c
a e f

I load it in Pig like this:

data = load 'temp'  as (col1:chararray);

When I run describe data, I get:

data: {col1: chararray}

What does this mean? Is data a bag of tuples of strings, or a bag of strings? When I run dump data, I get a bag of tuples.

Shouldn't it be data: {(col1: chararray)}? Or are the two the same?

1 Answer

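Pig has the funny habit of enclosing a relation's schema in braces when you describe it. It's easy to confuse this with a bag-typed field, but in your case it simply means that each tuple in the relation data has one column, col1. A relation is itself a bag of tuples, which is why dump shows tuples.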
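To see the difference, here is a minimal sketch of what describe prints when a field really is a bag. The alias grouped is just an illustrative name, and the exact spacing of the output may vary between Pig versions:

```pig
-- Same load as in the question: one chararray field per tuple.
data = load 'temp' as (col1:chararray);

-- Grouping produces a relation whose second field is a bag of the original tuples.
grouped = group data by col1;

describe data;
-- data: {col1: chararray}

describe grouped;
-- grouped: {group: chararray,data: {(col1: chararray)}}
```

The outer braces always stand for the relation itself. The {(...)} notation with parentheses only appears for a field that holds a bag of tuples, as in the second output.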

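By the way, with the load statement you have, each line becomes one tuple with a single field. If you want three fields per tuple, as your data suggests, and assuming the values are separated by single spaces, you should specify the delimiter and declare all three columns, something like this: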

data = load 'temp' using PigStorage(' ') as (col1:chararray, col2:chararray, col3:chararray);

A describe data should then list three chararray fields instead of one.
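As a quick check, a sketch like the following should confirm the three-column layout. It assumes the file 'temp' contains exactly the two space-separated lines from the question:

```pig
-- Split each line on single spaces into three chararray fields.
data = load 'temp' using PigStorage(' ') as (col1:chararray, col2:chararray, col3:chararray);

describe data;
-- data: {col1: chararray,col2: chararray,col3: chararray}

dump data;
-- (a,b,c)
-- (a,e,f)

-- Individual columns can now be projected by name.
firsts = foreach data generate col1;
```

Without the using PigStorage(' ') clause, Pig falls back to the default tab delimiter, so a space-separated line would not be split into separate fields.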