
I am trying to read a file in Pig:

A = load 'dataviz/TOP500_201511.csv' using  PigStorage(',') as (Rank:int , Previous Rank:int,
First Appearance:int, First Rank:int, Name:chararray,Computer:chararray,Site:chararray,
Manufacturer:chararray,Country:chararray,Year:int,Segment:chararray,Total Cores:int,
Accelerator/Co-Processor Cores:int,Rmax:int,Rpeak:int,Nmax:int,Nhalf:int,Power:int,
Mflops/Watt:int,Architecture:chararray,Processor:chararray,Processor Technology:chararray,
Processor Speed (MHz):int,Operating System:chararray,OS Family:chararray,
Accelerator/Co-Processor:int,Cores per Socket::chararray,Processor Generation:chararray,
System Model:chararray,System Family:chararray,Interconnect Family:chararray,

but I am getting a strange error.

2016-02-06 21:19:50,213 [uber-SubtaskRunner] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: mismatched input 'Rank' expecting RIGHT_PAREN

Please help.


1 Answer


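Pig field names cannot contain spaces, so a multi-word name such as Previous Rank is not allowed. The parser reads Previous as the field name and then fails on the following word, Rank, which is why the error says it expected RIGHT_PAREN. Rename the field to prv_rank, or to any other name that is a valid identifier. The same applies to the other multi-word field names, and to names containing characters such as / or parentheses (for example Mflops/Watt and Processor Speed (MHz)). Also note the double colon in Cores per Socket::chararray, which should be a single colon.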
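As a rough sketch, the load statement could look like the following once every field name is a valid identifier. The replacement names (cur_rank, prv_rank, mflops_per_watt, and so on) are my own illustrative choices, not anything Pig requires, and the types are copied from the question as-is. RANK is also an operator name in newer Pig releases, so the sketch avoids a bare rank to be safe. The statement in the question is cut off after Interconnect Family, so the sketch stops there too; any remaining columns would need to be added in the same style.

```
-- Sketch only: field names below are illustrative renames of the CSV columns.
-- Each name starts with a letter and uses only letters, digits, and underscores.
A = LOAD 'dataviz/TOP500_201511.csv' USING PigStorage(',') AS (
    cur_rank:int, prv_rank:int, first_appearance:int, first_rank:int,
    name:chararray, computer:chararray, site:chararray,
    manufacturer:chararray, country:chararray, year:int, segment:chararray,
    total_cores:int, accelerator_coprocessor_cores:int,
    rmax:int, rpeak:int, nmax:int, nhalf:int, power:int,
    mflops_per_watt:int, architecture:chararray, processor:chararray,
    processor_technology:chararray, processor_speed_mhz:int,
    operating_system:chararray, os_family:chararray,
    accelerator_coprocessor:int, cores_per_socket:chararray,
    processor_generation:chararray, system_model:chararray,
    system_family:chararray, interconnect_family:chararray
    -- the original statement is truncated here; append further columns as needed
);
```

Running DESCRIBE A; afterwards is a quick way to confirm that Pig accepted the schema.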

Ref : https://pig.apache.org/docs/r0.11.0/basic.html#Data+Types+and+More

Identifiers

Identifiers include the names of relations (aliases), fields, variables, and so on. In Pig, identifiers start with a letter and can be followed by any number of letters, digits, or underscores.

Valid identifiers:

A A123 abc_123_BeX_

Invalid identifiers:

_A123 abc_$ A!B