0
votes

I am working on a project with 14 CSV files. 10 of them load correctly in Pig; 4 don't.

The issue occurs when I specify the column types in the schema: if I load the files with column names but no type casting (i.e. all columns default to 'bytearray'), I have no issue: the data gets loaded.

But if I specify the column type (and I am only asking for 'chararray'), I get an 'EOF' exception. The error seems to appear randomly when a field in a column is empty. The strange thing is that the same file loads perfectly without the type casting, and fails to load once I add the 'chararray' casting. Furthermore, I can load empty columns in other CSV files (with or without casting the columns).
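For reference, here is a minimal sketch of the two LOAD statements (the file name and column names are placeholders; I am assuming comma-separated fields read with PigStorage):

    -- loads fine: no types, every column defaults to bytearray
    raw = LOAD 'myfile.csv' USING PigStorage(',')
          AS (id, name, comment);

    -- fails with the EOF exception on some files when a field is empty
    typed = LOAD 'myfile.csv' USING PigStorage(',')
            AS (id:chararray, name:chararray, comment:chararray);

    DUMP typed;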

What could be the origin of this?

I read somewhere that a Hive environment configuration could interfere with Pig. I am using YARN, Mesos, Docker and Marathon: any interference there? (Although the errors generally happen when I am just using grunt in local mode.)

1
Can you share a sample input file? Sometimes it happens when you don't have a newline at the end of the file. – Mzf

1 Answer

0
votes

I finally found that I had enabled the pig.schematuple option, which is experimental and causes a bug: the file doesn't load when there are more than 9 columns and one of the cells is empty (it does load empty cells if there are fewer than 9 columns).
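For anyone hitting the same thing, this is roughly how I turned it off again (the property can also go in pig.properties or be passed as -Dpig.schematuple=false on the command line; as far as I can tell, false is the default):

    -- in grunt or at the top of the script: disable the experimental SchemaTuple optimization
    SET pig.schematuple false;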

Two working days lost on an experimental option :-s