I am working on a project with 14 CSV files. Ten of them load correctly in Pig; four don't.
The issue occurs when I specify the column types in the schema. If I load the files with column names but no types (i.e. all columns default to 'bytearray'), I have no issue: the data gets loaded.
But if I specify the column types (and I am only asking for 'chararray'), I get an 'EOF' exception. The error seems to appear at random when a field in a column is empty. The strange thing is that the same file loads perfectly without the type declarations and fails to load as soon as I declare the columns as 'chararray'. Furthermore, I can load empty columns in the other CSV files, with or without declaring column types.
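For illustration, the two variants look roughly like the sketch below. The file path, the column names and the use of PigStorage with a comma delimiter are placeholders chosen for this example, not my real schema or loader.

```pig
-- Hypothetical path and column names, for illustration only.

-- Variant 1: column names only, so every field defaults to bytearray.
-- This is the form that loads without any issue.
raw = LOAD 'data/file_a.csv' USING PigStorage(',')
      AS (id, name, comment);

-- Variant 2: same file and same columns, each declared as chararray.
-- This is the form that fails with the EOF exception on the four problem files.
typed = LOAD 'data/file_a.csv' USING PigStorage(',')
        AS (id:chararray, name:chararray, comment:chararray);

DUMP typed;
```

The only difference between the two statements is the ':chararray' declaration on each column.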
What could be causing this?
I read somewhere that a Hive environment configuration could interfere with Pig. I am using YARN, Mesos, Docker and Marathon: could any of them be interfering? (That said, the errors mostly happen when I am just using Grunt in local mode.)