
I have a file compressed with Snappy that I am trying to load into Pig. I set the configuration options in Grunt as described in this JIRA issue, but I am still getting compressed data in the results.

When I run the job, the log does say: org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library is available

For the job, I do a simple load:
a = load '/path/to/snappy/file' using PigStorage() as (x, y, z);

Then:
dump a;

This outputs the compressed data.

Does anyone know what I can do to read the data correctly? Thanks in advance.


1 Answer


PigStorage uses PigTextInputFormat for input, which detects and decompresses Snappy-compressed files. However, the files must have the correct extension for the Hadoop compression codec factory to know to use Snappy.

My guess is that your files don't have the .snappy extension. Try renaming them and running the job again.
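To illustrate why the extension matters, here is a simplified, hypothetical model of suffix-based codec lookup in plain Java. It is not Hadoop's actual implementation: the class and method names (`CodecBySuffix`, `codecFor`) are made up for this sketch, and the table only lists a few of the default extensions reported by Hadoop's built-in codecs. The real lookup is done by `CompressionCodecFactory.getCodec(Path)` against the codecs configured in `io.compression.codecs`. The point is that a Snappy file named without a recognized suffix gets no codec and is read as raw bytes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CodecBySuffix {
    // Hypothetical table: file extension -> codec class name.
    // The extensions match the defaults of Hadoop's built-in codecs.
    private static final Map<String, String> CODECS = new LinkedHashMap<>();
    static {
        CODECS.put(".snappy", "org.apache.hadoop.io.compress.SnappyCodec");
        CODECS.put(".gz", "org.apache.hadoop.io.compress.GzipCodec");
        CODECS.put(".bz2", "org.apache.hadoop.io.compress.BZip2Codec");
        CODECS.put(".deflate", "org.apache.hadoop.io.compress.DefaultCodec");
    }

    /** Returns the codec whose extension is the longest suffix of fileName, or null if none matches. */
    static String codecFor(String fileName) {
        String best = null;
        int bestLen = -1;
        for (Map.Entry<String, String> e : CODECS.entrySet()) {
            String ext = e.getKey();
            if (fileName.endsWith(ext) && ext.length() > bestLen) {
                best = e.getValue();
                bestLen = ext.length();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] paths = {"/path/to/snappy/file", "/path/to/snappy/file.snappy"};
        for (String p : paths) {
            String codec = codecFor(p);
            // A null codec means the input format passes the bytes through undecoded.
            System.out.println(p + " -> " + (codec == null ? "no codec (read as plain text)" : codec));
        }
    }
}
```

Running `main` prints one line per path: the unsuffixed path gets no codec, while the same path with `.snappy` appended maps to `SnappyCodec`, which is exactly the difference renaming the files should make.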