3
votes

Say I have an input file in which each line is a map.

sample.txt
[1#"anything",2#"something",3#"anotherthing"]
[2#"kish"]
[3#"mad"]
[4#"sun"]
[1#"moon"]
[1#"world"]

Since there are no values for the specified key, I do not want to save anything to a file. Is there a conditional statement that I can use with STORE ... INTO on the relation? Please help me with this. The Pig script follows.

A = LOAD 'sample.txt';
B = FOREACH A GENERATE $0#'5' AS temp;
C = FILTER B BY temp is not null;
-- It actually generates an empty part-r-X file
-- Is there a conditional statement I can include so that C is not stored when it is empty?
STORE C INTO '/user/logs/output';

Thanks. Am I going wrong somewhere? Please correct me if I am wrong.

1 Answer

1
votes

From Chapter 9 of Programming Pig,

Pig Latin is a dataflow language. Unlike general purpose programming languages, it does not include control flow constructs like if and for.

Thus, it is not possible to do this in Pig Latin alone.

I'm inclined to say you could achieve this using a combination of a custom StoreFunc and a custom OutputFormat, but that seems like it would be too much added overhead.

One way to solve this would be to just delete the output directory if no records are written. This is not too difficult using embedded Pig. For example, using Python embedding:

from org.apache.pig.scripting import Pig

P = Pig.compile("""
A = load 'sample.txt';
B = foreach A generate $0#'5' AS temp;
C = filter B by temp is not null;
store C into 'output/foo/bar';
""")

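# Bind the compiled script (no parameters here) and run it once
bound = P.bind()
stats = bound.runSingle()

if not stats.isSuccessful():
    raise RuntimeError(stats.getErrorMessage())

# Output statistics for the relation stored under alias C
result = stats.result('C')

# If nothing was written, remove the output directory that STORE created
if result.getNumberRecords() < 1:
    print 'Removing empty output directory'
    Pig.fs('rmr ' + result.getLocation())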
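
If you would rather clean up outside of Pig entirely, the same "delete it when it is empty" idea can be sketched in plain Python. This is only a rough sketch under some assumptions: the job ran in local mode so the output directory is on the local filesystem, the output is uncompressed (so an empty part file has zero bytes), and `remove_if_empty` is a hypothetical helper name, not part of Pig. For output on HDFS you would need the `Pig.fs` call above or the `hadoop fs` command instead.

```python
import os
import shutil


def remove_if_empty(output_dir):
    """Delete output_dir when none of its part-* files contains data.

    Hypothetical helper: assumes a local, uncompressed output directory.
    Returns True if the directory was removed, False otherwise.
    """
    part_files = [
        os.path.join(output_dir, name)
        for name in os.listdir(output_dir)
        if name.startswith('part-')
    ]
    # The output counts as empty when every part file has zero bytes
    # (this also covers a directory with no part files at all).
    if all(os.path.getsize(path) == 0 for path in part_files):
        shutil.rmtree(output_dir)
        return True
    return False
```

You would call `remove_if_empty('output/foo/bar')` after the Pig job finishes. Checking file sizes is cruder than asking Pig for the record count, so the embedded approach is probably the safer choice when it is available.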