2
votes

I'm new to Pig programming and am currently trying to implement my Hadoop jobs with Pig. So far my Pig programs work. I've got some output files stored as *.txt with a semicolon as the delimiter. My problem is that Pig adds parentheses around the tuple's...

Is it possible to store the output in a file without these parentheses, storing only the values? Maybe by overriding the PigStorage function with a UDF? Does anyone have a hint for me?

I want to read my output files into an RDBMS (Oracle) without the parentheses.

Could you give us an example of what you want to achieve? - Frederic
...As well as what you have tried that is producing the undesired output. - reo katoa
When I use the PigStorage function and write my output to a file, the tuples or bags are wrapped in the corresponding parentheses.
STORE result INTO 'output' USING PigStorage(';');
asdf,"{(asdf,fdsa,60,2)}"
fdsa,"{(fdsa,asdf,60,2)}"
callerA,"{(callerA,callerB,5,1),(callerA,callerC,100,1)}"
callerB,"{(callerB,callerA,5,1)}"
callerC,"{(callerC,callerA,100,1)}"
- Hans
I would like to get rid of "{("'s and ")}"'s. - Hans
If you do this, you will lose the structure of your data. If that is what you want, then the design of your Pig script should reflect that. How are you expecting the RDBMS to interpret what it finds? There is no way to have Pig omit these braces, but you can use FLATTEN to change the structure of your data and eliminate all tuples and bags. - reo katoa

2 Answers

1
votes

You probably need to write your own custom Storer. See: http://wiki.apache.org/pig/Pig070LoadStoreHowTo.

Shouldn't be too difficult to just write it as a plain CSV or whatever. There's also a pre-existing DBStorage class that you might be able to use to write directly to Oracle if you want.
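
As a rough sketch of the DBStorage route: the class ships in piggybank as org.apache.pig.piggybank.storage.DBStorage and takes a JDBC driver class, a JDBC URL, a user, a password and a parameterized INSERT statement. Everything specific below (jar paths, host, service name, credentials, table and column names, input file and schema) is a placeholder I made up for illustration, and the relation has to be flat (no bags or tuples) before it is stored.

```pig
-- Placeholder jar locations: point these at your own piggybank and Oracle JDBC jars.
REGISTER /path/to/piggybank.jar;
REGISTER /path/to/ojdbc.jar;

-- Assumed flat input: caller;callee;duration;cnt
calls = LOAD 'calls.txt' USING PigStorage(';')
        AS (caller:chararray, callee:chararray, duration:int, cnt:int);

-- DBStorage writes through JDBC, so the INTO path is not used as an output file.
-- Each '?' in the INSERT is bound to the corresponding field of the tuple.
STORE calls INTO 'unused' USING org.apache.pig.piggybank.storage.DBStorage(
    'oracle.jdbc.OracleDriver',
    'jdbc:oracle:thin:@//dbhost:1521/myservice',
    'db_user', 'db_password',
    'INSERT INTO calls (caller, callee, duration, cnt) VALUES (?, ?, ?, ?)');
```

The number of `?` placeholders should match the number of fields in the relation being stored.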

1
votes

For people who find this topic first, the question is answered here: Remove brackets and commas in output from Pig

Use the FLATTEN operator in your script like this:

-- 'output' is a reserved word in Pig Latin, so the alias is named 'flat' here
flat = FOREACH [variable] GENERATE FLATTEN(($1, $2, $3));
STORE flat INTO '[path]' USING PigStorage(',');

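Notice the second set of parentheses: the inner pair builds a tuple from the fields you want, and FLATTEN then un-nests that tuple so the fields are written as plain values.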
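
For grouped data like the sample in the question's comments, where each row holds a key plus a bag of tuples, FLATTEN can be applied to the bag itself; it then emits one plain row per tuple in the bag. The following is only a sketch: the input file name, field names and schema are my assumptions, since the original script was not posted.

```pig
-- Assumed input: semicolon-delimited call records (caller;callee;duration;cnt).
calls = LOAD 'calls.txt' USING PigStorage(';')
        AS (caller:chararray, callee:chararray, duration:int, cnt:int);

-- GROUP yields (group, {bag of call tuples}); storing this relation directly
-- is what produces the braces and parentheses in the output file.
grouped = GROUP calls BY caller;

-- Project the wanted fields out of the bag, then FLATTEN it:
-- each tuple in the bag becomes its own row of plain values.
flat = FOREACH grouped GENERATE group, FLATTEN(calls.(callee, duration, cnt));

STORE flat INTO 'output_flat' USING PigStorage(';');
```

With this, a group such as callerA with two calls is written as two separate delimited rows, with no braces or parentheses. The trade-off mentioned in the comments still applies: the nesting is gone, so the key is repeated on every row.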