I want to get the index of the backslash character (\) in a field loaded from a file.

When I try, Grunt enters a continuation state (the >> prompt). The only way out is to press Ctrl+Z to exit Pig, and then I have to start Pig again.

The code I use is:

grunt> A = LOAD 'data.txt' USING PigStorage(',') AS (username:chararray, address:chararray);

grunt> B = FOREACH A GENERATE INDEXOF(username, '\', 0);

>>

But when I try the same thing with other characters, I get output. For example:

grunt> A = LOAD 'data.txt' USING PigStorage(',') AS (username:chararray, address:chararray);

grunt> B = FOREACH A GENERATE INDEXOF(username, 'a', 0);

grunt> DUMP B;

Output:

-1
-1
...

It also works for /, <, > and most other characters I have tried, just not for \. Please suggest a solution.

Thank you.

3 Answers

\ is a metacharacter (the escape character) in Java strings, and Pig string literals use it the same way. It lets you write characters such as tab (\t) and newline (\n), and it also lets you embed a single quote without terminating the string. For example, a Pig string consisting of a single quote is written '\''. In your example, '\' escapes the closing single quote, so Pig treats everything after it as part of the string and keeps waiting for a closing quote, which is why you see the >> prompt.

Because you may want a literal backslash in a string, the backslash can itself be escaped. Try '\\' and you should get the output you are looking for.
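A minimal Java sketch of the escaping rule above. It assumes that Pig's INDEXOF(str, search, from) behaves like Java's String.indexOf(search, from), returning -1 when there is no match; the helper name indexOf and the sample value DOMAIN\alice are illustrative only. The point is that the source text "\\" denotes a single backslash character at runtime.

```java
// Sketch: why a backslash must be escaped inside a string literal.
// Assumption: Pig's INDEXOF(str, search, from) behaves like Java's
// String.indexOf(search, from) and returns -1 when nothing matches.
class EscapeDemo {

    // Hypothetical helper mirroring INDEXOF(username, search, fromIndex).
    static int indexOf(String value, String search, int fromIndex) {
        return value.indexOf(search, fromIndex);
    }

    public static void main(String[] args) {
        String backslash = "\\";            // two characters in source, one at runtime
        String username = "DOMAIN\\alice";  // runtime value: DOMAIN\alice

        System.out.println(backslash.length());               // 1
        System.out.println(indexOf(username, backslash, 0));  // 6
        System.out.println(indexOf("alice", backslash, 0));   // -1
    }
}
```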

I tried using '\\' as well, and also the Unicode character for \. Neither worked.

I suggest using one of the REGEX functions (such as REGEX_EXTRACT) instead.
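If you go the regular-expression route, note that the backslash is escaped twice: once for the regex and once for the string literal. The sketch below assumes Pig's REGEX functions are backed by java.util.regex (an assumption, not confirmed here); the class and method names are hypothetical. It finds the first backslash with a pattern whose regex text is \\ and whose Java literal is therefore "\\\\".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: locating a backslash with a regular expression.
// Assumption: Pig's REGEX functions use java.util.regex, so the same
// double escaping applies (once for the literal, once for the regex).
class RegexBackslashDemo {

    // Regex text is \\ (an escaped backslash); as a Java literal that is "\\\\".
    private static final Pattern BACKSLASH = Pattern.compile("\\\\");

    // Returns the index of the first backslash, or -1 if there is none.
    static int firstBackslash(String value) {
        Matcher m = BACKSLASH.matcher(value);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstBackslash("DOMAIN\\alice")); // 6
        System.out.println(firstBackslash("alice"));         // -1
    }
}
```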

In Pig 0.13.0, even '\\' puts Grunt into listening mode, for example:

grunt> A = LOAD 'data.txt' USING PigStorage(',') AS (username:chararray, address:chararray);

grunt>-- This will go into listening mode

grunt> B = FOREACH A GENERATE INDEXOF(username, '\\', 0);

>>

So I tried the Unicode escape '\u005C' instead of the '\' symbol to get the index:

grunt> B = FOREACH A GENERATE INDEXOF(username, '\\u005C', 0);

This does not work either.
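One plausible reason the '\\u005C' form fails: under Java-style unescaping, the leading \\ becomes a single backslash and the rest stays as plain text, so the search string is six characters long rather than one backslash. The sketch assumes Pig unescapes this literal the way Java does, which is not confirmed here; the constant names are hypothetical.

```java
// Sketch: '\\u005C' versus a real backslash.
// Assumption: the literal is unescaped with Java-style rules, where the
// leading escaped backslash becomes one backslash and the rest is plain text.
class UnicodeEscapeDemo {

    // Runtime value: a backslash followed by the five characters u005C (6 chars).
    static final String ESCAPED_TEXT = "\\u005C";

    // Code point 0x5C is the backslash itself.
    static final String BACKSLASH = String.valueOf((char) 0x5C);

    public static void main(String[] args) {
        String username = "DOMAIN\\alice";
        System.out.println(ESCAPED_TEXT.length());             // 6
        System.out.println(BACKSLASH.equals("\\"));            // true
        System.out.println(username.indexOf(ESCAPED_TEXT, 0)); // -1
        System.out.println(username.indexOf(BACKSLASH, 0));    // 6
    }
}
```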

So I tried another alternative, which is a bit of a hack:

grunt> B = FOREACH A GENERATE INDEXOF(username, TRIM('\\ '), 0);

This works. The listening-mode problem happened because '\\' ends up escaping the closing ', so Grunt keeps waiting for the string to end. Putting a space between the backslash and the closing quote avoids that, and TRIM then removes the space.
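A small Java sketch of what the TRIM workaround computes, assuming Pig's TRIM behaves like Java's String.trim() and INDEXOF like String.indexOf (both assumptions). The helper indexOfTrimmed is hypothetical; it shows that the padded search string trims down to exactly one backslash, so the index found is the same as searching for a plain backslash.

```java
// Sketch of the TRIM workaround's effect on the search string.
// Assumptions: Pig's TRIM behaves like String.trim() and INDEXOF like
// String.indexOf, so TRIM('\\ ') yields exactly one backslash.
class TrimWorkaroundDemo {

    // Hypothetical equivalent of INDEXOF(value, TRIM(searchWithPadding), fromIndex).
    static int indexOfTrimmed(String value, String searchWithPadding, int fromIndex) {
        return value.indexOf(searchWithPadding.trim(), fromIndex);
    }

    public static void main(String[] args) {
        // Backslash followed by a space; in Pig, the space keeps the
        // backslash away from the closing quote.
        String padded = "\\ ";
        System.out.println(padded.trim().equals("\\"));                 // true
        System.out.println(indexOfTrimmed("DOMAIN\\alice", padded, 0)); // 6
    }
}
```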