
In the following Pig script, the value of my field ct "disappears" when I DUMP any relation defined after the FOREACH ... GENERATE that creates the e3 alias. For example, if I DUMP e4 immediately after defining it, ct comes back empty.

I also see the following warning in my output:

[main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning ACCESSING_NON_EXISTENT_FIELD 9 time(s).

   eng_grp = GROUP engs BY (aid, scm_id,ts,etype);
   eng_grp_out = FOREACH eng_grp
               GENERATE
                   group.aid as aid,
                   group.scm_id as scm_id,
                   group.etype as etype,
                   group.ts as timestamp,
                   (long)COUNT_STAR(engs) as ct;

   eng_joined = JOIN eng_grp_out BY (aid,scm_id), tgc BY (aid, scm_id);

   e3 = FOREACH eng_joined GENERATE
         MD5((chararray)CONCAT(CONCAT(CONCAT(CONCAT(CONCAT(CONCAT(eng_grp_out::aid,'_'),eng_grp_out::scm_id),'_'),eng_grp_out::etype),'_'),(chararray)eng_grp_out::timestamp)) as id,
         eng_grp_out::aid as v,
         eng_grp_out::scm_id as scmid,
         eng_grp_out::etype AS et,
         eng_grp_out::timestamp as ts,
         FLATTEN(tgc::tags),
         eng_grp_out::ct as ct;

   -- the value for "ct" will be output if I do DUMP e3; here

   e4 = FOREACH e3 GENERATE
         id,
         v,
         scmid,
         et,
         ts,
         FLATTEN(tgc::tags::g) as gg,
         ct;
   -- the value for "ct" will NOT be output if I do DUMP e4; here
   e5 = FOREACH e4 GENERATE
         id,
         v,
         scmid,
         et,
         ts,
         gg#'g' as tg,
         gg#'v' as tv,
         gg#'d' as td,
         ct;

   e6 = FOREACH e5 GENERATE
         id,
         v,
         scmid,
         et,
         (long)ts,
         tg#'\$oid' as tg,
         tv#'\$oid' as tv,
         (chararray)td as td,
         ct;

   e7 = FOREACH e6 GENERATE
         id,
         v,
         scmid,
         et,
         ts,
         'c' as tt,
         tg,
         tv,
         td,
         ct;

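   e8 = FOREACH e7 GENERATE
         id,v,scmid,et,ts,tt,
         -- renamed from "ct": the ct field at the end would otherwise be a duplicate alias
         -- (tag_key is a placeholder name)
         CONCAT(CONCAT(CONCAT(CONCAT(tg,'_'),tv),'_'),td) as tag_key,
         tg,tv,td,ct;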
What is the output of DESCRIBE e4? - LiMuBei
DESCRIBE e3 produces: e3: {id: chararray,v: chararray,scmid: bytearray,et: chararray,ts: long,tagged_content::tags::g: map[],tagged_content::tags::v: map[],tagged_content::tags::d: chararray,tagged_content::tags::n: chararray,ct: long} - Michael DeLorenzo
DESCRIBE e4 produces: e4: {id: chararray,v: chararray,scmid: bytearray,et: chararray,ts: long,gg: map[],ct: long} - Michael DeLorenzo
Does it work if you do not use an alias? - LiMuBei

1 Answer


I finally got it to work by changing the definition of the e3 alias to:

   e3 = FOREACH eng_joined GENERATE
         -- ...kept everything else the same...
         TOMAP('count_val', (long)eng_grp_out::ct);

From there, I could get the value I needed in the e4 statement with (long)$6#'count_val' as val.
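A possible alternative to the map workaround, offered as an inference from the thread rather than something the poster confirmed: DESCRIBE e3 places ct at position 9, after the four fields that FLATTEN(tgc::tags) is declared to produce. The later lookups gg#'g', gg#'v' and gg#'d', plus the fact that $6 (rather than a later index) worked in the answer, hint that each tags tuple actually carries a single map at run time. If so, ct really sits at $6 in the data, the projection of $9 in e4 finds nothing (which would explain ACCESSING_NON_EXISTENT_FIELD and the empty ct), and DUMP e3 still shows the value because it prints whole tuples. Assuming that mismatch, this sketch moves ct in front of the FLATTEN so its position is the same in the schema and in the data:

```pig
-- Sketch only: assumes eng_joined is built exactly as in the question and that,
-- at run time, each tuple in tgc::tags holds a single map field (an inference).
e3 = FOREACH eng_joined GENERATE
      MD5((chararray)CONCAT(CONCAT(CONCAT(CONCAT(CONCAT(CONCAT(eng_grp_out::aid,'_'),eng_grp_out::scm_id),'_'),eng_grp_out::etype),'_'),(chararray)eng_grp_out::timestamp)) as id,
      eng_grp_out::aid as v,
      eng_grp_out::scm_id as scmid,
      eng_grp_out::etype AS et,
      eng_grp_out::timestamp as ts,
      -- ct now sits at $5, ahead of the FLATTEN, so its position no longer
      -- depends on how many fields the flattened tag tuples really contain
      eng_grp_out::ct as ct,
      FLATTEN(tgc::tags);

e4 = FOREACH e3 GENERATE
      id,
      v,
      scmid,
      et,
      ts,
      ct,
      -- first flattened tag field, at $6 in both the schema and the data
      FLATTEN(tgc::tags::g) as gg;
```

The rest of the script (e5 through e8) could stay as written, since those statements refer to ct and gg by alias and e4's output would then line up with its schema. If the schema that tgc is loaded with can be corrected to describe tags as a bag of single-map tuples, that would address the mismatch at its source.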