I am a senior developer but new to Pig.
We have a use case to construct the following metric in Pig Latin:
(Count of customers who purchased items in the current month AND purchased items in the prior month) / (Count of customers who purchased items in the prior month)
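To make the intended metric concrete, here is a minimal sketch in Python (not Pig) of what I am trying to compute. The sample data, the (customer_id, month) row layout, and the function names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: one (customer_id, month) row per purchase.
purchases = [
    ("alice", "2024-01"), ("bob", "2024-01"), ("carol", "2024-01"),
    ("alice", "2024-02"), ("carol", "2024-02"), ("dave", "2024-02"),
    ("alice", "2024-02"),  # repeat purchase; must not be double counted
]

def customers_by_month(rows):
    """Map each month to the set of distinct customers who purchased in it."""
    by_month = defaultdict(set)
    for customer_id, month in rows:
        by_month[month].add(customer_id)
    return by_month

def retention(rows, month, prior_month):
    """Customers active in both months divided by customers active in the prior month."""
    by_month = customers_by_month(rows)
    prior = by_month.get(prior_month, set())
    if not prior:
        return None  # no prior-month customers, so the ratio is undefined
    current = by_month.get(month, set())
    return len(current & prior) / len(prior)

result = retention(purchases, "2024-02", "2024-01")
print(result)
```

Note that the numerator needs the set of customers per month, not just the per-month counts, since it is the overlap between two months that is being counted.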
The first step would seem to be to generate the customer counts with a GROUP followed by FOREACH ... GENERATE COUNT(Purchases); write the result to a file, and then read it back in.
When I read the data back in, is there a way in a FOREACH to compare the current row (which would now be an aggregate count by month) with the previous row?
Or should the data be pivoted before it is written out and read back in, so that each column is compared to the 'previous' one going left to right instead of row by row?
Can a CASE statement in Pig look something like this?
case (customerboolean_has_sales_february + customerboolean_has_sales_january)
    2    countsalesfeb + countsalesjan / countsalesjanuary
    1    null
    0    null
Stated another way, the metric is: (Customers who rode in the month AND rode in the prior month) / (Total customers who rode in the prior month)
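One possible reading of the flag-based idea above, again sketched in Python rather than Pig: give each customer a 0/1 flag per month, treat a flag sum of 2 as "active in both months", and divide by the number of customers whose prior-month flag is set. The flag names and sample values are hypothetical.

```python
# Hypothetical per-customer flags: 1 if the customer was active in that month, else 0.
flags = {
    "alice": {"jan": 1, "feb": 1},
    "bob":   {"jan": 1, "feb": 0},
    "carol": {"jan": 1, "feb": 1},
    "dave":  {"jan": 0, "feb": 1},
}

def ratio_from_flags(customer_flags):
    """Count customers whose two flags sum to 2, divided by customers with the prior-month flag set."""
    both = sum(1 for f in customer_flags.values() if f["jan"] + f["feb"] == 2)
    prior = sum(f["jan"] for f in customer_flags.values())
    return both / prior if prior else None  # undefined when nobody was active in the prior month

flag_ratio = ratio_from_flags(flags)
print(flag_ratio)
```

In this reading the case expression only decides whether a customer counts toward the numerator; the division happens once, after aggregating over all customers.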