I have a CSV file in the following format:
customerid, period, credit, debit
100, jan-2017, 500, 300
100, jan-2017, 300,0
100, feb-2017, 200,100
100, mar-2017, 200,10
200, jan-2017, 100, 200
200, feb-2017,100,200
My requirement is to group the rows first by customer ID and then by period, consolidate the transactions in each period, and produce hierarchical JSON like the following using an Apache Pig script:
[
  {
    "customerid": 100,
    "periods": [{
      "period": "jan-2017",
      "transactions": [{"credit": 500,"debit": 300},....]
    }, {
      "period": "feb-2017",
      "transactions": [...]
    }, {
      "period": "mar-2017",
      "transactions": [....]
    }]
  }, {
    "customerid": 200,
    "periods": [{
      "period": "jan-2017",
      "transactions": [.....]
    }, {
      "period": "feb-2017",
      "transactions": [.....]
    }]
  }
]
I am fairly new to Pig, but I managed to write the following script:
Data = LOAD 'data.csv' USING PigStorage(',') AS (
company_id:chararray,
period:chararray,
debit:chararray,
credit:chararray)
CompanyBag = GROUP Data BY (company_id);
final_trsnactionjson = FOREACH CompanyBag {
ByCompanyId = FOREACH Data {
PeriodBag = GROUP Data BY (period);
IdPeriodItemRoot = FOREACH PeriodBag{
ItemRecords = FOREACH Source GENERATE debit as debit, credit as credit
GENERATE group as period, TOTUPLE(ItemRecords) as transactions;
}
}
GENERATE group as customerid, TOTUPLE(PeriodBag) AS periods;
};
However, it fails with the following error:
mismatched input '{' expecting GENERATE
I have searched extensively for how to generate nested JSON with Pig but could not find any good pointers. Where am I going wrong? Thanks in advance for the help.
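For reference, here is a minimal sketch of one way the two-level grouping is often expressed. It relies on the fact that Pig's nested FOREACH block does not accept GROUP (or a further brace-delimited block), which is consistent with the parser error above. Instead of nesting, it groups by (customerid, period) first and then groups that result by customerid. The relation names (Raw, Body, ByPeriod, Periods, ByCustomer, Result) and the 'output' path are made up for illustration. It also assumes that the header row is present in data.csv and that the fields may carry spaces after the commas, as in the sample.

-- Load every field as text; the sample has spaces after the commas.
Raw = LOAD 'data.csv' USING PigStorage(',') AS (
    customerid:chararray,
    period:chararray,
    credit:chararray,
    debit:chararray);

-- Drop the header row (assumed to be in the file), then trim and cast.
Body = FILTER Raw BY TRIM(customerid) != 'customerid';
Data = FOREACH Body GENERATE
    (int)TRIM(customerid) AS customerid,
    TRIM(period) AS period,
    (int)TRIM(credit) AS credit,
    (int)TRIM(debit) AS debit;

-- First level: one row per (customer, period) with a bag of transactions.
ByPeriod = GROUP Data BY (customerid, period);
Periods = FOREACH ByPeriod GENERATE
    group.customerid AS customerid,
    group.period AS period,
    Data.(credit, debit) AS transactions;

-- Second level: one row per customer with a bag of periods.
ByCustomer = GROUP Periods BY customerid;
Result = FOREACH ByCustomer GENERATE
    group AS customerid,
    Periods.(period, transactions) AS periods;

-- 'output' is a hypothetical path; JsonStorage writes one JSON object per line.
STORE Result INTO 'output' USING JsonStorage();

With JsonStorage, each customer should come out as a separate JSON object on its own line rather than as one enclosing array, so a wrapping step would still be needed if a single JSON document is required.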