I need some help with my Pig script. I have two CSV files and I want to join them on a common id.
customer.csv:
1 ; nom1 ; prenom1
2 ; nom2 ; prenom2
3 ; nom3 ; prenom3
child.csv:
1 ; enfant_1_1
2 ; enfant_1_2
3 ; enfant_1_3
1 ; enfant_2_1
1 ; enfant_3_1
So one customer can have many children, but a child can have only one "customer".
I want to create this file:
1 ; nom1 ; prenom1 ; enfant_1_1 ; enfant_2_1 ; enfant_3_1
2 ; nom2 ; prenom2 ; enfant_1_2
3 ; nom3 ; prenom3 ; enfant_1_3
This is my method:
First I try to get:
1 ; enfant_1_1 ; enfant_2_1 ; enfant_3_1
2 ; enfant_1_2
3 ; enfant_1_3
After that, I will join the result with customer.csv.
Tell me if you think there is an easier way :)
This is my script:
donnees_Enfants = LOAD '/user/cloudera/Jeux/mini_jeu2.csv' USING PigStorage(';')
    AS (id_parent:int, nom_enfant:chararray);
group_enfants = GROUP donnees_Enfants BY id_parent;
enfant_uneLigne = FOREACH group_enfants GENERATE group, donnees_Enfants.nom_enfant;
echantillon = LIMIT enfant_uneLigne 50;
DUMP echantillon;
With DESCRIBE:
group_enfants: {group: int,donnees_Enfants: {(id_parent: int,nom_enfant: chararray)}}
enfant_uneLigne: {group: int,{(nom_enfant: chararray)}}
The result :
(1,{( enfant_2_1 ),( enfant_1_1 ),( enfant_3_1 )})
(2,{( enfant_2_2 )})
(3,{( enfant_2_3 )})
I tried to use FLATTEN, but the result was one line per child. I have some difficulty working with tuples and bags; can you help me?
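For reference, here is a minimal sketch of the group-then-join method described above. It assumes the built-in BagToString function is available (I believe it ships with Pig 0.11 and later) and uses it to collapse each bag into one delimited string instead of flattening it. The relative file paths, the 'output' directory and the alias names (customers, children, children_line, joined, result) are hypothetical and only for illustration. Note that Pig does not guarantee the order of the children inside a bag.

```pig
-- Assumption: paths are hypothetical; adjust them to your HDFS layout.
customers = LOAD 'customer.csv' USING PigStorage(';')
    AS (id:int, nom:chararray, prenom:chararray);
children = LOAD 'child.csv' USING PigStorage(';')
    AS (id_parent:int, nom_enfant:chararray);

-- One row per parent, with a bag holding all of that parent's children.
grouped = GROUP children BY id_parent;

-- Collapse the bag into a single ';'-separated string
-- (assumes the built-in BagToString; order inside the bag is not guaranteed).
children_line = FOREACH grouped GENERATE
    group AS id_parent,
    BagToString(children.nom_enfant, ';') AS enfants;

-- Inner join: customers without any child are dropped.
joined = JOIN customers BY id, children_line BY id_parent;

result = FOREACH joined GENERATE
    customers::id, customers::nom, customers::prenom, children_line::enfants;

STORE result INTO 'output' USING PigStorage(';');
```

If customers without children should be kept, a LEFT OUTER join in place of the inner JOIN would be the usual choice. Any spaces around the ';' separators in the input files are carried through to the output as they are.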
Thanks in advance,
Edit : I found a solution to my problem and more ^^ see below
Angelik