
I have seen a number of similar questions but cannot find a straightforward solution to my issue.

I am working with a pandas dataframe containing contact information for constituent donors to a nonprofit. The data has Households and Individuals. Most Households have member Individuals, but not all Individuals are associated with a Household. There is no data that links a Household to its member Individuals, so I am attempting to match them up based on other data: Home Street Address, Phone Number, Email, etc.

A simplified version of the dataframe looks something like this:

Constituent Id     Type          Home Street
1234567            Household     123 Main St.
2345678            Individual    123 Main St.
3456789            Individual    123 Main St.
4567890            Individual    433 Elm Rd.
0123456            Household     433 Elm Rd.
1357924            Individual    500 Stack Ln.
1344444            Individual    500 Stack Ln.

I am using groupby to group the constituents, in this case by Home Street. To exclude Individuals unassociated with a Household, I want to keep only groupings with more than one record. I am using something like:

df1 = df.groupby('Home Street').filter(lambda x: len(x) > 1)
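For reference, here is that filtering step as a minimal runnable sketch on the sample data above (Constituent Id is kept as strings, on the assumption that the leading zero in 0123456 is significant):

```python
import pandas as pd

# Sample data from the question; Constituent Id as strings to keep the leading zero
df = pd.DataFrame({
    "Constituent Id": ["1234567", "2345678", "3456789",
                       "4567890", "0123456", "1357924", "1344444"],
    "Type": ["Household", "Individual", "Individual",
             "Individual", "Household", "Individual", "Individual"],
    "Home Street": ["123 Main St.", "123 Main St.", "123 Main St.",
                    "433 Elm Rd.", "433 Elm Rd.", "500 Stack Ln.", "500 Stack Ln."],
})

# Keep only addresses that appear more than once
df1 = df.groupby("Home Street").filter(lambda x: len(x) > 1)
```

Note that in this particular sample every address is shared, so no rows are dropped; a lone Individual at a unique address would be filtered out.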

What I would like to do is somehow export the grouped dataframe to a new dataframe that includes the Household Constituent Id first, then any Individual Constituent Ids. And in the case that there is no Household in the grouping, place the Individual Constituents in the appropriate locations. The output for my data set above would look like:

Household    Individual  Individual
1234567      2345678     3456789
0123456      4567890
             1357924     1344444

I have toyed with iterating through the groupby object, but I feel like I'm missing some easy way to accomplish my task.

do you want all the Individual columns to be split out regardless? - Umar.H
Yes, I want individuals to be split out whether they have a Household associated with them or not (last line of my sample output shows something like that). - markf
Split the dataframe into rows with individuals and rows with households. Perform groupby on both, then expand the grouped columns in the individuals dataframe, then join the two separate dataframes. - Borut Flis

2 Answers


This should do it

# cumcount numbers repeated Types within each address,
# turning Type into Household_0, Individual_0, Individual_1, ...
df['Type'] = df['Type'] + '_' + df.groupby(['Home Street', 'Type']).cumcount().astype(str)
df.pivot_table(index='Home Street', columns='Type', values='Constituent Id',
               aggfunc=lambda x: ' '.join(x)).reset_index(drop=True)

Output

Type Household_0 Individual_0 Individual_1
0        1234567      2345678      3456789
1        0123456      4567890          NaN
2            NaN      1357924      1344444
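For completeness, a self-contained sketch of this answer on the sample data (it assumes Constituent Id is stored as strings, which `' '.join` requires):

```python
import pandas as pd

df = pd.DataFrame({
    "Constituent Id": ["1234567", "2345678", "3456789",
                       "4567890", "0123456", "1357924", "1344444"],
    "Type": ["Household", "Individual", "Individual",
             "Individual", "Household", "Individual", "Individual"],
    "Home Street": ["123 Main St.", "123 Main St.", "123 Main St.",
                    "433 Elm Rd.", "433 Elm Rd.", "500 Stack Ln.", "500 Stack Ln."],
})

# Number repeats of each Type within an address: Household_0, Individual_0, Individual_1, ...
df["Type"] = df["Type"] + "_" + df.groupby(["Home Street", "Type"]).cumcount().astype(str)

# Pivot the numbered Types into columns; missing combinations become NaN
out = df.pivot_table(
    index="Home Street", columns="Type", values="Constituent Id",
    aggfunc=lambda x: " ".join(x),
).reset_index(drop=True)
```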

IIUC, we can use groupby with agg(list) and some re-shaping using .join and explode:

s = df.groupby(["Home Street", "Type"]).agg(list).unstack(1).reset_index(
    drop=True
).droplevel(level=0, axis=1).explode("Household")

df1 = s.join(pd.DataFrame(s["Individual"].tolist()).add_prefix("Individual_")).drop(
    "Individual", axis=1
)


print(df1.fillna(' '))

  Household Individual_0 Individual_1
0   1234567      2345678      3456789
1   0123456      4567890
2               1357924      1344444

or we can ditch the join and cast Household to your index.

df1 = pd.DataFrame(s["Individual"].tolist(), index=s["Household"])\
                 .add_prefix("Individual_")
print(df1)

          Individual_0 Individual_1
Household                          
1234567        2345678      3456789
0123456        4567890         None
NaN            1357924      1344444
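Putting the second variant together as a runnable sketch, assuming the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Constituent Id": ["1234567", "2345678", "3456789",
                       "4567890", "0123456", "1357924", "1344444"],
    "Type": ["Household", "Individual", "Individual",
             "Individual", "Household", "Individual", "Individual"],
    "Home Street": ["123 Main St.", "123 Main St.", "123 Main St.",
                    "433 Elm Rd.", "433 Elm Rd.", "500 Stack Ln.", "500 Stack Ln."],
})

# Collect Constituent Ids into lists per (address, type), pivot Type into
# columns, then flatten the one-element Household lists with explode
s = (
    df.groupby(["Home Street", "Type"]).agg(list)
      .unstack(1)
      .reset_index(drop=True)
      .droplevel(level=0, axis=1)
      .explode("Household")
)

# Spread the Individual lists into numbered columns, indexed by Household
df1 = pd.DataFrame(s["Individual"].tolist(), index=s["Household"]).add_prefix("Individual_")
```

A Household with fewer Individuals than the widest group is padded with missing values, and the Household-less group at 500 Stack Ln. ends up under a NaN index label.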