I have seen a number of similar questions but cannot find a straightforward solution to my issue.
I am working with a pandas dataframe containing contact information for constituent donors to a nonprofit. The data has Households and Individuals. Most Households have member Individuals, but not all Individuals are associated with a Household. There is no data that links the Household to the container Individuals, so I am attempting to match them up based on other data - Home Street Address, Phone Number, Email, etc.
A simplified version of the dataframe looks something like this:
Constituent Id Type Home Street
1234567 Household 123 Main St.
2345678 Individual 123 Main St.
3456789 Individual 123 Main St.
4567890 Individual 433 Elm Rd.
0123456 Household 433 Elm Rd.
1357924 Individual 500 Stack Ln.
1344444 Individual 500 Stack Ln.
I am using groupby
in order to group the constituents. In this case, by Home Street. I'm trying to ensure that I only get groupings with more than one record (to exclude Individuals unassociated with a Household). I am using something like:
df1 = df[df.groupby('Home Street').filter(lambda x: len(x)>1)
What I would like to do is somehow export the grouped dataframe to a new dataframe that includes the Household Constituent Id first, then any Individual Constituent Ids. And in the case that there is no Household in the grouping, place the Individual Constituents in the appropriate locations. The output for my data set above would look like:
Household Individual Individual
1234567 2345678 3456789
0123456 4567890
1357924 1344444
I have toyed with iterating through the groupby object, but I feel like I'm missing some easy way to accomplish my task.