I have 2 dataframes, df1 and df2. df2 is a subset of df1 that I extracted to do some cleaning, and both dataframes can be matched on index. I have seen a lot of merges on this site, but I don't want to add more columns to df1, and since the dataframes are not the same size (df1 has 1000 rows and df2 has 275 rows), I don't want to replace the entire column. I want to update df1['AgeBin'] with the df2['AgeBin'] values wherever the indexes of these dataframes match.
indexes = df1.loc[df1.AgeBin.isin(df2.AgeBin.values)].index
df1.at[indexes, 'AgeBin'] = df2['AgeBin'].values
This is what I came up with, but there seems to be an issue since the dataframes are different sizes:
ValueError: Must have equal len keys and value when setting with an iterable
Below is an oversimplification: df1 has 26 columns and df2 has 12 columns, and AgeBin is the last column in both dataframes. This is, in theory, what my objective is:
df2
AgeBin
0 2
1 3
2 1
3 3
df1
AgeBin
0 NaN
1 NaN
2 NaN
3 NaN
df1 after update
AgeBin
0 2
1 3
2 1
3 3
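The objective above can be sketched with label-based assignment, which aligns on index and so tolerates the size mismatch (example data assumed, not the real 1000/275-row frames):

```python
import pandas as pd

# df1 is the full frame; df2 holds cleaned AgeBin values for a
# subset of df1's row labels.
df1 = pd.DataFrame({"AgeBin": [None, None, None, None, None]})
df2 = pd.DataFrame({"AgeBin": [2, 3, 1]}, index=[0, 2, 3])

# .loc with df2's own index selects exactly the matching rows of df1;
# the right-hand Series is aligned on those same labels, so the
# lengths agree and no ValueError is raised.
df1.loc[df2.index, "AgeBin"] = df2["AgeBin"]
```

Rows of df1 whose labels do not appear in df2 (here 1 and 4) are left untouched.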
Here are the dataframe specs:
RangeIndex: 1309 entries, 0 to 1308
Data columns (total 26 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 1046 non-null float64
1 Survived 714 non-null category
2 Pclass 1046 non-null category
3 Name 1046 non-null object
4 Sex 1046 non-null object
5 Age 1046 non-null float64
6 SibSp 1046 non-null float64
7 Parch 1046 non-null float64
8 Ticket 1046 non-null object
9 Fare 1046 non-null float64
10 Embarked 1046 non-null category
11 FamilySize 1046 non-null float64
12 Surname 1046 non-null object
13 Title 1046 non-null object
14 IsChild 1046 non-null float64
15 isMale 1046 non-null category
16 GroupID 1046 non-null float64
17 GroupSize 1046 non-null float64
18 GroupType 1046 non-null object
19 GroupNumSurvived 1046 non-null float64
20 GroupNumPerished 1046 non-null float64
21 LargeGroup 1046 non-null float64
22 SplitFare 1046 non-null float64
23 log10Fare 1046 non-null float64
24 log10SplitFare 1046 non-null float64
25 AgeBin 1046 non-null category
dtypes: category(5), float64(15), object(6)
memory usage: 221.9+ KB
dfageResults.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 263 entries, 5 to 1308
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 AgeBin 263 non-null category
dtypes: category(1)
memory usage: 12.4 KB
Here are the categories:
dfageResults.groupby(["AgeBin"])["AgeBin"].count()
AgeBin
0-14 25
15-29 192
30-44 46
Name: AgeBin, dtype: int64
df.groupby(["AgeBin"])["AgeBin"].count()
AgeBin
0-14 107
15-29 462
30-44 301
45-59 136
60+ 40
Name: AgeBin, dtype: int64
df1.combine_first(df2)
– moys
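A minimal sketch of what combine_first does here (toy data assumed): it fills NaN cells in df1 with the value from df2 at the same index label, leaving non-NaN cells and unmatched rows alone, so mismatched frame sizes are not a problem.

```python
import pandas as pd

# df1's AgeBin is all NaN at the rows df2 covers, as in the question.
df1 = pd.DataFrame({"AgeBin": [float("nan")] * 4})
df2 = pd.DataFrame({"AgeBin": [2.0, 3.0, 1.0, 3.0]})

# NaN cells in df1 are patched from df2, aligned on index.
df1 = df1.combine_first(df2)
print(df1["AgeBin"].tolist())  # [2.0, 3.0, 1.0, 3.0]
```

Note that combine_first only replaces NaN cells; if df1 already had non-NaN AgeBin values that should be overwritten, an alternative such as df1["AgeBin"].update(df2["AgeBin"]) would be needed instead.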