Python Pandas - Merging two data frames based on an index order

Question

I have two pandas data frames. The first is:

df1 = pd.DataFrame({"val1" : ["B2","A1","B2","A1","B2","A1"]})

The second data frame is:

df2 = pd.DataFrame({"val1" : ["A1","A1","A1","B2","B2","B2"],
                    "val2" : [10, 13, 16, 11, 20, 22]})

I would like to merge the two so that the row order of df1 is kept and the values from df2 follow that order. Ideally, the result would look like this:

df_final = pd.DataFrame({"val1" : ["B2","A1","B2","A1","B2","A1"],
                         "val2" : [11, 10, 20, 13, 22, 16]})

I've tried using the merge function with left_on and right_on, but I don't get the output I'm looking for. Any help would be greatly appreciated.

I don't think this problem is well-defined, is it? Val1 is not unique. So it seems like the requested result depends on some implicit ordering assumption within A1 and B2 groupings. It would be a good idea to make that assumption explicit here. And I'm skeptical this is really a good overall way to handle data to be honest. — JohnE
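
To illustrate the point about val1 not being unique, here is a minimal sketch of what a plain merge on val1 does with the frames from the question. The name naive is only illustrative. Because each key occurs three times on both sides, every row of df1 is paired with every matching row of df2.

```python
import pandas as pd

df1 = pd.DataFrame({"val1": ["B2", "A1", "B2", "A1", "B2", "A1"]})
df2 = pd.DataFrame({"val1": ["A1", "A1", "A1", "B2", "B2", "B2"],
                    "val2": [10, 13, 16, 11, 20, 22]})

# Merging on val1 alone is many-to-many: each of the 6 rows in df1
# matches 3 rows in df2, so the result has 6 * 3 = 18 rows.
naive = df1.merge(df2, on="val1", how="left")
print(len(naive))  # 18
```

Some extra rule, such as "the k-th A1 in df1 takes the k-th A1 value in df2", is needed to pick exactly one match per row.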

MaxU · Accepted Answer · 2016-04-03T20:51:11

You can do it this way:

Sort the values in df2 by ['val1', 'val2'], group them by val1, and store the result as g2.
Add an idx column to df1 that numbers the rows within each val1 group; it is used to pick the matching value from df2.

Code:

In [176]: df1['idx'] = 1

In [177]: df1['idx'] = df1.groupby('val1')['idx'].cumsum()-1

In [178]: df1
Out[178]:
  val1  idx
0   B2    0
1   A1    0
2   B2    1
3   A1    1
4   B2    2
5   A1    2

In [179]: g2 = df2.sort_values(['val1', 'val2']).groupby('val1')

In [180]: g2.groups
Out[180]: {'A1': [0, 1, 2], 'B2': [3, 4, 5]}

In [181]: df2.loc[g2.groups['A1'][1]]  # groups holds index labels, so use .loc
Out[181]:
val1    A1
val2    13
Name: 1, dtype: object

In [182]: df1.apply(lambda x: df2.loc[g2.groups[x['val1']][x['idx']]], axis=1)
Out[182]:
  val1  val2
0   B2    11
1   A1    10
2   B2    20
3   A1    13
4   B2    22
5   A1    16
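
The same idea can probably be written without a row-wise apply. The sketch below is an alternative to the session above, not the code from it. It assumes that the k-th occurrence of a key in df1 should receive the k-th smallest val2 for that key in df2. The names left and right are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"val1": ["B2", "A1", "B2", "A1", "B2", "A1"]})
df2 = pd.DataFrame({"val1": ["A1", "A1", "A1", "B2", "B2", "B2"],
                    "val2": [10, 13, 16, 11, 20, 22]})

# Number each row within its val1 group: 0, 1, 2, ...
left = df1.assign(idx=df1.groupby("val1").cumcount())

# Assumption: within each key, values are matched in ascending val2 order.
right = df2.sort_values(["val1", "val2"])
right = right.assign(idx=right.groupby("val1").cumcount())

# A left merge on (val1, idx) keeps the row order of df1 and
# finds exactly one partner row per key occurrence.
df_final = left.merge(right, on=["val1", "idx"], how="left").drop(columns="idx")
print(df_final)
```

If df1 contains more occurrences of a key than df2 does, the unmatched rows get NaN in val2 instead of raising an error.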
