python - Pandas Dataframe Multiindex Merge

Question

I have a question about merging MultiIndex DataFrames in pandas. Here is a hypothetical scenario:

import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])

s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])

Then either

s1.merge(s2, how='left', left_index=True, right_index=True)

or

s1.merge(s2, how='left', left_on=['first', 'second'], right_on=['third', 'fourth'])

will result in an error.

Do I have to call reset_index() on s1 or s2 to make this work?

Thanks

This is one of the things that frustrates many new pandas users: there are so many different ways to do the same thing. I like that, because depending on the dataset and on why you are doing it in the first place, you can take the route that is easy to code and understand, or the route that optimizes for faster run times. - Scott Boston

ALollz · Accepted Answer · 2018-10-12T19:01:13

It seems you need to combine the two: use the index on one side and the index level names on the other.

s1.merge(s2, left_index=True, right_on=['third', 'fourth'])
#s1.merge(s2, right_index=True, left_on=['first', 'second'])

Output:

               s1        s2
bar one  0.765385 -0.365508
    two  1.462860  0.751862
baz one  0.304163  0.761663
    two -0.816658 -1.810634
foo one  1.891434  1.450081
    two  0.571294  1.116862
qux one  1.056516 -0.052927
    two -0.574916 -1.197596
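
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the mixed approach above. The seeded generator is my own assumption, added only so the run is reproducible (the numbers will not match the output shown). The `rename_axis` plus `join` variant at the end is likewise not part of this answer; it is included as a hedged cross-check on the assumption that renaming the levels of `s2` to match `s1` is acceptable for your data.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])

# Seeded generator: an assumption made for reproducibility only.
rng = np.random.default_rng(0)
s1 = pd.DataFrame(rng.standard_normal(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(rng.standard_normal(8), index=index2, columns=['s2'])

# Mixed approach: the left frame's index is matched against the
# right frame's index levels, referenced by name.
merged = s1.merge(s2, left_index=True, right_on=['third', 'fourth'])

# Cross-check (not from the answer): give s2 the same level names
# as s1, then do an ordinary index-on-index join.
s2_aligned = s2.rename_axis(['first', 'second'])
joined = s1.join(s2_aligned, how='left')

print(merged)
print(joined)
```

Both results should pair the same `s1` and `s2` values row for row; the index level names on the result may differ between the two routes, so check them before relying on a particular name downstream.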
