0
votes

Setup

Consider the DataFrame df:

import numpy as np
import pandas as pd

np.random.seed([3,1415])
df = pd.DataFrame(np.random.rand(4, 5), columns=list('ABCDE'))
df

          A         B         C         D         E
0  0.444939  0.407554  0.460148  0.465239  0.462691
1  0.016545  0.850445  0.817744  0.777962  0.757983
2  0.934829  0.831104  0.879891  0.926879  0.721535
3  0.117642  0.145906  0.199844  0.437564  0.100702

I want a DataFrame whose columns are the ranks and whose rows list the column labels ['A', 'B', 'C', 'D', 'E'] in rank order.

Ranks

df.rank(1).astype(int)

   A  B  C  D  E
0  2  1  3  5  4
1  1  5  4  3  2
2  5  2  3  4  1
3  2  3  4  5  1

Expected Results

   1  2  3  4  5
0  B  A  C  E  D
1  A  E  D  C  B
2  E  B  C  D  A
3  E  A  B  C  D

Further explanation

I want each row to show the column labels in rank order. The first row has 'B' first because B had rank 1 in that row of the original DataFrame.

3 Answers

3
votes

Here's one way:

In [90]: df
Out[90]: 
          A         B         C         D         E
0  0.444939  0.407554  0.460148  0.465239  0.462691
1  0.016545  0.850445  0.817744  0.777962  0.757983
2  0.934829  0.831104  0.879891  0.926879  0.721535
3  0.117642  0.145906  0.199844  0.437564  0.100702

In [91]: df2 = df.apply(lambda row: df.columns[np.argsort(row)], axis=1)

In [92]: df2
Out[92]: 
   A  B  C  D  E
0  B  A  C  E  D
1  A  E  D  C  B
2  E  B  C  D  A
3  E  A  B  C  D

The new DataFrame has the same column index as df, but that can be fixed:

In [93]: df2.columns = range(1, 1 + df2.shape[1])

In [94]: df2
Out[94]: 
   1  2  3  4  5
0  B  A  C  E  D
1  A  E  D  C  B
2  E  B  C  D  A
3  E  A  B  C  D

Here's another way. This one converts the DataFrame to a numpy array, applies argsort on axis 1, uses that to index df.columns, and puts the result back into a DataFrame.

In [110]: pd.DataFrame(df.columns[np.array(df).argsort(axis=1)], columns=range(1, 1 + df.shape[1]))
Out[110]: 
   1  2  3  4  5
0  B  A  C  E  D
1  A  E  D  C  B
2  E  B  C  D  A
3  E  A  B  C  D
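
For reuse, the array-based approach can be wrapped in a small helper. This is only a sketch: the name columns_by_rank is made up for illustration, the values are typed in from the question's table rather than regenerated from the seed, and it assumes a pandas version that has DataFrame.to_numpy.

```python
import numpy as np
import pandas as pd

# Values copied from the question's example frame.
df = pd.DataFrame(
    [[0.444939, 0.407554, 0.460148, 0.465239, 0.462691],
     [0.016545, 0.850445, 0.817744, 0.777962, 0.757983],
     [0.934829, 0.831104, 0.879891, 0.926879, 0.721535],
     [0.117642, 0.145906, 0.199844, 0.437564, 0.100702]],
    columns=list('ABCDE'),
)


def columns_by_rank(frame):
    """Row i of the result lists frame's column labels ordered by row i's values (ascending)."""
    # Positions that would sort each row, shape (n_rows, n_cols).
    order = frame.to_numpy().argsort(axis=1)
    # Fancy-index the labels with the 2-D position array.
    labels = frame.columns.to_numpy()[order]
    # Keep the original row index; label the columns 1..n_cols.
    return pd.DataFrame(labels, index=frame.index,
                        columns=range(1, frame.shape[1] + 1))


result = columns_by_rank(df)
print(result)
```

Unlike the one-liner, this keeps the original row index, which matters if df is not indexed 0..n-1.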
1
votes

Here's another way.

In [5]: df1 = df.rank(1).astype(int)

In [6]: df3 = df1.replace({rank: name for rank, name in enumerate(df1.columns, 1)})

In [7]: df3.columns = range(1, 1 + df3.shape[1])

In [8]: df3
Out[8]: 
   1  2  3  4  5
0  B  A  C  E  D
1  A  E  D  C  B
2  E  B  C  D  A
3  B  C  D  E  A

Yet another way.

In [6]: ranks = df.rank(axis=1).astype(int)-1
In [7]: new_values = df.columns.values.take(ranks)

In [8]: pd.DataFrame(new_values)
Out[8]: 
   0  1  2  3  4
0  B  A  C  E  D
1  A  E  D  C  B
2  E  B  C  D  A
3  B  C  D  E  A
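
Note that row 3 of the two outputs above (B C D E A) differs from the expected result in the question (E A B C D). Looking up a label by each cell's rank is the inverse of what argsort does, and the two only agree when a row's permutation is its own inverse, as happens in rows 0 to 2. The sketch below checks this on row 3 alone; the variable names are illustrative and the values are copied from the question.

```python
import numpy as np
import pandas as pd

# Row 3 of the question's frame.
row = pd.Series([0.117642, 0.145906, 0.199844, 0.437564, 0.100702],
                index=list('ABCDE'))
labels = row.index.to_numpy()

# Ranks per column: A=2, B=3, C=4, D=5, E=1.
ranks = row.rank().astype(int).to_numpy()

# Rank lookup: position j receives the label sitting at position ranks[j] - 1.
by_lookup = labels.take(ranks - 1)

# argsort: position k receives the label holding the (k+1)-th smallest value.
by_argsort = labels[np.argsort(row.to_numpy())]

# Scattering instead of gathering turns the ranks into the argsort order:
# label j is written to position ranks[j] - 1.
inverted = np.empty(len(row), dtype=object)
inverted[ranks - 1] = labels

print(list(by_lookup))   # rank lookup
print(list(by_argsort))  # what the question asks for
print(list(inverted))    # same as by_argsort
```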
0
votes

Use stack, reset_index, and pivot

df.rank(axis=1).astype(int).stack().reset_index() \
  .pivot(index='level_0', columns=0, values='level_1').rename_axis(None)
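
Broken into steps, the chain might look like the sketch below. The column names row, label and rank are made up here for readability (the one-liner relies on the default level_0, level_1 and 0), and the values are copied from the question's table. It assumes there are no ties within a row; with tied values the integer ranks can repeat and pivot may raise on the duplicates.

```python
import pandas as pd

# Values copied from the question's example frame.
df = pd.DataFrame(
    [[0.444939, 0.407554, 0.460148, 0.465239, 0.462691],
     [0.016545, 0.850445, 0.817744, 0.777962, 0.757983],
     [0.934829, 0.831104, 0.879891, 0.926879, 0.721535],
     [0.117642, 0.145906, 0.199844, 0.437564, 0.100702]],
    columns=list('ABCDE'),
)

# Long form: one record per (row, column label) pair with its rank.
long = df.rank(axis=1).astype(int).stack().reset_index()
long.columns = ['row', 'label', 'rank']  # illustrative names

# Wide form: ranks become the columns, labels become the cell values.
wide = long.pivot(index='row', columns='rank', values='label')

# Drop the axis names left behind by pivot.
wide.index.name = None
wide.columns.name = None

print(wide)
```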


Timing

[Image: timing comparison]