2
votes

I want to find, for each column of my DataFrame, the values that are unique across the whole DataFrame (that is, values that occur only once anywhere in it).

        Col1         Col2            Col3
1        A             A               B
2        C             A               B
3        B             B               F

Col1 has C as a unique value, Col2 has none and Col3 has F.

Any ideas? Thank you!

2
Do you prioritize efficiency or code elegance? How big is your DataFrame? - ntg

2 Answers

4
votes

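You can use stack to reshape the DataFrame into a Series, then drop_duplicates with keep=False to remove every value that occurs more than once, reset_index to drop the first (row) index level, and finally reindex to bring back the columns that have no unique value: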

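# Parentheses are required for a chain that spans several lines
s = (
    df.stack()                          # Series indexed by (row, column)
      .drop_duplicates(keep=False)      # drop every value that occurs more than once
      .reset_index(level=0, drop=True)  # keep only the column label
      .reindex(index=df.columns)        # columns without a unique value become NaN
)
print(s)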

Col1      C
Col2    NaN
Col3      F
dtype: object
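
To make the chain above easier to follow, here is a minimal self-contained sketch that builds the sample frame from the question and keeps each intermediate step. The variable names stacked, once and result are my own, not part of the original answer.

```python
import pandas as pd

# Sample frame from the question (row labels 1..3)
df = pd.DataFrame(
    {
        "Col1": ["A", "C", "B"],
        "Col2": ["A", "A", "B"],
        "Col3": ["B", "B", "F"],
    },
    index=[1, 2, 3],
)

# One long Series with a (row, column) MultiIndex
stacked = df.stack()

# keep=False drops every occurrence of a repeated value,
# so only values that appear exactly once survive
once = stacked.drop_duplicates(keep=False)

# Drop the row level, then reindex so every column is listed
result = once.reset_index(level=0, drop=True).reindex(df.columns)
print(result)
```

Printing stacked and once separately shows which (row, column) positions the surviving values came from.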

The solution above works only if each column has at most one unique value, because reindex fails when the index contains duplicate labels.

A more general solution:

print (df)
  Col1 Col2 Col3
1    A    A    B
2    C    A    X
3    B    B    F

s = df.stack().drop_duplicates(keep=False).reset_index(level=0, drop=True)
print (s)
Col1    C
Col3    X
Col3    F
dtype: object

s = s.groupby(level=0).unique().reindex(index=df.columns)
print (s)
Col1       [C]
Col2       NaN
Col3    [X, F]
dtype: object
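
If a plain dict of lists is more convenient than a Series holding arrays and NaN, the same idea can be wrapped in a small helper. This is only a sketch: the function name values_occurring_once is hypothetical, and it uses duplicated(keep=False) with a boolean mask instead of drop_duplicates, which should select the same rows.

```python
import pandas as pd


def values_occurring_once(df: pd.DataFrame) -> dict:
    """Map each column to the values that occur exactly once in the whole frame."""
    stacked = df.stack()
    # duplicated(keep=False) flags every occurrence of a repeated value
    once = stacked[~stacked.duplicated(keep=False)]
    # Level 1 of the stacked index holds the column labels
    by_col = once.groupby(level=1).agg(list)
    # Columns with no surviving value get an empty list
    return {col: by_col.get(col, []) for col in df.columns}


df = pd.DataFrame(
    {
        "Col1": ["A", "C", "B"],
        "Col2": ["A", "A", "B"],
        "Col3": ["B", "X", "F"],
    },
    index=[1, 2, 3],
)
found = values_occurring_once(df)
print(found)
```

Returning an empty list for columns such as Col2 avoids mixing NaN with list-like values in the result.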

0
votes

This isn't exactly what you asked for, since it returns every distinct value rather than only the values that occur once, but as useful information: you can find the distinct values of a DataFrame with NumPy's np.unique() like so:

>>> np.unique(df[['Col1', 'Col2', 'Col3']])
['A' 'B' 'C' 'F']

You can also get unique values of a specific column, e.g. Col3:

>>> df.Col3.unique()
['B' 'F']
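
To get closer to what the question asks, np.unique can also return how often each value occurs via return_counts=True, which lets you keep only the values seen once. The sketch below assumes the sample frame from the question; the names once_only and per_column are mine.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Col1": ["A", "C", "B"],
        "Col2": ["A", "A", "B"],
        "Col3": ["B", "B", "F"],
    },
    index=[1, 2, 3],
)

# Distinct values over the whole frame, with their occurrence counts
values, counts = np.unique(df.to_numpy(), return_counts=True)

# Values that appear exactly once anywhere in the frame
once_only = set(values[counts == 1].tolist())

# For each column, keep the entries that are in that set
per_column = {col: [v for v in df[col] if v in once_only] for col in df.columns}
print(per_column)
```

This relies on the values being sortable (plain strings here), since np.unique sorts its input.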