5
votes

I have an excel sheet with 40 worksheets. I need to know which columns in these sheets are not present in other sheets. ex sheet number 1: column1 column2 column3 column4

sheet number 2: column1 column2 column3 column5

sheet number 3: column1 column2 column3 column 5 column6

my dataframe:

df_column_sheet_name    column
sheet number 1:           column4
sheet number 2:           column5
sheet number 3:           column5,column6

thanks a lot for the help regards

1

1 Answers

2
votes

First use sheet_name=None parameter for convert all sheets to dictionary of DataFrames:

df = pd.read_excel('file.xlsx', sheet_name=None)
#print (df)

Then get all columns which are same in all sheets by reduce:

from functools import reduce
same = reduce(lambda x, y: set(x) & set(y), df.values())
print (list(same))
['column1', 'column2', 'column3']

And last filter columns with join columns names which not match and pass to DataFrame constructor:

out = [(k, ', '.join(x for x in v.columns if x not in same)) for k, v in df.items()]
print (out)
[('sheet number 1', 'column4'), 
 ('sheet number 2', 'column5'), 
 ('sheet number 3', 'column5, column6')]


df1 = pd.DataFrame(out, columns=['df_column_sheet_name','column'])
print (df1)
  df_column_sheet_name            column
0       sheet number 1           column4
1       sheet number 2           column5
2       sheet number 3  column5, column6