I want to apply a function to a DataFrame that returns several columns for each column of the original DataFrame. The applied function returns a DataFrame with both columns and an index, but the call still raises ValueError: If using all scalar values, you must pass an index.
I've tried setting the name of the output DataFrame, making its columns a MultiIndex, and making its index a MultiIndex, but none of these works.
Example: I have this input DataFrame:
import pandas as pd

df_all_users = pd.DataFrame(
    [[1, 2, 3],
     [1, 2, 3],
     [1, 2, 3],
    ],
    index=["2020-01-01", "2020-01-02", "2020-01-03"],
    columns=["user_1", "user_2", "user_3"])
            user_1  user_2  user_3
2020-01-01       1       2       3
2020-01-02       1       2       3
2020-01-03       1       2       3
The apply_function is like this:
def apply_function(df):
    df_out = pd.DataFrame(index=df.index)
    # these columns are in reality computed using some other functions
    df_out["column_1"] = df.values  # example: pyod.ocsvm.OCSVM.fit_predict(df.values)
    df_out["column_2"] = - df.values  # example: pyod.knn.KNN.fit_predict(df.values)
    # these are the things I've tried without success
    df_out.name = df.name
    df_out.columns = pd.MultiIndex.from_tuples([(df.name, column) for column in df_out.columns],
                                               names=["user", "score"])
    df_out.index = pd.MultiIndex.from_tuples([(df.name, idx) for idx in df_out.index],
                                             names=["user", "date"])
    print(df_out)
    return df_out

df_all_users.apply(apply_function, axis=0, result_type="expand")
Which raises the error:
ValueError: If using all scalar values, you must pass an index
The output that I expect would be like this:
out_df = pd.DataFrame(
    [[1, 1, 2, 2, 3, 3],
     [1, 1, 2, 2, 3, 3],
     [1, 1, 2, 2, 3, 3],
    ],
    index=["2020-01-01", "2020-01-02", "2020-01-03"],
    columns=pd.MultiIndex.from_tuples([(user, column)
                                       for user in ["user_1", "user_2", "user_3"]
                                       for column in ["column_1", "column_2"]],
                                      names=("user", "score"))
)
             user_1            user_2            user_3
           column_1 column_2 column_1 column_2 column_1 column_2
2020-01-01        1        1        2        2        3        3
2020-01-02        1        1        2        2        3        3
2020-01-03        1        1        2        2        3        3
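
For reference, one possible way to build a frame with this layout is to skip DataFrame.apply and instead concatenate the per-user results with pd.concat, passing a dict so that its keys become the outer column level. This is only a sketch under assumptions: score_user is a hypothetical stand-in for the real scoring functions, and it keeps the negation from apply_function, so column_2 comes out negative here rather than matching the illustrative values above.

```python
import pandas as pd

df_all_users = pd.DataFrame(
    [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
    index=["2020-01-01", "2020-01-02", "2020-01-03"],
    columns=["user_1", "user_2", "user_3"],
)


def score_user(series):
    # Hypothetical stand-in for the real per-user computation:
    # takes one user's column (a Series) and returns a DataFrame
    # with one column per score, sharing the original index.
    return pd.DataFrame(
        {"column_1": series.values, "column_2": -series.values},
        index=series.index,
    )


# The dict keys (user names) become the outer level of the column MultiIndex;
# the columns of each per-user frame become the inner level.
result = pd.concat(
    {user: score_user(df_all_users[user]) for user in df_all_users.columns},
    axis=1,
)
result.columns = result.columns.set_names(["user", "score"])
print(result)
```

Concatenating explicitly means each per-user result stays an ordinary DataFrame, and the column hierarchy is assembled in a single step at the end.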