0
votes

I have a data frame I read from an Excel file with merged cells. The data frame looks like this:

Data frame content

wish to reshape the data frame from long to wide to be like this: Reshape required structure

I am using pivot as the following code:

df1 = df.pivot_table(
    values='Answers',
    index=['Date', 'ID'],
    columns='Questions'
    )

# Formatting.
df1.reset_index(inplace=True)
df1.columns.name = None

But the data frame after the pivot shows only the first column as the following image: Data frame after pivot

Any idea why?

1

1 Answers

0
votes

The problem is that your index is no correctly understood by pandas (all the NaNs).

To solve that, you need to first propagate the values in your Date column to fill the NaN, as already answered here: Pandas: Reading Excel with merged cells.

>>> df = pd.DataFrame({"Date": ["2020-01-01", np.nan, np.nan, "2020-01-02", np.nan, np.nan], "Question": ["Q1", "Q2", "Q3", "Q1", "Q2", "Q3"], "Value": [1, 2, 3, 4, 5, 6]})
>>> df
         Date Question  Values
0  2020-01-01       Q1       1
1         NaN       Q2       2
2         NaN       Q3       3
3  2020-01-02       Q1       4
4         NaN       Q2       5
5         NaN       Q3       6
>>> df["Date"].fillna(method="ffill", inplace=True)  # Fill Date column
>>> df
         Date Question  Values
0  2020-01-01       Q1       1
1  2020-01-01       Q2       2
2  2020-01-01       Q3       3
3  2020-01-02       Q1       4
4  2020-01-02       Q2       5
5  2020-01-02       Q3       6
>>> # Now you can pivot normally
>>> df1 = df.pivot_table(index="Date", values="Value", columns="Question")
>>> df1.reset_index(inplace=True)
>>> df1.columns.name = None
>>> df1
         Date  Q1  Q2  Q3
0  2020-01-01   1   2   3
1  2020-01-02   4   5   6

Hope it helps ;)