Repeated Measures ANOVA in Pandas, dependent variable values in different Columns

Question

I am quite new to data science, so this may be easy for more advanced coders. I want to run a repeated measures ANOVA on pre and post measurements of a test in two groups (Experimental Group vs. Control Group). Every subject participated in only one group.

In my pandas DataFrame I have the following columns: "Subject ID" (unique), "Condition" (Experimental or Control), "Pre-Measure Value", "Post-Measure Value" ...

     import pandas as pd

     subject_id = [1, 2, 3, 4]
     condition = [1, 2, 1, 2]
     pre = [1.1, 2.1, 3.1, 4.1]
     post = [1.2, 2.2, 3.2, 4.2]
     sample_df = pd.DataFrame({"Subject ID": subject_id, "Condition": condition, "Pre": pre, "Post": post})
     sample_df

How can I analyze this using ANOVA? The packages I've seen expect a dataframe with the dependent variable in a single column, whereas in my dataframe the measures I want to evaluate are spread across two columns. Would I need to add another column specifying whether each value is pre or post, for every subject and condition? Is there a handy function to do this?

Specifically the output would need to look like:

     subject_id_new = [1, 1, 2, 2, 3, 3, 4, 4]
     condition_new = [1, 1, 2, 2, 1, 1, 2, 2]
     measurement = ["pre", "post", "pre", "post", "pre", "post", "pre", "post"]
     value = [1.1, 1.2, 2.1, 2.2, 3.1, 3.2, 4.1, 4.2]
     new_df = pd.DataFrame({"Subject ID": subject_id_new, "Condition": condition_new, "Measurement": measurement, "Value": value})

Thanks a lot.

mno-93 · Accepted Answer · 2021-02-01T13:44:23

Actually, what I was looking for is:

     sample_df.melt(id_vars=["Subject ID", "Condition"])

This returns a long-format dataframe in which a "variable" column specifies which measurement point ("Pre" or "Post") each entry in the "value" column refers to.
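A slightly fuller sketch of the same call, assuming the column names from the question. The `var_name` and `value_name` arguments name the two new columns to match the desired output, and the stable sort restores the subject-by-subject ordering. The `long_df` name and the lowercasing step are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Wide-format data from the question: one row per subject.
sample_df = pd.DataFrame({
    "Subject ID": [1, 2, 3, 4],
    "Condition": [1, 2, 1, 2],
    "Pre": [1.1, 2.1, 3.1, 4.1],
    "Post": [1.2, 2.2, 3.2, 4.2],
})

# Reshape to long format: one row per subject and measurement point.
long_df = sample_df.melt(
    id_vars=["Subject ID", "Condition"],
    value_vars=["Pre", "Post"],
    var_name="Measurement",
    value_name="Value",
)

# melt stacks all "Pre" rows first, then all "Post" rows. A stable sort
# on the subject column keeps "Pre" before "Post" within each subject.
long_df = long_df.sort_values("Subject ID", kind="mergesort").reset_index(drop=True)

# Optional: lowercase the labels to match the "pre"/"post" values in the question.
long_df["Measurement"] = long_df["Measurement"].str.lower()

print(long_df)
```

The resulting layout, with the dependent variable in a single column plus a subject identifier, a within-subject factor, and a between-subject factor, is the long format that ANOVA packages typically expect.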
