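import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

def anova_analysis():
  datafile = "test3.csv"

  data = pd.read_csv(datafile, header=0)
  print(data)
  # Two-way model with interaction, sum (deviation) coding for both factors
  moore_lm = ols('Y ~ C(A, Sum)*C(B, Sum)',
               data=data).fit()

  table = sm.stats.anova_lm(moore_lm, typ=2)  # Type 2 ANOVA DataFrame
  print(table)
  return table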

Y   A   B   AB
28  -1  -1  1
36  1   -1  -1
18  -1  1   -1
31  1   1   1
25  -1  -1  1
32  1   -1  -1
19  -1  1   -1
30  1   1   1
27  -1  -1  1
32  1   -1  -1
23  -1  1   -1
29  1   1   1

Why does this only work with more than 4 rows of data? If I design a two-factor full factorial table with only one replicate, it looks like this:

Y   A   B   AB
28  -1  -1  1
36  1   -1  -1
18  -1  1   -1
31  1   1   1

but statsmodels fails with:

File "/home/dsb_mac/anaconda2/envs/bayes/lib/python2.7/site-packages/numpy/lib/function_base.py", line 1033, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs

1 Answer

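You get a perfect fit in the case with 4 observations. With the interaction effect the model has 4 parameters (intercept, A, B, and the A:B interaction), so it can fit the 4 cells exactly. With a perfect fit the residual sum of squares is zero and there are no residual degrees of freedom left, so the nan or inf most likely comes from a division by zero.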
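The perfect fit can be checked by hand. The following is a minimal sketch using plain NumPy least squares rather than statsmodels; it assumes the coded -1/+1 columns from the question are used directly as regressors (an equivalent parametrization to the `Sum` coding, up to sign), and the variable names are only illustrative.

```python
import numpy as np

# Single-replicate 2x2 design from the question (coded -1/+1 levels).
Y = np.array([28.0, 36.0, 18.0, 31.0])
A = np.array([-1.0, 1.0, -1.0, 1.0])
B = np.array([-1.0, -1.0, 1.0, 1.0])

# Design matrix with interaction: intercept, A, B, A*B -> 4 parameters.
X_full = np.column_stack([np.ones(4), A, B, A * B])

# Ordinary least squares fit.
beta, _, rank, _ = np.linalg.lstsq(X_full, Y, rcond=None)
resid = Y - X_full @ beta

df_resid = len(Y) - rank         # 4 observations - 4 parameters
ss_resid = float(resid @ resid)  # zero up to floating-point rounding

print("coefficients:", beta)
print("residual df:", df_resid)
print("residual SS:", ss_resid)
```

With zero residual degrees of freedom the mean squared error is 0/0, so any F statistic built on it is undefined, which is consistent with the NaN/inf error in the question.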

Type 2 ANOVA with only main effects should work, but there is most likely no code path to handle the corner case in the perfect fit model.

To get the main-effects ANOVA, you could use the formula without the interaction effect:

'Y ~ C(A, Sum) + C(B, Sum)'
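To see why dropping the interaction helps, here is a rough by-hand version of the main-effects ANOVA for the 4-row data, again with NumPy only so the numbers can be checked independently of statsmodels. It assumes the coded -1/+1 columns as regressors; because those columns are orthogonal in this balanced design, each factor's sum of squares is simply n times its squared coefficient.

```python
import numpy as np

# Single-replicate 2x2 design from the question (coded -1/+1 levels).
Y = np.array([28.0, 36.0, 18.0, 31.0])
A = np.array([-1.0, 1.0, -1.0, 1.0])
B = np.array([-1.0, -1.0, 1.0, 1.0])

# Main-effects design matrix: intercept, A, B -> 3 parameters.
X_main = np.column_stack([np.ones(4), A, B])

beta, _, rank, _ = np.linalg.lstsq(X_main, Y, rcond=None)
resid = Y - X_main @ beta

df_resid = len(Y) - rank         # one degree of freedom left for error
ss_resid = float(resid @ resid)  # the interaction's variation ends up here
ms_resid = ss_resid / df_resid

# Orthogonal columns: each factor's sum of squares (1 df) is n * coef**2.
n = len(Y)
ss_A = n * beta[1] ** 2
ss_B = n * beta[2] ** 2

F_A = ss_A / ms_resid
F_B = ss_B / ms_resid

print("residual df:", df_resid, "residual SS:", ss_resid)
print("SS_A:", ss_A, "F_A:", F_A)
print("SS_B:", ss_B, "F_B:", F_B)
```

The single residual degree of freedom is what used to be the interaction term, so the F tests here are only meaningful if the interaction is assumed to be negligible.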