2
votes

I'm struggling with chisq.test in Tableau via R. I have a model in Excel which I have to replicate, but my results differ. I think the problem is in correct massaging of the data in R code...

These are the Excel values:

excel values

p-value formula in Excel is: CHISQ.TEST(F4:G4,H4:I4).

This is how Tableau looks:

tableau values

p-value formula in Tableau is:

SCRIPT_REAL("
obs_yes = .arg1;
obs_no = .arg2;
exp_yes = .arg3;
exp_no = .arg4;

df1 <- data.frame(obs_yes,obs_no);
df2 <- data.frame(exp_yes,exp_no);

print('df1 is ');
print(df1);
print('df2 is ');
print(df2);

if ( !is.na(obs_yes) && !is.na(obs_no) && !is.na(exp_yes) && !is.na(exp_no)) {
    #ChiSq = chisq.test(c(df1[8,1],df1[8,2]), p=c(df2[8,1],df2[8,2]), rescale.p=TRUE);
    ChiSq = chisq.test(df1, p=df2, rescale.p=TRUE);
    ChiSq$p.value;
} else {
    {NaN};
}
"
,[obs_yes],[obs_no],[exp_yes],[exp_no])

What I see in R debug for df1 and df2 looks OK to me:

R debug

but the p-value is not correct - looks like it is calculated over all values?

If I pass in R script only one row, then the p-value is correct:

ChiSq = chisq.test(c(df1[1,1],df1[1,2]), p=c(df2[2,1],df2[2,2]), rescale.p=TRUE);

one value

but this doesn't help, as it is a static calculation.

Any ideas?

Here is a link to the .twbx file.

Thank you,

Adrian

P.S. A working code found on a forum (for a different setup) is below, but I don't know how to adapt it for my situation.

SCRIPT_REAL(
'mm <- data.frame(commodity = .arg1);
d <- split(mm,rep(1:.arg3[1],each=.arg2[1]/.arg3[1]));
zt <-do.call?do.c(cbind, d);chisq.test(zt)$p.value'
,SUM([Number of Records]), SIZE(),[Priorities])
1
one reason your chi square values might not match is because R adds a correct to the p-value, but more related to your question why don't you use Tableau to calculate the p-value : community.tableau.com/thread/117619Mike
Mike, my value matches if I calculate it only for one row inside the R script. The problem is how to make the R script to return the p-value for each row.adim

1 Answers

1
votes

I solved the problem with inspiration from this thread.

The issue was that the R calculation should "spit" a vector of values instead of a single value.

Here is the solution, just in case anyone needed it:

SCRIPT_REAL("
obs_yes = .arg1;
obs_no = .arg2;
exp_yes = .arg3;
exp_no = .arg4;

if ( !is.na(obs_yes) && !is.na(obs_no) && !is.na(exp_yes) && !is.na(exp_no)) {
    out <- rep(NA,length(obs_yes));
    for(i in 1:length(obs_yes)) {
        out[i] <- chisq.test(c(obs_yes[i],obs_no[i]), p=c(exp_yes[i],exp_no[i]), rescale.p=TRUE)$p.value;
    }
    out
} else {
    {NaN};
}
"
,[obs_yes],[obs_no],[exp_yes],[exp_no])