Kruskal Wallis Test and subsetting

Question

Could you please help me perform a Kruskal-Wallis test on a subset of my data? I would like to test for differences in "N" between "Producers".

names(Isotope.Data)
[1] "Species"         "Name"            "Group"           "Simple_Group"       "Trophic_Group"  
[6] "Sample"          "N"               "C"

In my CSV file I have a column "Trophic_Group" which separates Consumers and Producers.

table(Isotope.Data$Trophic_Group)

Consumer Producers  
    61         18

Under the column heading Simple_Group, I have three Producers: Rhodophyta, Seagrass and Phaeophyceae.

table(Isotope.Data$Simple_Group)

 Abalone  Loliginidae      Octopus Phaeophyceae   Rhodophyta     Seagrass      Teleost 
      24            2           12            6            9            3           20 
Tunicate 
       3

I have tried numerous things, but I get various error messages. Would anyone be able to improve on the following code?

kruskal.test(C ~ Simple_Group, data = Isotope.Data, subset = Isotope.Data$Trophic_Group = "Producers")

P.S. I have created a separate CSV file which only includes Primary Producers. However, a subsequent Dunn test of multiple comparisons, used to determine which levels differ from each other, gives different significance levels from the one run on the file that includes both Consumers and Producers.

I have several questions: What is C when you call kruskal.test? What is the error message you get when running the code? — R18
C refers to Carbon, and N refers to Nitrogen. I will run separate tests to test for differences in C and N between consumers and producers — Greeny
The error is: Error: unexpected '=' in "kruskal.test(C ~ Simple_Group, data = Isotope.Data, subset = Isotope.Data$Trophic_Group =" — Greeny
Thanks Roman, I have tried that also. I get the following error.... Error in kruskal.test.default(numeric(0), integer(0)) : all observations are in the same group — Greeny

maycca · Accepted Answer · 2018-05-04T21:01:37

Maybe this answer will be helpful? It is based on @user295691's answer to:

Kruskal-Wallis test: create lapply function to subset data.frame?

Here you identify the individual groups you want to test within, and use split to divide your data frame into the corresponding subsets.

Dummy example:

# create data
val<-runif(60, min = 0, max = 100)
distance<-floor(runif(60, min=1, max=3))
phase<-rep(c("a", "b", "c"), 20)

df<-data.frame(val, distance, phase)

# get unique groups
ii<-unique(df$phase)

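# run Kruskal test on one subset; with data = df, the column names
# (val, distance, phase) are looked up in df rather than in the workspace
kruskal.test(val ~ distance, data = df,
             subset = phase == "c")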
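
Applied to the question's data, the same subset pattern might look like the sketch below. The original call failed to parse because = is assignment; the comparison operator is ==. The data frame here is simulated (random values, with group sizes borrowed from the tables in the question) purely so the block runs on its own; it is only assumed that the real Isotope.Data has the same column names and level spellings.

```r
# Simulated stand-in for Isotope.Data: the N values are random, not real data.
set.seed(1)
sizes <- c(Abalone = 24, Loliginidae = 2, Octopus = 12, Phaeophyceae = 6,
           Rhodophyta = 9, Seagrass = 3, Teleost = 20, Tunicate = 3)
producers <- c("Phaeophyceae", "Rhodophyta", "Seagrass")

Isotope.Data <- data.frame(
  Simple_Group = factor(rep(names(sizes), times = sizes)),
  N = rnorm(sum(sizes), mean = 8, sd = 2)
)
Isotope.Data$Trophic_Group <- factor(
  ifelse(Isotope.Data$Simple_Group %in% producers, "Producers", "Consumer")
)

# Check the exact spelling of the levels before subsetting on them.
levels(Isotope.Data$Trophic_Group)

# Note '==' (comparison), not '=' (assignment). Columns named in 'subset'
# are looked up in 'data', so no Isotope.Data$ prefix is needed.
res <- kruskal.test(N ~ Simple_Group, data = Isotope.Data,
                    subset = Trophic_Group == "Producers")
res
```

The second error quoted in the comments shows numeric(0) being passed to kruskal.test.default, which suggests that the subset in that attempt selected no rows at all, for example because the value compared against did not match a level exactly.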

And now apply the kruskal.test to each group using split:

lapply(split(df, df$phase), function(d) { kruskal.test(val ~ distance, data=d) })

or create a function:

lapply(ii, function(i) { kruskal.test(df$val ~ df$distance, subset=df$phase==i )})

Both produce test results for each group (the output below is from the second form):

[[1]]

    Kruskal-Wallis rank sum test

data:  df$val by df$distance
Kruskal-Wallis chi-squared = 0.14881, df = 1, p-value = 0.6997


[[2]]

    Kruskal-Wallis rank sum test

data:  df$val by df$distance
Kruskal-Wallis chi-squared = 0.11688, df = 1, p-value = 0.7324


[[3]]

    Kruskal-Wallis rank sum test

data:  df$val by df$distance
Kruskal-Wallis chi-squared = 0.0059524, df = 1, p-value = 0.9385

Or just get the p-values (notice the addition of $p.value after the kruskal.test):

lapply(ii, function(i) {
  kruskal.test(df$val ~ df$distance,
               subset = df$phase == i)$p.value
})
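
On the P.S. about Dunn's test giving different significance levels on the Producers-only file: some difference is to be expected, for two reasons that can be checked in base R. Dunn's test compares mean ranks, and the ranks are computed over whatever observations are passed in, so ranking the 18 producer values on their own is not the same as ranking them among all 79. Multiple-comparison adjustments also depend on the number of pairs: 8 groups give 28 pairwise comparisons, 3 groups give 3. The sketch below uses made-up numbers, and Bonferroni is only an assumed adjustment method, since the question does not say which one was used.

```r
# Made-up values: three "producer" observations and five "consumer" ones.
producer <- c(4.1, 6.3, 9.8)
consumer <- c(5.0, 7.2, 8.8, 10.4, 11.9)

# Ranks of the producer values on their own versus within the pooled data.
ranks_alone  <- rank(producer)
ranks_pooled <- rank(c(producer, consumer))[seq_along(producer)]
ranks_alone
ranks_pooled

# The same raw p-value, adjusted for 3 pairs versus 28 pairs.
raw_p   <- 0.01
pairs_3 <- choose(3, 2)   # pairwise comparisons among 3 groups
pairs_8 <- choose(8, 2)   # pairwise comparisons among 8 groups
adj_3 <- p.adjust(raw_p, method = "bonferroni", n = pairs_3)
adj_8 <- p.adjust(raw_p, method = "bonferroni", n = pairs_8)
c(three_groups = adj_3, eight_groups = adj_8)
```

Which set of groups to pass to the post-hoc test depends on which comparisons are actually of interest.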
