I have a dataset with the following structure:

Features Method Distance V1 V2 ........  V100
  V1V2     LOF     A      4  5  .........  6
   .
   .
   .
V1V2V3V4V5 Gaussian C     7  8   .........  7

The dataset consists of 624 rows and 103 columns. The first three columns describe each row, and the remaining columns, V1 to V100, hold the data.

I need to create a 26x8 grid of bar plots, each showing the mean value and the standard error of the mean. I have added a function to calculate the standard error of the mean.

# function for the standard error of the mean
sem <- function(x) {
  sd(x) / sqrt(length(x))
}

Each bar plot should show, for each Distance (A, B, C), the mean of V1 to V100 and the standard error of that mean.

An example of the dataset is shown below:

df <- read.table(text=" Features      Method Distance        V1        V2        V3        V4        V5        V6        V7
   V1V2         LOF        A 11.764706  3.703704 15.384615  9.090909  9.090909  8.000000  7.407407
V1V2 Mahalanobis        A 11.764706 33.333333 15.384615  9.090909  9.090909 28.571429 33.333333
  V1V2        Cook        A 40.540541  6.666667 24.390244 24.358974 32.608696 15.584416 17.647059
  V1V2      DIFFTS        A 24.590164  4.958678 28.169014 26.950355 30.588235 47.058824 10.909091
  V1V2       OCSVM        A 36.585366 25.000000 57.142857 35.514019 88.372093  8.988764  5.825243
  V1V2      DBSCAN        A 44.117647 21.428571 30.769231 51.351351 41.269841 14.814815  6.976744
  V1V2         PCA        A 11.764706 33.333333 15.384615  9.090909  9.090909 28.571429 33.333333
  V1V2    Gaussian        A  1.886792  3.278689  1.869159  1.398601  2.597403  2.197802  4.878049
  V1V3         LOF        A 12.698413 20.000000 55.000000  6.666667 33.333333 29.787234  2.777778
 V1V3 Mahalanobis        A 11.764706 33.333333 15.384615  9.090909  9.090909 28.571429 33.333333", header = TRUE)

The plot should look like the example below, but showing the mean and the standard error of the mean.

[image: example plot]

Raúl

Please edit your post by removing links (they break, making the references inaccessible for future searches) and make it reproducible in itself. - Heroka
Edited, @Heroka. I'll improve my posts. Could you help me with this one? You solved my last post, which was very similar. - Raúl Parada Medina
Should the mean and SE be added to the plot, or should the plot only contain the mean and SE? - Heroka
The plot should show a bar for the mean with the SE overlaid. Only the mean and SE. - Raúl Parada Medina

1 Answer

Here you go. I chose to aggregate data before plotting, as I prefer to be in control of things like that. You could use the built-in stat_summary in ggplot2 as well.

library(ggplot2)
library(dplyr)
library(reshape2)

#first, reshape (just like in your previous Q)

df_m <- melt(df,id.vars=c("Features","Method","Distance"))

#now aggregate
sem <- function(x){
  sd(x)/sqrt(length(x))
}

df_a <- df_m %>% group_by(Features,Method,Distance) %>% summarise(
  mean_value=mean(value),
  sem_value=sem(value)
)
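
As a sanity check on the aggregation, the same numbers can be computed row-wise in base R, assuming every Features/Method/Distance combination occupies exactly one row (true of the sample in the question, but an assumption for the full dataset). The `toy` frame below is hypothetical data for illustration only:

```r
# Hypothetical toy data: one row per Features/Method/Distance combination
toy <- data.frame(
  Features = c("V1V2", "V1V2", "V1V3"),
  Method   = c("LOF", "Cook", "LOF"),
  Distance = c("A", "A", "A"),
  V1 = c(1, 2, 3),
  V2 = c(3, 4, 9),
  V3 = c(5, 9, 6)
)

# Standard error of the mean, as defined in the question
sem <- function(x) sd(x) / sqrt(length(x))

# Row-wise mean and SEM over the value columns (everything after column 3)
values <- as.matrix(toy[, -(1:3)])
toy_summary <- data.frame(
  toy[, 1:3],
  mean_value = rowMeans(values),
  sem_value  = apply(values, 1, sem)
)
toy_summary
```

If the grouped `summarise` result and this row-wise result disagree on the real data, some combination probably spans more than one row.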

#now plotting is easy
#using bars
p1 <- ggplot(df_a, aes(x=Distance))+
  facet_grid(Features~Method)+
  geom_bar(aes(y=mean_value),stat="identity")+
  geom_errorbar(aes(ymin=mean_value-sem_value,ymax=mean_value+sem_value))
p1

[image: plot p1, bars with error bars]

#using points (my preference)

p2 <- ggplot(df_a, aes(x=Distance))+
  facet_grid(Features~Method)+
  geom_point(aes(y=mean_value),size=2)+
  geom_errorbar(aes(ymin=mean_value-sem_value,ymax=mean_value+sem_value))
p2

[image: plot p2, points with error bars]
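
For the `stat_summary` route mentioned at the top, a minimal sketch might look like the following. It assumes ggplot2 3.3.0 or later (where the argument is `fun`; older versions used `fun.y`) and uses ggplot2's `mean_se` helper, which returns the mean plus and minus one standard error. The `toy` data frame is hypothetical and only mimics the wide layout of the question:

```r
library(ggplot2)
library(reshape2)

# Hypothetical toy data in the same wide layout as the question
toy <- data.frame(
  Features = c("V1V2", "V1V2", "V1V3", "V1V3"),
  Method   = c("LOF", "Cook", "LOF", "Cook"),
  Distance = "A",
  V1 = c(11.8, 40.5, 12.7, 24.6),
  V2 = c(3.7, 6.7, 20.0, 5.0),
  V3 = c(15.4, 24.4, 55.0, 28.2)
)

# Long format: one row per Features/Method/Distance/variable
toy_m <- melt(toy, id.vars = c("Features", "Method", "Distance"))

# Let ggplot2 compute the summaries instead of aggregating beforehand
p3 <- ggplot(toy_m, aes(x = Distance, y = value)) +
  facet_grid(Features ~ Method) +
  stat_summary(fun = mean, geom = "bar") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.3)
p3
```

This skips the separate aggregation step, at the cost of not having the summarised values available in a data frame for inspection.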