Monte Carlo Simulation using R: Problem with sorting and significance

Question

I am trying to implement the following statistical test using Monte-Carlo simulation. This method is based on the following paper: https://journals.ametsoc.org/doi/full/10.1175/JCLI4217.1

Details:

The above paper calculates the significance of the difference in means between two periods :1961-1983 and 1984-2000 of tropical cyclone passage frequency (not-normally distributed) using Monte Carlo simulation.

This should be a two-tailed test.

The following steps were provided:

1). First, 9999 randomly sorted 40-yr time series of the typhoon passage frequency are prepared.

2). Averages of the former 23-yr values (1961-1983) minus those of the latter 17-yr values are calculated.

3). From the rank of the original difference value among 10000 samples, the significance level is estimated.

Here's what I have so far

Suppose I have the following data set. The columns indicate the counts per year, while the rows are for lat-lon coordinates (numbers here for simplicity).

A<-matrix(floor(runif(100,min=0,max=20)),nrow=5,ncol=40)
colnames(A)<-c("X1961","X1962","X1963","X1964","X1965","X1966","X1967","X1968","X1969","X1970","X1971","X1972","X1973","X1974","X1975","X1976","X1977","X1978","X1979","X1980","X1981","X1982","X1983","X1984","X1985","X1986","X1987","X1988","X1989","X1990","X1991","X1992","X1993","X1994","X1995","X1996","X1997","X1998","X1999","X2000")

set.seed(1)
rand <- sample(nrow(A),9999,replace=TRUE)
A[rand,]

Problem (Updated)

I am confused about how to do this correctly in R. I should be performing the monte-carlo test per row. So doing this in one row:

A[rand[1],]

X1961 X1962 X1963 X1964 X1965 X1966 X1967 X1968 X1969 X1970 X1971 X1972 
X1973 
5    14    11    17    16    17    11     2     8     3    13    10     
1 
X1974 X1975 X1976 X1977 X1978 X1979 X1980 X1981 X1982 X1983 X1984 X1985 
X1986 
10    15     5     3     6    15    19     5    14    11    17    16    
17 
X1987 X1988 X1989 X1990 X1991 X1992 X1993 X1994 X1995 X1996 X1997 X1998 
X1999 
11     2     8     3    13    10     1    10    15     5     3     6    
15 
X2000 
19

original:

A[1,]

X1961 X1962 X1963 X1964 X1965 X1966 X1967 X1968 X1969 X1970 X1971 X1972 
X1973 
18     1     6     7     3    12    19     0    17    17     0    10    
16 
X1974 X1975 X1976 X1977 X1978 X1979 X1980 X1981 X1982 X1983 X1984 X1985 
X1986 
3     4     0    15     8    17     1    18     1     6     7     3    
12 
X1987 X1988 X1989 X1990 X1991 X1992 X1993 X1994 X1995 X1996 X1997 X1998 
X1999 
19     0    17    17     0    10    16     3     4     0    15     8    
17 
X2000 
1

Expected Output*

I want to add a pvalue column in the original matrix for this test. The significance test should be done per row. Of course, this can be achieved by using an apply() function.

Problems

How can I implement the third condition? Also, does the order matter for step 1 in a monte-carlo test?

I feel that I am misinterpreting the step 1, should I use replicate() for this? Something like this?

rand<-replicate(40,sample(nrow(A),9999,replace=T))

Any suggestions on how to do this correctly?

I'll appreciate any help regarding this.

"How can I implement the third condition?" What is the algorithm? — Roland
@Roland..I am also not sure regarding the 3rd condition. It just mentioned, "from the rank of the original difference value". I apologize for the lack of understanding regarding this test. It is also my first time doing a monte-carlo test. — Lyndz
I've already tried to guide you: Ask a specific question on Cross Validated to understand this sentence. Then come back here if you need help with R. — Roland

Santiago I. Hurtado Santiago I. Hurtado · Accepted Answer · 2019-05-24T18:10:04

This code should solve your problem. If you have to do it for lot of data it is easily parallelized with the package 'foreach' and 'doParallel'. This function take your data and make nrep samples for both tiles of the data, and then take the difference of the means. With that calculate the FDP of the difference of the means, and then look the percentile of the data difference of mean to get the p value.

  my.fun <- function(x,nrep = 1000,breakpoint){
    # x is the data
    # nrep is the amount of simulations
    # breakpoint is where the breakpoint is
    set.seed(12345)
    a_sim <- vector(mode = 'double', length = nrep)
    n <- length(x)
    for(i in 1:nrep){
      aux1 <- sample(x,size = breakpoint,replace = T)
      aux2 <- sample(x,size = n-breakpoint,replace = T)
      a_sim[i] <- abs(mean(aux1) - mean(aux2))
    }
    cum_dist_func <- ecdf(a_sim)
    p <- 1-cum_dist_func(abs(mean(x[1:breakpoint])-mean(x[(breakpoint+1):n])))

    return(p)
  }


  pvalue <- apply(X = A,MARGIN = 1,FUN = my.fun,breakpoint = 23 )
  A <- cbind(A,pvalue)

Monte Carlo Simulation using R: Problem with sorting and significance

1 Answers