2
votes

I'm trying to find sites to collect snails by using a semi-random selection method. I have set a 10km2 grid around the region I want to collect snails from, which is broken into 10,000 10m2 cells. I want to randomly this grid in R to select 200 field sites.

Randomly sampling a matrix in R is easy enough;

dat <- matrix(1:10000, nrow = 100)

sample(dat, size = 200)

However, I want to bias the sampling to pick cells closer to a single position (representing sites closer to the research station). It's easier to explain this with an image;

enter image description here

The yellow cell with a cross represents the position I want to sample around. The grey shading is the probability of picking a cell in the sample function, with darker cells being more likely to be sampled.

I know I can specify sampling probabilities using the prob argument in sample, but I don't know how to create a 2D probability matrix. Any help would be appreciated, I don't want to do this by hand.

2

2 Answers

2
votes

I'm going to do this for a 9 x 6 grid (54 cells), just so it's easier to see what's going on, and sample only 5 of these 54 cells. You can modify this to a 100 x 100 grid where you sample 200 from 10,000 cells.

# Number of rows and columns of the grid (modify these as required)
nx <- 9 # rows
ny <- 6 # columns

# Create coordinate matrix
x <- rep(1:nx, each=ny);x
y <- rep(1:ny, nx);y 
xy <- cbind(x, y); xy

# Where is the station? (edit: not snails nest)
Station <- rbind(c(x=3, y=2)) # Change as required

# Determine distance from each grid location to the station
library(SpatialTools)
D <- dist2(xy, Station)

From the help page of dist2

dist2 takes the matrices of coordinates coords1 and coords2 and returns the inter-Euclidean distances between coordinates.

We can visualize this using the image function.

XY <- (matrix(D, nr=nx, byrow=TRUE))
image(XY) # axes are scaled to 0-1

# Create a scaling function - scales x to lie in [0-1)
scale_prop <- function(x, m=0)
  (x - min(x)) / (m + max(x) - min(x))

# Add the coordinates to the grid
text(x=scale_prop(xy[,1]), y=scale_prop(xy[,2]), labels=paste(xy[,1],xy[,2],sep=","))

enter image description here

Lighter tones indicate grids closer to the station at (3,2).

# Sampling probabilities will be proportional to the distance from the station, which are scaled to lie between [0 - 1). We don't want a 1 for the maximum distance (m=1).
prob <- 1 - scale_prop(D, m=1); range (prob)

# Sample from the grid using given probabilities
sam <- sample(1:nrow(xy), size = 5, prob=prob) # Change size as required.
xy[sam,] # Thse are your (**MY!**) 5 samples
     x y
[1,] 4 4
[2,] 7 1
[3,] 3 2
[4,] 5 1
[5,] 5 3

To confirm the sample probabilities are correct, you can simulate many samples and see which coordinates were sampled the most.

snail.sam <- function(nsamples) {
  sam <- sample(1:nrow(xy), size = nsamples, prob=prob)
  apply(xy[sam,], 1, function(x) paste(x[1], x[2], sep=","))
}

SAMPLES <- replicate(10000, snail.sam(5))

tab <- table(SAMPLES)
cols <- colorRampPalette(c("lightblue", "darkblue"))(max(tab))
barplot(table(SAMPLES), horiz=TRUE, las=1, cex.names=0.5,
        col=cols[tab])

enter image description here


If using a 100 x 100 grid and the station is located at coordinates (60,70), then the image would look like this, with the sampled grids shown as black dots:

enter image description here

There is a tendency for the points to be located close to the station, although the sampling variability may make this difficult to see. If you want to give even more weight to grids near the station, then you can rescale the probabilities, which I think is ok to do, to save costs on travelling, but these weights need to be incorporated into the analysis when estimating the number of snails in the whole region. Here I've cubed the probabilities just so you can see what happens.

sam <- sample(1:nrow(xy), size = 200, prob=prob^3)

enter image description here

The tendency for the points to be located near the station is now more obvious.

0
votes

There may be a better way than this but a quick way to do it is to randomly sample on both x and y axis using a distribution (I used the normal - bell shaped distribution, but you can really use any). The trick is to make the mean of the distribution the position of the research station. You can change the bias towards the research station by changing the standard deviation of the distribution. Then use the randomly selected positions as your x and y coordinates to select the positions.

dat <- matrix(1:10000, nrow = 100)
#randomly selected a position for the research station
rs <- c(80,30) 
# you can change the sd to change the bias
x <- round(rnorm(400,mean = rs[1], sd = 10)) 
y <- round(rnorm(400, mean = rs[2], sd = 10))  
position <- rep(NA, 200)
j = 1
i = 1

# as some of the numbers sampled can be outside of the area you want I oversampled # and then only selected the first 200 that were in the area of interest. 
while (j <= 200) {
  if(x[i] > 0 & x[i] < 100 &  y[i] > 0 & y [i]< 100){
    position[j] <- dat[x[i],y[i]]
    j = j +1
  }
  i = i +1
}

plot the results:

plot(x,y, pch = 19)
points(x =80,y = 30, col = "red", pch = 19) # position of the station

enter image description here