4
votes

I have two data sets with latitude, longitude, and temperature data. One data set holds the lat/long pairs that form the boundary and contents of a geographic region of interest (matrix dimensions = 4518x2).

The other data set contains lat/long and temperature data for a larger region that envelops the region of interest (matrix dimensions = 10875x3).

My question is: How do you extract the appropriate row data (lat, long, temperature) from the 2nd data set that matches the first data set's lat/long data?

I've tried a variety of "for loops," "subset," and "unique" commands but I can't obtain the matching temperature data.

Thanks in advance!


10/31 Edit: I forgot to mention that I'm using "R" to process this data.

The lat/long data for the region of interest was provided as a list of 4,518 files containing the lat/long coordinates in the name of each file:

x<- dir()

lenx<- length(x)

g <- strsplit(x, "_")

coord1 <- matrix(NA,nrow=lenx, ncol=1)  
coord2 <- matrix(NA,nrow=lenx, ncol=1)

for(i in 1:lenx) {  
coord1[i,1] <- unlist(g)[2+3*(i-1)]  
coord2[i,1] <- unlist(g)[3+3*(i-1)]     
} 

coord1<-as.numeric(coord1)  
coord2<-as.numeric(coord2)

coord<- cbind(coord1, coord2)
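
The same parsing can be sketched without the explicit loop. This is only an illustration: the file names below are hypothetical and assume the pattern "prefix_lat_long" with no file extension, which is what the indexing above implies.

```r
# Hypothetical file names of the form "<prefix>_<lat>_<long>"
x <- c("data_35.0625_-120.4375", "data_35.1875_-120.4375", "data_35.0625_-120.3125")

# Split every name on "_" (fixed = TRUE treats "_" as a literal, not a regex)
g <- strsplit(x, "_", fixed = TRUE)

# Take the 2nd and 3rd piece of every file name and convert to numbers
coord1 <- as.numeric(vapply(g, `[`, character(1), 2))
coord2 <- as.numeric(vapply(g, `[`, character(1), 3))

# One row per file: column 1 = first coordinate, column 2 = second coordinate
coord <- cbind(coord1, coord2)
```

With the real data, `x <- dir()` would replace the hypothetical vector.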

The lat/long and temperature data was obtained from an NCDF file with temperature data for 10,875 lat/long pairs:

long<- tempcd$var[["Temp"]]$size[1]   
lat<- tempcd$var[["Temp"]]$size[2]   
time<- tempcd$var[["Temp"]]$size[3]  
proj<- tempcd$var[["Temp"]]$size[4]  

temp<- matrix(NA, nrow=lat*long, ncol = time)  
lat_c<- matrix(NA, nrow=lat*long, ncol=1)  
long_c<- matrix(NA, nrow=lat*long, ncol =1)  

counter<- 1  

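for(i in 1:lat){  
    for(j in 1:long){  
        # Read the full time series for grid cell (j, i) from the temperature file
        temp[counter,]<-get.var.ncdf(tempcd, varid= "Temp", count = c(1,1,time,1), start=c(j,i,1,1))  
        # Note: lat_c[counter] and long_c[counter] are never filled in this loop
        counter<- counter+1  
    }  
}  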

temp_gcm <- cbind(lat_c, long_c, temp)

So now the question is: how do you extract the rows of "temp_gcm" whose lat/long pairs match those in "coord"?

2
A very interesting question. Does the set of lat/long for the area of interest simply bound the region, or is it the set of all lat/long pairs for which there is temperature data in that region? – Nathaniel Ford
What language are we using here? And can we get a brief code sample to see what your data structures look like? – slashingweapon
@Nathaniel Ford: The set of lat/long data corresponds to both the boundary of the region and the centroid for each grid within the region of interest. – Noe Santos
@slashingweapon Oh right, I'm using the "R" language to process this data. I will provide an example of the data structures shortly! – Noe Santos
Can you separate the boundary points from the grid centroid points? You could use the boundary points to create a polygon and use one of the "point in polygon" functions (e.g. package sp) to select the points that lie within the region. – dcarlson

2 Answers

2
votes

Noe,

I can think of a number of ways you could do this. The simplest, albeit not the most efficient, would be to iterate over the rows of the data set you want to find matches for and use R's which() function, which takes a logical vector and returns the indices where it is TRUE. Of course, this assumes that there can be at most a single match in the larger data set. Based on your data sets, I would do something like this:

# temp_gcm and coord are matrices, so convert them to data frames with named columns
temp_gcm <- as.data.frame(temp_gcm)
names(temp_gcm)[1:3] <- c("lat_c", "long_c", "temp")    # "temp" is the first time step
coord <- as.data.frame(coord)
names(coord) <- c("coord1", "coord2")

matched.temp <- rep(NA_real_, nrow(coord)) # To store matching results
for (i in seq_len(nrow(coord))) {
   hit <- which(temp_gcm$lat_c == coord$coord1[i] & temp_gcm$long_c == coord$coord2[i])
   if (length(hit) == 1) matched.temp[i] <- temp_gcm$temp[hit]    # stays NA without a unique match
}

# Now add the results column to the coord data frame (indexes match)
coord$temperature <- matched.temp

The call which(temp_gcm$lat_c == coord$coord1[i] & temp_gcm$long_c == coord$coord2[i]) returns a vector of all rows in the data frame temp_gcm where lat_c and long_c match coord1 and coord2, respectively, from row i of the iteration (NOTE: I'm assuming this vector will only have length 1, i.e. there is only one possible match). matched.temp[i] is then assigned the value from the column temp of temp_gcm in the row that satisfied the logical condition. The goal is to create a vector of matched values that corresponds by index to the rows of the data frame coord. The loop iterates with seq_len(nrow(coord)) because seq(coord) would run over every element of coord rather than over its rows.

I hope this helps. Note that this is a rudimentary approach, and I would advise looking up the function merge() as well as apply() to do this in a more succinct manner.
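
As a rough sketch of the merge() route mentioned above: the data frames below are toy stand-ins with made-up values, and the column names lat, long, and temp are assumptions chosen for the example, not names from the original data.

```r
# Toy stand-ins for the real data (values are made up for illustration)
coord <- data.frame(lat = c(35.5, 36.0), long = c(-120.0, -119.5))
temp_gcm <- data.frame(
  lat  = c(35.0, 35.5, 36.0, 36.5),
  long = c(-120.0, -120.0, -119.5, -119.5),
  temp = c(14.2, 15.1, 13.8, 12.9)
)

# Inner join on both coordinate columns: keeps only the rows of temp_gcm
# whose (lat, long) pair also appears in coord
matched <- merge(coord, temp_gcm, by = c("lat", "long"))
```

Passing all.x = TRUE to merge() would instead keep every row of coord and fill the temperature with NA where no match exists, which makes missing matches easy to spot.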

0
votes

I added an additional column full of zeros to use as an indicator that an if statement sets to 1 for every matching row. "x" is the number of rows in temp_gcm, "y" is the number of columns (representative of time steps), and "temp_s" is the standardized temperature data.

indicator<- matrix(0, nrow = x, ncol = 1)

precip_s<- cbind(precip_s, indicator)

temp_s<- cbind(temp_s, indicator)

for(aa in 1:x){

    current_lat<-latitudes[aa,1] #Latitudes corresponding to larger area

    current_long<- longitudes[aa,1] #Longitudes corresponding to larger area

    for(ab in 1:lenx){ #lenx corresponds to nrow(coord)

        if(current_lat == coord[ab,1] && current_long == coord[ab,2]) {
            precip_s[aa,(y/12+1)]<-1 #y/12+1 corresponds to "indicator column"
            temp_s[aa,(y/12+1)]<-1
        } 
    }
}


precip_s<- precip_s[precip_s[,(y/12+1)]>0,] #Removes rows with "0"s remaining in "indicator" column

temp_s<- temp_s[temp_s[,(y/12+1)]>0,]


precip_s<- precip_s[,-(y/12+1)] #Removes "indicator" column

temp_s<- temp_s[,-(y/12+1)]
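
The nested loops above could probably be replaced by a vectorized lookup. The following is a minimal sketch on toy data with made-up values; make_key is a hypothetical helper introduced here, and rounding to 4 decimals is an assumed grid precision meant to guard against tiny floating-point differences between the two sources.

```r
# Toy stand-ins (made-up values): larger-region grid and standardized temperatures
latitudes  <- matrix(c(35.0, 35.5, 36.0, 36.5), ncol = 1)
longitudes <- matrix(c(-120.0, -120.0, -119.5, -119.5), ncol = 1)
temp_s     <- matrix(1:8, nrow = 4)    # 4 grid cells x 2 time steps
coord      <- cbind(c(35.5, 36.0), c(-120.0, -119.5))

# Hypothetical helper: build one text key per lat/long pair
make_key <- function(lat, long) paste(round(lat, 4), round(long, 4), sep = "_")

# TRUE for every grid cell whose lat/long pair also appears in coord
keep <- make_key(latitudes[, 1], longitudes[, 1]) %in% make_key(coord[, 1], coord[, 2])

# Keep only the rows inside the region of interest
temp_s <- temp_s[keep, , drop = FALSE]
```

The same keep vector can be applied to precip_s, so no indicator column has to be added and removed again.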