I build a matrix with the following code:
c <- NULL
for (a in 1:4) {
  b <- seq(from = a, to = a + 5)
  c <- rbind(c, b)
}
c <- rbind(c, c); rm(a, b)
This results in the following matrix:
> c
  [,1] [,2] [,3] [,4] [,5] [,6]
b    1    2    3    4    5    6
b    2    3    4    5    6    7
b    3    4    5    6    7    8
b    4    5    6    7    8    9
b    1    2    3    4    5    6
b    2    3    4    5    6    7
b    3    4    5    6    7    8
b    4    5    6    7    8    9
How can I return row indices for rows matching a specific input?
For example, with the search vector:
z <- c(3,4,5,6,7,8)
I need the following returned:
[1] 3 7
I will use this on a fairly large data frame of test data that has a time step column, to reduce the data by accumulating the time steps of matching rows.
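For reference, here is a minimal base R sketch of one way to get those indices. It is only an illustration, not necessarily the approach used in the answers: the matrix is rebuilt under the hypothetical name `m` (to avoid masking `c()`), and the comparison relies on exact equality, which is fine for the integer-valued example but may need a tolerance for floating-point data.

```r
# Rebuild the example matrix from the question (named m here, an assumption,
# so that the base function c() is not masked).
m <- do.call(rbind, lapply(1:4, function(a) seq(from = a, to = a + 5)))
m <- rbind(m, m)
z <- c(3, 4, 5, 6, 7, 8)

# t(m) puts each original row into a column, so z is recycled down each row.
# A row matches when all ncol(m) of its elements equal z.
idx <- which(colSums(t(m) == z) == ncol(m))
idx
# [1] 3 7
```

Working on the transposed matrix keeps the comparison vectorised, which tends to be faster than calling `apply()` row by row.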
The question was answered well by others. Because of the size of my dataset (9.5M rows), I came up with an efficient approach that takes several steps.
1) Sort the big data frame 'dc', which holds the time steps to accumulate in column 1, by its remaining columns (2 to 8).
dc <- dc[order(dc[,2],dc[,3],dc[,4],dc[,5],dc[,6],dc[,7],dc[,8]),]
2) Create a new data frame with unique entries (excluding column 1).
dcU <- unique(dc[,2:8])
3) Write an Rcpp (C++) function that loops over the unique data frame. For each unique row it walks through the sorted original data, accumulating time while the rows are equal, and moves on to the next unique row as soon as an unequal row is found.
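library(Rcpp)
getTsrc <-
'
NumericVector getT(NumericMatrix dc, NumericMatrix dcU)
{
  // dc:  sorted data, time step in column 0, state in columns 1 to 7 (0-based)
  // dcU: unique state rows, in the same sort order as dc
  int k = 0;
  int m = dc.nrow();
  int n = dcU.nrow();
  NumericVector tU(n);   // accumulated time per unique row, starts at zero
  for (int i = 0; i < n; i++)
  {
    // k < m stops the scan at the last row of dc instead of reading past it
    while (k < m &&
           (dcU(i,0)==dc(k,1))&&(dcU(i,1)==dc(k,2))&&(dcU(i,2)==dc(k,3))&&
           (dcU(i,3)==dc(k,4))&&(dcU(i,4)==dc(k,5))&&(dcU(i,5)==dc(k,6))&&
           (dcU(i,6)==dc(k,7)))
    {
      tU[i] = tU[i] + dc(k,0);
      k++;
    }
  }
  return(tU);
}
'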
cppFunction(getTsrc)
4) Convert function inputs to matrices.
dc1 <- as.matrix(dc)
dcU1 <- as.matrix(dcU)
5) Run the function and time it. It returns a vector of accumulated times, one per row of the unique data frame.
pt <- proc.time()
t <- getT(dc1, dcU1)
print(proc.time() - pt)
   user  system elapsed
   0.18    0.03    0.20
6) Self high-five and more coffee.
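On small inputs, the accumulation from steps 1 to 5 can be cross-checked in plain R. The sketch below is an assumption-laden illustration rather than a replacement for the Rcpp version: the toy data frame and its column names (`dt`, `v1` to `v7`) are made up, and the row keys are built with `paste()`, which formats numbers to 15 significant digits, so values that differ only beyond that would be grouped together.

```r
# Hypothetical toy data: column 1 is the time step, columns 2:8 are the state.
dc <- data.frame(
  dt = c(0.5, 1.0, 0.25, 2.0, 1.5),
  v1 = c(1, 2, 1, 2, 1),
  v2 = c(0, 0, 0, 0, 1),
  v3 = 0, v4 = 0, v5 = 0, v6 = 0, v7 = 0
)

# Steps 1 and 2: sort on columns 2:8, then keep the unique state rows.
dc  <- dc[do.call(order, unname(as.list(dc[, 2:8]))), ]
dcU <- unique(dc[, 2:8])

# Build one key per row and sum the time steps per key. reorder = FALSE keeps
# first-appearance order, so the sums line up with the rows of dcU.
key <- do.call(paste, c(unname(as.list(dc[, 2:8])), sep = "\r"))
tU  <- as.vector(rowsum(dc[, 1], key, reorder = FALSE))

cbind(dcU, t = tU)
```

If the vector returned by getT() for the same data agrees with `tU` (for example via `all.equal()`), that is a reasonable sign the compiled loop is walking the sorted rows as intended.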