Alternative to for loop R

Question

I have written a function that compares the similarity of IP addresses and lets the user select how many octets to compare. For example, given the addresses 255.255.255.0 and 255.255.255.1, a user could specify that they only want to compare the first octet, the first and second, the first three, and so on.

The function is below:

did.change.ip=function(vec, detail){
  counter=2
  result.vec=FALSE
  r.list=strsplit(vec, '.', fixed=TRUE)

  for(i in vec){
    if(counter>length(vec)){
      break
    }

    first=as.numeric(r.list[[counter-1]][1:detail])
    second=as.numeric(r.list[[counter]][1:detail])

    if(sum(first==second)==detail){
      result.vec=append(result.vec,FALSE)
    }
    else{
      result.vec=append(result.vec,TRUE)
    }
    counter=counter+1
  }
  return(result.vec)
}

It's really slow once the data gets larger. For a dataset of 500,000 rows, the system.time() results are:

user  system elapsed 
 208.36    0.59  209.84

Are there any R power users who have insight on how to write this more efficiently? I know lapply() is the preferred method for looping over vectors/data frames, but I'm stumped as to how to access the previous element in a vector for this purpose. I've tried to sketch something out quickly, but it returns a syntax error:

test=function(vec, detail){
  rlist=strsplit(vec, '.', fixed=TRUE)
  r.value=vapply(rlist, function(x,detail) ifelse(x[1:detail]==x[1:detail] TRUE, FALSE))
}

I've created some sample data for testing purposes below:

stack.data=structure(list(V1 = c("247.116.209.66", "195.121.47.105", "182.136.49.12", 
"237.123.100.50", "120.30.174.18", "29.85.72.70", "18.186.76.177", 
"33.248.142.26", "109.97.92.50", "217.138.155.145", "20.203.156.2", 
"71.1.51.190", "31.225.208.60", "55.25.129.73", "211.204.249.244", 
"198.137.15.53", "234.106.102.196", "244.3.87.9", "205.242.10.22", 
"243.61.212.19", "32.165.79.86", "190.207.159.147", "157.153.136.100", 
"36.151.152.15", "2.254.210.246", "3.42.1.208", "30.11.229.18", 
"72.187.36.103", "98.114.189.34", "67.93.180.224")), .Names = "V1", class = "data.frame", row.names = c(NA, 
-30L))

MrFlick · Accepted Answer · 2014-06-30T21:53:44

Here's another solution just using base R.

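did.change.ip <- function(vec, detail=4){
    # Read the first `detail` octets as integers; NULL entries in `what` skip the rest.
    ipv <- scan(text=paste(vec, collapse="\n"),
        what=c(replicate(detail, integer()), replicate(4-detail,NULL)),
        sep=".", quiet=TRUE)
    # diff() each kept octet column; a row with any nonzero difference has changed.
    c(FALSE, rowSums(vapply(ipv[!sapply(ipv, is.null)],
        diff, integer(length(vec)-1))!=0)>0)
}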

Here we use scan() to break the IP addresses up into numbers. Then we look down each octet for differences using diff(). This appears to be faster than the original proposal, but slightly slower than @josilber's stringr solution (using microbenchmark with 3,000 IP addresses):

Unit: milliseconds
   expr       min        lq    median        uq       max neval
   orig 35.251886 35.716921 36.019354 36.700550 90.159992   100
   scan  2.062189  2.116391  2.170110  2.236658  3.563771   100
 strngr  2.027232  2.075018  2.136114  2.200096  3.535227   100
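
For anyone who finds the scan() call dense, the same "compare each row with the one before it" idea can be sketched with strsplit() and a matrix. This is only an illustration, not the code benchmarked above: the name did_change_ip_matrix is made up here, it assumes every input is a well-formed dotted-quad IPv4 string, and it has not been timed against the other versions.

```r
# Hypothetical helper (not from the answer above): same idea as the scan()
# version, but built on strsplit() and an integer matrix.
did_change_ip_matrix <- function(vec, detail = 4) {
  # With fewer than two addresses there is nothing to compare.
  if (length(vec) < 2) return(rep(FALSE, length(vec)))

  # One row per address, one column per octet.
  # Assumption: every string has exactly four dot-separated integers.
  octets <- matrix(as.integer(unlist(strsplit(vec, ".", fixed = TRUE))),
                   ncol = 4, byrow = TRUE)

  # Keep only the first `detail` octets; drop = FALSE keeps a matrix
  # even when detail is 1.
  kept <- octets[, seq_len(detail), drop = FALSE]

  # Offset the row indices so row i is compared with row i - 1.
  current  <- kept[-1, , drop = FALSE]
  previous <- kept[-nrow(kept), , drop = FALSE]
  changed  <- rowSums(current != previous) > 0

  # The first address has no predecessor, so it is reported as FALSE.
  c(FALSE, changed)
}

ips <- c("255.255.255.0", "255.255.255.1", "255.255.254.1", "10.255.254.1")
did_change_ip_matrix(ips, detail = 3)
did_change_ip_matrix(ips, detail = 1)
```

Dropping the first row from one copy and the last row from the other is what stands in for "accessing the previous element": the two matrices line up so that each row sits next to its predecessor, and the comparison runs over the whole column at once instead of inside a loop.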
