0
votes

I have a 64-bit Windows 7 machine with 8GB RAM. memory.limit() shows 8135. I ran into a memory issue even though what I'm trying to do does not seem humongous at all (compared to other memory-related questions on SO).

Basically I'm matching firms' IDs with their industries. ref.table is the data frame where I store ID and industry for reference.

matchid <- function(id) {
  firm.industry <- ref.table$industry[ref.table$id==id]
  firm.industry <- as.character(firm.industry[1]) # Sometimes same ID has multiple industries. I just pick one.
  resid <<- c(resid, firm.industry)
}
resid <- c()
invisible( lapply(unmatched.id, matchid) ) # unmatched.id is the vector of firms' ID to be matched

The unmatched.id vector is about 60,000 elements long. Still, I got the error "cannot allocate vector of size 41.8 Kb" (only 41.8 Kb!). Windows Task Manager shows full RAM usage the whole time.

Is it because my function is too unwieldy somehow? I can't imagine it's the vector size causing problems.

(PS: I do gc() and rm() frequently)

Just because you call gc() and rm() does not ensure that your memory gets "compacted". You need contiguous memory blocks, and it's perfectly possible that your prior activities have fragmented your memory. Furthermore, you have the system and other applications competing for memory. Shut down, restart with no other applications, and do only the coding needed to create this object, and you will have no difficulties. (And this is surely a duplicate question, so next time do a search before posting.) – IRTFM
And do note that the value is an extra 41.8 Kb block. Whatever you are doing is exhausting all the RAM, and R cannot get another chunk of the required size to complete the operation. – Gavin Simpson
Unwieldy? Yes. When you grow objects as in resid <<- c(resid, firm.industry), R has to copy the entire vector every time you append a new element. And the use of <<- is generally frowned upon. I'm just guessing, but you could probably do all this with unique(ref.table$industry[ref.table$id %in% unmatched.id]). – joran
@DWin I know that this contiguous-block issue has been asked before, but I'm not sure what part of my code is causing it. I will follow joran's advice and fix that part of my code. – Heisenberg
Look closely: the way resid is grown does a quadratic amount of copying. – Ricardo Saporta
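The copying cost the commenters describe can be seen in a small sketch (the functions and sizes here are illustrative, not from the question): appending with c() copies the whole vector each time, for quadratic total work, while preallocating writes each element in place.

```r
# Growing with c(): R copies the entire vector on every append -> O(n^2) work
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i)
  out
}

# Preallocating: each element is written in place -> O(n) work
prealloc <- function(n) {
  out <- vector("integer", n)
  for (i in seq_len(n)) out[i] <- i
  out
}

identical(grow(1000), prealloc(1000))  # same result; only the cost differs
```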

2 Answers

3
votes

Try the following to see if it stops giving you memory complaints:

lapply(unmatched.id, function(id) as.character(ref.table$industry[ref.table$id == id]))

If the above works, then wrap it in unlist(..., use.names = FALSE).
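Put together on toy stand-ins for the question's objects (this ref.table and unmatched.id are made up for illustration), the approach might look like:

```r
# Hypothetical stand-ins for the question's data
ref.table <- data.frame(id       = c("A", "B", "B", "C"),
                        industry = c("Mining", "Retail", "Tech", "Retail"),
                        stringsAsFactors = FALSE)
unmatched.id <- c("B", "C", "Z")   # "Z" has no match in ref.table

# One list element per id; an unmatched id yields character(0)
res.list <- lapply(unmatched.id,
                   function(id) as.character(ref.table$industry[ref.table$id == id]))

# Flatten; use.names = FALSE avoids building a names attribute
resid <- unlist(res.list, use.names = FALSE)
```

Note two differences from the original matchid(): an unmatched id contributes nothing after unlist (so resid can be shorter than unmatched.id), and a duplicated id contributes all of its industries rather than just the first.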

Or try using data.table:

library(data.table)
ref.table <- data.table(ref.table, key="id") 
ref.table[.(unmatched.id), as.character(industry)]
2
votes

I think you're looking up a vector of unmatched IDs in ref.table$id and finding the corresponding indices:

## first match, one for each unmatched.id, NA if no match
idx <- match(unmatched.id, ref.table$id)
## matching industries
resid <- ref.table$industry[idx]

This is 'vectorized', so it is much more efficient than the lapply approach.
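A minimal sketch of this vectorized lookup, with made-up data standing in for ref.table and unmatched.id:

```r
# Hypothetical stand-in data
ref.table <- data.frame(id       = c("A", "B", "C"),
                        industry = c("Mining", "Retail", "Tech"),
                        stringsAsFactors = FALSE)
unmatched.id <- c("C", "A", "Z")   # "Z" is absent from ref.table

idx   <- match(unmatched.id, ref.table$id)  # c(3, 1, NA)
resid <- ref.table$industry[idx]            # c("Tech", "Mining", NA)
```

Unlike the unlist() approach, match() keeps resid aligned one-to-one with unmatched.id (NA where there is no match) and takes only the first hit for a duplicated id, which is the same "pick one" behavior as the original matchid().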