I have a program where I am running a simulation function for a large number of iterations. I'm stuck, however, on what I expected to be the easiest part: figuring out how to store frequency counts of the function's results.
The simulation function itself is complicated, but it is analogous to R's sample() function: a large amount of data goes in, and the function outputs a vector containing a subset of its elements.
x <- c("red", "blue", "yellow", "orange", "green", "black", "white", "pink")
run_simulation <- function(input_data, iterations = 100) {
  for (i in 1:iterations) {
    result <- sample(input_data, 3, replace = FALSE)
    results <- ????
  }
}
run_simulation(x)
My question is: what is the best (most efficient and most R-like) data structure for storing the frequency counts of the function's results inside the simulation loop? As you might be able to tell from the for loop, my background is in languages like Python, where I would create a dict keyed by tuples and increment it every time a particular combination is output:
counts[results_tuple] = counts.get(results_tuple, 0) + 1
However, there is no equivalent dict/hashmap structure in R, and I've often found that trying to emulate other languages in R is a recipe for ugly, inefficient code. (Right now I am converting each output vector to a string and appending it to a results list that I count afterwards with table(), but that is very memory-inefficient for a large number of iterations of a function that has only a limited number of possible output vectors.)
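Concretely, my current workaround looks roughly like this (slightly simplified; I sort each result first so that element order doesn't matter, and I preallocate rather than append, but it still stores one string per iteration):

```
run_simulation_current <- function(input_data, iterations = 100) {
  # Store one collapsed string per iteration -- this is the memory problem
  results <- character(iterations)
  for (i in seq_len(iterations)) {
    result <- sample(input_data, 3, replace = FALSE)
    # Sort so "red, blue" and "blue, red" count as the same combination
    results[i] <- paste(sort(result), collapse = ", ")
  }
  # Tabulate only at the very end
  table(results)
}
```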
To be clear, here is the kind of output I want:
Result Freq
black, pink, green 8
blue, red, white 7
black, pink, blue 7
blue, green, black 5
blue, green, red 4
green, blue, white 3
pink, green, white 3
white, blue, green 1
white, orange, red 1
yellow, black, orange 1
yellow, blue, green 1
I don't care about the frequency of any particular element, only the set. And I don't care about the order of output, just the frequency.
Any advice is appreciated!