
I have a data frame that holds the lower and upper bounds of a few parameters for each category of fruit. It looks something like this:

+----------+-----------+-------+-------+
| Category | Parameter | Upper | Lower |
+----------+-----------+-------+-------+
| Apple    | alpha     | 10    | 20    |
+----------+-----------+-------+-------+
| Apple    | beta      | 20    | 30    |
+----------+-----------+-------+-------+
| Orange   | alpha     | 10    | 20    |
+----------+-----------+-------+-------+
| Orange   | beta      | 30    | 40    |
+----------+-----------+-------+-------+
| Orange   | gamma     | 50    | 60    |
+----------+-----------+-------+-------+
| Pear     | alpha     | 10    | 30    |
+----------+-----------+-------+-------+
| Pear     | beta      | 20    | 40    |
+----------+-----------+-------+-------+
| Pear     | gamma     | 20    | 30    |
+----------+-----------+-------+-------+
| Banana   | alpha     | 40    | 50    |
+----------+-----------+-------+-------+
| Banana   | beta      | 20    | 40    |
+----------+-----------+-------+-------+
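
For anyone who wants to reproduce this, the table can be entered as a data frame roughly as follows. This is only a sketch: the object name `df` is the one used later in this post, and the numbers are copied into `Upper` and `Lower` exactly as they appear in the table.

```r
# Sample data copied from the table above (values kept under the same headers)
df <- data.frame(
  Category  = c("Apple", "Apple", "Orange", "Orange", "Orange",
                "Pear", "Pear", "Pear", "Banana", "Banana"),
  Parameter = c("alpha", "beta", "alpha", "beta", "gamma",
                "alpha", "beta", "gamma", "alpha", "beta"),
  Upper     = c(10, 20, 10, 30, 50, 10, 20, 20, 40, 20),
  Lower     = c(20, 30, 20, 40, 60, 30, 40, 30, 50, 40),
  stringsAsFactors = FALSE
)

# Rows belonging to a single fruit
apple_rows <- df[df$Category == "Apple", ]
apple_rows
```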

I would like to write a function that:

  • Takes one fruit name as input, e.g. function("Apple")
  • Extracts the upper and lower values of all the parameters of this fruit
  • Feeds the upper and lower bounds for alpha, beta, and gamma (if applicable) of the selected fruit into the following call to make one data frame:
param_grid_[fruit_name] <- expand.grid(alpha = seq(lower, upper, length.out = 100),
                                       beta  = seq(lower, upper, length.out = 100),
                                       gamma  = seq(lower, upper, length.out = 100)) 
  • gamma is included only if the fruit has a gamma parameter in the original table

For example, if my input to the function is "Apple", I should end up with:

param_grid_Apple <- expand.grid(alpha = seq(10, 20, length.out = 100),
                                beta  = seq(20, 30, length.out = 100)) 

Similarly, if my input to the function is "Pear", I should end up with:

param_grid_Pear <- expand.grid(alpha = seq(10, 30, length.out = 100),
                               beta  = seq(20, 40, length.out = 100),
                               gamma = seq(20, 30, length.out = 100)) 

I have tried subsetting directly by row and column index. For example, for Apple's upper alpha, I would do df[1, 3]. But this is a rather manual and unsophisticated way to do it. I am wondering whether I could wrap everything in a function to streamline the process.

I am still a beginner in R and am trying to learn how to streamline procedures by writing functions. Any help is much appreciated!


P.S. (FYI, possibly not directly related to the central issue of this post) I am doing this so that I can feed param_grid into the nls2 function to fit a curve for each fruit:

nls2(formula = ...,
     data = ...,
     start = param_grid, 
     algorithm = "brute-force",
     control = nls.control(maxiter = 1e4))


2 Answers


Here is another approach to consider, using the purrr package.

You can create a function and pass it your data frame, the fruit name, and the desired length for your sequence.

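You can filter the rows that correspond to your fruit (filter() comes from dplyr), and then use map2 to build a sequence for each parameter. cross_df is comparable to expand.grid and returns a data frame. Note that in purrr 1.0.0 and later, cross_df is deprecated in favour of tidyr::expand_grid.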

library(dplyr)  # filter()
library(purrr)  # map2(), set_names(), cross_df()

param_grid <- function(df, fruit, length) {
  df_fruit <- df %>%
    filter(Category == fruit) 
  
  map2(df_fruit$Upper, df_fruit$Lower, seq, length.out = length) %>%
    set_names(df_fruit$Parameter) %>%
    cross_df()
}

param_grid(df, "Pear", 100)

Output

# A tibble: 1,000,000 x 3
   alpha  beta gamma
   <dbl> <dbl> <dbl>
 1  10      20    20
 2  10.2    20    20
 3  10.4    20    20
 4  10.6    20    20
 5  10.8    20    20
 6  11.0    20    20
 7  11.2    20    20
 8  11.4    20    20
 9  11.6    20    20
10  11.8    20    20
# … with 999,990 more rows
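
If you would rather not depend on purrr, roughly the same steps can be sketched in base R. The function name `param_grid_base` and the argument `n` are made up for this illustration, and the sample data is just the Apple and Pear rows from the question. Because the question's table shows smaller numbers under `Upper` than under `Lower`, the sketch orders each pair before calling seq(), which is an assumption about what is wanted.

```r
# Hypothetical base R version; names are illustrative, not from the original post
param_grid_base <- function(df, fruit, n = 100) {
  rows <- df[df$Category == fruit, ]
  if (nrow(rows) == 0) stop("No rows found for fruit: ", fruit)

  # Order each pair so every sequence runs from the smaller to the larger bound
  lo <- pmin(rows$Lower, rows$Upper)
  hi <- pmax(rows$Lower, rows$Upper)

  # One sequence per parameter, named after the parameter
  seqs <- Map(function(a, b) seq(a, b, length.out = n), lo, hi)
  names(seqs) <- rows$Parameter

  # expand.grid() accepts a named list and builds every combination
  expand.grid(seqs, KEEP.OUT.ATTRS = FALSE)
}

# Subset of the question's table, values as shown there
df <- data.frame(
  Category  = c("Apple", "Apple", "Pear", "Pear", "Pear"),
  Parameter = c("alpha", "beta", "alpha", "beta", "gamma"),
  Upper     = c(10, 20, 10, 20, 20),
  Lower     = c(20, 30, 30, 40, 30),
  stringsAsFactors = FALSE
)

# A short grid (5 points per parameter) keeps the result easy to inspect
grid_apple <- param_grid_base(df, "Apple", n = 5)
grid_pear  <- param_grid_base(df, "Pear", n = 5)
dim(grid_apple)
dim(grid_pear)
```

Since only the parameters present for the chosen fruit end up in the list, gamma appears for Pear but not for Apple.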

Here you go! The bulk of the work is done by three functions: assign(), which creates variables whose names come from character strings; eval(parse()), which evaluates R code supplied as a character string (even one stored in a variable); and do.call(), which calls a function with a list of arguments, so that list can be built programmatically each time.
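
As a small illustration of the do.call() idea described above, here is a sketch with bounds written out by hand for one fruit (the values and the names `seqs`, `grid_a`, and `grid_b` are only for this example). A named list handed to do.call() becomes the named arguments of the function being called.

```r
# Sequences for one fruit, written out by hand for illustration
seqs <- list(
  alpha = seq(10, 20, length.out = 3),
  beta  = seq(20, 30, length.out = 3)
)

# do.call() passes each list element as a named argument ...
grid_a <- do.call(expand.grid, seqs)

# ... so it builds the same grid as the call typed out in full
grid_b <- expand.grid(alpha = seq(10, 20, length.out = 3),
                      beta  = seq(20, 30, length.out = 3))

all.equal(grid_a, grid_b)
```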

param_grid <- function(data, fruit_name) {
  require(dplyr)
  # Keep only the rows for the requested fruit
  df <- data %>%
    filter(Category == fruit_name) %>%
    select(-Category)
  # Assign a sequence to a variable named after each parameter
  for (i in seq_len(nrow(df))) {
    assign(df$Parameter[i], seq(df$Lower[i], df$Upper[i], length.out = 100))
  }
  # Collect those variables in a list for do.call
  list1 <- lapply(unique(df$Parameter), function(j) eval(parse(text = j)))
  # Bind the sequences into a data frame for expand.grid
  df2 <- as.data.frame(do.call(cbind, list1))
  names(df2) <- unique(df$Parameter)
  df_expand <- expand.grid(df2)
  return(df_expand)
}

It works!

param_grid_apple <- param_grid(fruit, "Apple")
head(param_grid_apple, 10)
      alpha beta
1  20.00000   30
2  19.89899   30
3  19.79798   30
4  19.69697   30
5  19.59596   30
6  19.49495   30
7  19.39394   30
8  19.29293   30
9  19.19192   30
10 19.09091   30
dim(param_grid_apple)
[1]  10000      2

param_grid_pear <- param_grid(fruit, "Pear")
head(param_grid_pear, 10)
      alpha beta gamma
1  30.00000   40    30
2  29.79798   40    30
3  29.59596   40    30
4  29.39394   40    30
5  29.19192   40    30
6  28.98990   40    30
7  28.78788   40    30
8  28.58586   40    30
9  28.38384   40    30
10 28.18182   40    30

dim(param_grid_pear)
[1] 1000000       3
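
The question asks for one object per fruit (param_grid_[fruit_name]). One possible alternative, sketched below, is to build all the grids in one pass and keep them in a named list, so each grid can be looked up by fruit name. The helper name `all_param_grids` and the argument `n` are hypothetical, and each pair of bounds is ordered with min()/max() on the assumption that ascending sequences are wanted. `n` is kept small here because a fruit with three parameters yields n^3 rows.

```r
# Hypothetical helper: one grid per fruit, returned as a named list
all_param_grids <- function(df, n = 100) {
  by_fruit <- split(df, df$Category)
  lapply(by_fruit, function(rows) {
    # One ascending sequence per parameter of this fruit
    seqs <- Map(function(a, b) seq(min(a, b), max(a, b), length.out = n),
                rows$Lower, rows$Upper)
    names(seqs) <- rows$Parameter
    expand.grid(seqs, KEEP.OUT.ATTRS = FALSE)
  })
}

# Data copied from the question's table
df <- data.frame(
  Category  = c("Apple", "Apple", "Orange", "Orange", "Orange",
                "Pear", "Pear", "Pear", "Banana", "Banana"),
  Parameter = c("alpha", "beta", "alpha", "beta", "gamma",
                "alpha", "beta", "gamma", "alpha", "beta"),
  Upper     = c(10, 20, 10, 30, 50, 10, 20, 20, 40, 20),
  Lower     = c(20, 30, 20, 40, 60, 30, 40, 30, 50, 40),
  stringsAsFactors = FALSE
)

grids <- all_param_grids(df, n = 4)
names(grids)        # fruit names
names(grids$Pear)   # parameters available for Pear
nrow(grids$Pear)    # n^3 rows for three parameters
```

Each element, for example grids$Apple, could then be passed as the start argument in the nls2 call from the question.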