I would like to create a function that trains and tests on 10 pairs of data sets, stored in two lists (one of training sets, one of test sets). Here are the lists:

blend_30_d<-list(desktop_30_1, desktop_30_2, desktop_30_3, desktop_30_4, desktop_30_5, desktop_30_6, desktop_30_7, desktop_30_8, desktop_30_9, desktop_30_10)

blend_30_td<-list(desktop_30_t1, desktop_30_t2, desktop_30_t3, desktop_30_t4, desktop_30_t5, desktop_30_t6, desktop_30_t7, desktop_30_t8, desktop_30_t9, desktop_30_t10)

The names of each individual dataset are:

[1] "date" "Wkday" "Imps" "Clicks" "Total_Cost" "Units"
[7] "January" "February" "March" "April" "May" "June"
[13] "July" "August" "September" "October" "November" "December"
[19] "Monday" "Tuesday" "Wednesday" "Thursday" "Friday" "Saturday"
[25] "Sunday" "Vday" "Tgiving" "Xmas" "XmasE" "NYE"
[31] "NYD" "July4" "Labor" "Memorial" "Mob_App_Launch" "Auto_Approve_Launch"

I've built the following function. I want blend_30_d[[1]] to be tested against blend_30_td[[1]], blend_30_d[[2]] against blend_30_td[[2]], and so on.

d_cost <- function(train, test){
    ####Run regression on training
    q<-lm(Total_Cost ~ . -date - Wkday - Imps - Clicks + poly(date, 2), data=train)
    ####Predict values into test set
    test_cost_d <- predict.lm(q, x=test)
    ####Calculate R^2 between predicted vs. actual values
    z<-(cor(test_cost_d, test$Total_Cost))^2
}

d_cost(blend_30_d, blend_30_td)

I'm receiving the following error:

Error in terms.formula(formula, data = data) : duplicated name 'date' in data frame using '.'

I'm not sure this is the correct approach with two lists. Any suggestions? Thanks!

2 Answers

0
votes

Your d_cost function is written to take two data frames: one for training and one for testing. You're calling it with two lists of data frames instead. Since the function handles one pair of data frames at a time, pass it one pair per call rather than the two whole lists. Try something like this:

# One R^2 value per train/test pair
z = rep(NA, length(blend_30_d))
for (i in seq_along(blend_30_d)) {
    z[i] = d_cost(blend_30_d[[i]], blend_30_td[[i]])
}
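
The same pairing can also be written without an explicit loop, since mapply walks two lists in parallel. What follows is only a minimal sketch under stated assumptions: the data frames are small made-up stand-ins (the column x, the helper make_df, and the names train_list and test_list are hypothetical), and d_cost is simplified to a single predictor. It passes the test set through newdata=, because predict.lm has no x argument; with x=test the argument is ignored and the fitted values for the training data come back instead.

```r
# Hypothetical stand-in data: each data frame has a predictor x and Total_Cost.
set.seed(1)
make_df <- function(n) {
  x <- runif(n)
  data.frame(x = x, Total_Cost = 3 * x + rnorm(n, sd = 0.1))
}
train_list <- lapply(1:3, function(i) make_df(50))
test_list  <- lapply(1:3, function(i) make_df(20))

# Simplified d_cost: fit on train, predict on test, return squared correlation.
d_cost <- function(train, test) {
  fit  <- lm(Total_Cost ~ x, data = train)
  pred <- predict(fit, newdata = test)  # newdata=, not x=
  cor(pred, test$Total_Cost)^2
}

# mapply hands the i-th element of each list to d_cost together.
r2 <- mapply(d_cost, train_list, test_list)
print(r2)
```

With the real data this would presumably become mapply(d_cost, blend_30_d, blend_30_td), giving one R^2 per pair, provided both lists have the same length and are in matching order.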
0
votes

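I think you need a loop that passes one pair of data frames at a time, indexing both lists with [[i]]:

for (i in 1:10) {
    # print() is needed to see each result inside a loop
    print(d_cost(blend_30_d[[i]], blend_30_td[[i]]))
}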