I've fitted a multiple linear regression model with lm(), using every predictor in my training set except 'lastname', and now I want to make predictions on my test set. However, when I call predict(cap.fit, test), I get an error about the variable 'lastname'.
I also tried passing in a test set with the 'lastname' column removed, but that didn't work either.
Code:
cf_df <- read.csv(file="cap_friendly_data.csv", header=TRUE, sep=",")
new_cols <- c('lastname', 'Position', 'Age.Years', 'Original.Cap.Hit', 'New.Signing.Status', 'PPG.Prior.Signing', 'PPG.Contract.Year', 'New.Cap.Hit')
new_stats <- cf_df[, new_cols]
#create training and testing datasets
set.seed(2430)
num_training_samples <- 2000
train_indices <- sample(1:nrow(new_stats), num_training_samples, replace = FALSE)
train <- new_stats[train_indices, ]
test <- new_stats[-train_indices, ]
test_results <- test$New.Cap.Hit
#fit model
cap.fit <- lm(New.Cap.Hit ~ . - lastname, data = train)
summary(cap.fit)
predictions <- predict(cap.fit, test)
I expected to get a vector of predictions from the model, but instead I got this error message:
predictions <- predict(cap.fit, test)
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor lastname has new levels Åberg, Acciari, Acolatse, Alfredsson, Anderson, Angelidis, Arnold, Backes, Balisy, Baptiste, Barch...