I created a training and a test set and tested my model. My workflow looks as follows:
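# Load packages (tidymodels for modelling, doParallel for parallel tuning)
library(tidymodels)
library(doParallel)

# Train/test split
set.seed(2402) ## Fix the random seed so the split is reproducible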
splits <- initial_split(Data, prop = 0.7) ## 70% will be training data
# Create a train and test set
Data_train <- training(splits)
Data_test <- testing(splits)
# Specify the model (mtry and min_n are tuned below)
rf_mod <- rand_forest(mtry = tune(), min_n = tune(), trees = 200) %>%
  set_mode("regression") %>%
  set_engine("ranger", importance = "permutation")
# Create a workflow (rf_mod_recipe is my preprocessing recipe, defined elsewhere and not shown here)
rf_mod_workflow <- workflow() %>%
  add_model(rf_mod) %>%
  add_recipe(rf_mod_recipe)
rf_mod_workflow
# State our error metrics
class_metrics <- metric_set(rmse, mae)
# Make the computation faster by registering a parallel backend
registerDoParallel()
rf_grid <- grid_regular(
  mtry(range = c(5, 15)),
  min_n(range = c(10, 200)),
  levels = 5
)
rf_grid
# Tune over the grid (cv_folds is my resampling object, created elsewhere and not shown here)
set.seed(654321)
rf_tune_res <- tune_grid(
  rf_mod_workflow,
  resamples = cv_folds,
  grid = rf_grid,
  metrics = class_metrics
)
# Select the best combination of mtry and min_n by RMSE
best_rmse <- select_best(rf_tune_res, metric = "rmse")
rf_final_wf <- finalize_workflow(rf_mod_workflow, best_rmse)
rf_final_wf
# Fit the finalised workflow on the training set and evaluate it on the test set
set.seed(56789)
rf_final_fit <- rf_final_wf %>%
  last_fit(splits, metrics = class_metrics)
However, I now want to use the fitted model to predict on a new dataset, and this new dataset contains NA values. Is it still possible to predict on a dataset that has NA values, or does the random forest not allow it? I did something similar with a linear regression, and there the rows with NA values were ignored and predictions were only returned for rows without NA values.
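To make the linear-regression comparison concrete, here is a minimal base-R sketch of the behaviour I mean. The toy data and the names `toy_train`, `toy_new`, and `lm_fit` are made up for illustration and are not part of my actual workflow:

```r
# Toy data (hypothetical, only for illustration)
toy_train <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1)
)
toy_new <- data.frame(x = c(1.5, NA, 4.5))  # second row has a missing predictor

lm_fit <- lm(y ~ x, data = toy_train)

# predict() on an lm returns NA for the row with the missing predictor
lm_pred <- predict(lm_fit, newdata = toy_new)

# Alternative: keep only the complete rows before predicting
toy_complete <- toy_new[complete.cases(toy_new), , drop = FALSE]
lm_pred_complete <- predict(lm_fit, newdata = toy_complete)
```

I assume the same `complete.cases()` filter could be applied to the new data before calling `predict()` on the fitted random forest workflow, but I am not sure whether that is required or whether there is a better way.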