0
votes

Here is the code that I run beforehand.

library(ggplot2)
library(caret)

filename <- "iris.csv"

dataset <- read.csv(filename, header = FALSE)
  
colnames(dataset) <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species")


validation_index <- createDataPartition(dataset$Species, p = 0.80, list = FALSE)
validation <- dataset[-validation_index,]
dataset <- dataset[validation_index,]

My question is: why do I get NULL when I run levels(dataset$Species)? Species is a character variable, and I expected three results: Iris-setosa, Iris-versicolor, and Iris-virginica. The code works when I load the dataset that ships with R, but not when I import it from a CSV file.

1
Factors and characters look pretty similar in a data frame, but they are different. Character variables do not have levels, so if Species is a character, levels() finds nothing and returns NULL. If is.factor(dataset$Species) gives you FALSE, you can turn Species into a factor with dataset$Species <- as.factor(dataset$Species) and then try levels(dataset$Species) again. - tamtam
You'll get your expected result if you just run data(iris); levels(iris$Species). The issue you're seeing is likely related to whatever is in 'iris.csv'. - andrew_reece
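The difference the comments describe can be reproduced without the CSV file. The following is a minimal sketch; the vectors species_chr and species_fct are made up for illustration and are not part of the asker's code.

```r
# Species labels stored as plain character strings (illustrative values)
species_chr <- c("Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa")

is.factor(species_chr)   # FALSE
levels(species_chr)      # NULL: a character vector has no levels attribute

# Converting to a factor creates the levels (sorted alphabetically by default)
species_fct <- as.factor(species_chr)
is.factor(species_fct)   # TRUE
levels(species_fct)      # "Iris-setosa" "Iris-versicolor" "Iris-virginica"
```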

1 Answer

1
votes

tamtam's comment worked. I just added the line dataset$Species <- as.factor(dataset$Species) to my code, right after colnames(dataset) <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species").
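For reference, here is a self-contained sketch of that fix. Since the contents of the original iris.csv are not shown, it assumes a headerless CSV and writes one from R's built-in iris data to a temporary file (tmp_csv is a hypothetical stand-in), so the levels come out as setosa, versicolor, virginica rather than the Iris- prefixed labels. It also shows a possible alternative: since R 4.0.0, read.csv() defaults to stringsAsFactors = FALSE, and passing stringsAsFactors = TRUE converts string columns to factors at import time.

```r
# Assumption: stand-in for "iris.csv", written without a header row
tmp_csv <- tempfile(fileext = ".csv")
write.table(iris, tmp_csv, sep = ",", row.names = FALSE, col.names = FALSE)

dataset <- read.csv(tmp_csv, header = FALSE)
colnames(dataset) <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species")

# The fix from the comment: convert the character column to a factor
dataset$Species <- as.factor(dataset$Species)
levels(dataset$Species)

# Alternative: let read.csv() create factors for string columns on import
dataset2 <- read.csv(tmp_csv, header = FALSE, stringsAsFactors = TRUE)
is.factor(dataset2$V5)   # the fifth column keeps its default name V5 here
```

Converting only the Species column keeps the change explicit, whereas stringsAsFactors = TRUE affects every string column in the file.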