I want to use Spark MLlib's Naive Bayes to train and test on data like this:
Male,Suspicion of Alcohol,Weekday,12am-4am,75,30-39
so that I can predict the label (Male / Female / Unknown). I want to create a LabeledPoint from each row so that this data can be run through the MLlib Naive Bayes algorithm, but the example on the Spark site only shows data that is entirely numeric. Is it possible to use string data like this? I understand that my label will need to be converted to a double value, i.e. Male / Female / Unknown => 1.0 / 2.0 / 3.0.
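A minimal sketch of the label mapping described above, in plain Scala. It assumes Male, Female and Unknown are the only label values that occur; the `LabelCodec` name is hypothetical. The inverse map is there so that a predicted double can be turned back into a label string.

```scala
// Hypothetical helper implementing the label scheme from the question:
// Male / Female / Unknown => 1.0 / 2.0 / 3.0
object LabelCodec {
  val codes: Map[String, Double] = Map("Male" -> 1.0, "Female" -> 2.0, "Unknown" -> 3.0)

  // Inverse map, used to turn a model's predicted double back into a label string
  val names: Map[Double, String] = codes.map(_.swap)

  // Fails loudly on any label outside the assumed three values
  def encode(label: String): Double =
    codes.getOrElse(label.trim, throw new IllegalArgumentException(s"unexpected label: $label"))

  def decode(prediction: Double): String = names(prediction)
}
```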
If so, how do I convert the CSV data above to a LabeledPoint using this type of syntax?
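import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
val parsedData = data.map { line =>
  val parts = line.split(',') // parts(0) is the label, parts(1) the space-separated features
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble)))
}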
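The same mapping idea can be applied to the string feature columns. Below is a minimal plain-Scala sketch of one-hot encoding, assuming column 0 is the label and that every remaining column (including the `75`) is treated as categorical. Each distinct value seen in training gets its own 0/1 slot, which also keeps every feature non-negative, as MLlib's Naive Bayes requires. The `CsvOneHot` class is hypothetical, not part of MLlib.

```scala
// Hypothetical one-hot encoder for the CSV rows in the question.
// Assumes column 0 is the label and columns 1..n are all categorical.
class CsvOneHot(trainingLines: Seq[String]) extends Serializable {
  // For each feature column, map each distinct value seen in training to an index.
  private val vocab: IndexedSeq[Map[String, Int]] = {
    val rows = trainingLines.map(_.split(',').map(_.trim))
    (1 until rows.head.length).map { col =>
      rows.map(_(col)).distinct.sorted.zipWithIndex.toMap
    }
  }

  // Start position of each column's block inside the combined feature array.
  private val offsets: IndexedSeq[Int] = vocab.map(_.size).scanLeft(0)(_ + _)

  val vectorSize: Int = offsets.last

  /** Returns (label string, one-hot features); a value unseen in training leaves its block all zeros. */
  def encode(line: String): (String, Array[Double]) = {
    val parts = line.split(',').map(_.trim)
    val features = Array.fill(vectorSize)(0.0)
    for (col <- vocab.indices; idx <- vocab(col).get(parts(col + 1)))
      features(offsets(col) + idx) = 1.0
    (parts(0), features)
  }
}
```

With Spark on the classpath, each `(label, features)` pair could then be wrapped as `LabeledPoint(labelAsDouble, Vectors.dense(features))` inside `data.map`, using the label-to-double mapping from the question, and the resulting RDD passed to `NaiveBayes.train(parsedData, 1.0)`. This sketch builds the vocabularies from a local `Seq[String]` on the driver; for a large file they would have to be computed with RDD operations instead.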