I am currently working on a hate speech filter using Apache Flink's FlinkML, programmed in Scala.
I have a huge .csv training dataset containing rows like:
id,count,hate_speech,offensive_language,neither,class,tweet
326,3,0,1,2,2,"""@complex_uk: Ashley Young has tried to deny that bird s*** landed in his mouth ---> http:**** https:****"" hahaha"
My problem is that Flink doesn't include a vectorizer to transform the tweets into a LibSVM file that the SVM.fit() function can read.
Does anyone have an idea how I could transform the data above, using the "class" column as the label and the "tweet" column as the feature vector, to train my SVM?
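To make the target format concrete, here is a minimal sketch of one possible transformation: a bag-of-words encoding with the hashing trick, written in plain Scala without any FlinkML calls. The names TweetToLibSvm, parseRow, tokenize, toLibSvmLine and the value numFeatures are hypothetical, and the sketch assumes that every row sits on a single line and that the six columns before the tweet never contain commas.

```scala
object TweetToLibSvm {
  // Hypothetical size of the hashed feature space; larger means fewer collisions.
  val numFeatures: Int = 1 << 18

  // Split a CSV row into (class label, tweet text).
  // Assumption: the six leading columns are plain numbers, so the first six
  // commas delimit them and everything after them belongs to the tweet.
  def parseRow(line: String): (Int, String) = {
    val cols = line.split(",", 7)
    // Drop the surrounding CSV quotes and unescape doubled quotes.
    val tweet = cols(6).stripPrefix("\"").stripSuffix("\"").replace("\"\"", "\"")
    (cols(5).trim.toInt, tweet)
  }

  // Lowercase the text and split on every character that is not
  // a lowercase letter, a digit, '@' or '#'.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^a-z0-9@#]+").filter(_.nonEmpty).toSeq

  // Hashing trick: map each token to an index in 1..numFeatures, count the
  // occurrences per index, and emit "label idx:count ..." sorted by index.
  def toLibSvmLine(label: Int, tweet: String): String = {
    val counts = tokenize(tweet)
      .groupBy(token => Math.floorMod(token.hashCode, numFeatures) + 1)
      .map { case (index, tokens) => index -> tokens.size }
    val features = counts.toSeq.sortBy(_._1).map { case (i, c) => s"$i:$c" }
    (label.toString +: features).mkString(" ")
  }
}
```

Each input row would go through parseRow and then toLibSvmLine, and the resulting lines would be written to a text file. Such a file should be loadable with FlinkML's MLUtils.readLibSVM, though that step is untested here. The hashing trick is chosen only because it needs no vocabulary pass over the data; a real vocabulary or TF-IDF weighting would be an alternative.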
I would really appreciate any help. I have been searching for hours.