I have a CSV file with 9000+ records in the following format:
id,Category1,Category2
How do I convert this CSV file to an RDD<Vector> so that I can find similar columns using columnSimilarities in Apache Spark with Java?
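As far as I have read, a Vector holds the values as a double[] (a sparse Vector additionally stores an int[] of indices), so you need to fill the Vector yourself. Start by reading the lines of the file: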
List<String> lines = Files.readAllLines(Paths.get("myfile.csv"), Charset.defaultCharset());
Then you can iterate over the lines, parse the values of each line, create a Vector from them, and collect the Vectors in a list that you turn into the RDD (for example with JavaSparkContext.parallelize).
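The parsing step could look roughly like the sketch below. It is plain Java so it can be tried without Spark, and it rests on assumptions the question does not confirm: the first line is the header, the first column is the id, the remaining columns are numeric, and no field contains a quoted comma. The class name CsvRows and the method parseRows are hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: turns CSV lines into numeric rows.
class CsvRows {
    // Assumption: line 0 is the header "id,Category1,Category2", column 0 is
    // the id, and every other column parses as a double.
    static List<double[]> parseRows(List<String> lines) {
        List<double[]> rows = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {   // start at 1 to skip the header
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                continue;                          // ignore blank lines
            }
            String[] fields = line.split(",");
            // Drop the id column; keep only the category values.
            double[] values = new double[fields.length - 1];
            for (int j = 1; j < fields.length; j++) {
                values[j - 1] = Double.parseDouble(fields[j].trim());
            }
            rows.add(values);
        }
        return rows;
    }
}
```

With Spark MLlib on the classpath, each double[] can be wrapped with Vectors.dense(values), the resulting List<Vector> passed to JavaSparkContext.parallelize(...), and .rdd() called on the result to get the RDD<Vector> that the RowMatrix constructor accepts. Calling columnSimilarities() on that RowMatrix then compares the columns. The id is left out of the Vector here because it would otherwise be treated as just another column.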