Load data from csv into scikit-learn SVM

Question

I want to train an SVM to classify samples. I have a csv file with 3 columns with headers (feature 1, feature 2, class label) and 20 rows (= number of samples).

To quote the Scikit-Learn documentation: "As other classifiers, SVC, NuSVC and LinearSVC take as input two arrays: an array X of size [n_samples, n_features] holding the training samples, and an array y of class labels (strings or integers), size [n_samples]:"

I understand that I need to obtain two arrays (one 2D and one 1D) in order to feed data into the SVM. However, I am unable to understand how to obtain the required arrays from the csv file. I have tried the following code:

import numpy as np
data = np.loadtxt('test.csv', delimiter=',')
print data

However, it raises an error: "ValueError: could not convert string to float: ��ࡱ�"

There are no column headers in the csv. Am I making any mistake in calling the function np.loadtxt or should something else be used?

Update: Here's what my .csv file looks like.

12  122 34
12234   54  23
23  34  23

I see no delimiter in your csv, remove the delimiter param so: data = np.loadtxt('test.csv') should work — EdChum
It is also possible that the delimiter is a tab (look at how the values are aligned). If that is the case, try delimiter='\t'. — Warren Weckesser

EdChum · Accepted Answer · 2015-05-08T09:50:00

You passed the param delimiter=',' but your csv was not comma separated.

So the following works:

In [378]:

data = np.loadtxt(path_to_data)
data
Out[378]:
array([[  1.20000000e+01,   1.22000000e+02,   3.40000000e+01],
       [  1.22340000e+04,   5.40000000e+01,   2.30000000e+01],
       [  2.30000000e+01,   3.40000000e+01,   2.30000000e+01]])
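To get from an array like this to the two arrays the question asks about, one option is to slice off the last column. The following is only a sketch: it assumes the last column holds integer class labels and the remaining columns are features, and it uses an in-memory stand-in (the name sample is hypothetical) built from the rows shown in the question instead of the real file.

```python
import io
import numpy as np

# Stand-in for the whitespace-separated file from the question.
# Assumed layout: feature columns first, integer class label last.
sample = io.StringIO("12  122 34\n12234   54  23\n23  34  23\n")

# Default delimiter: any whitespace. If the real file has a header row,
# passing skiprows=1 to np.loadtxt would skip it.
data = np.loadtxt(sample)

X = data[:, :-1]             # every column except the last: (n_samples, n_features)
y = data[:, -1].astype(int)  # last column as integer labels: (n_samples,)

print(X.shape, y.shape)
```

With scikit-learn installed, X and y in this shape can then be passed to a classifier, for example sklearn.svm.SVC().fit(X, y).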

The docs show that the delimiter defaults to None, in which case loadtxt treats any whitespace as the delimiter:

delimiter : str, optional The string used to separate values. By default, this is any whitespace.
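A small sketch of that behaviour, using made-up in-memory text rather than the asker's file: with the default delimiter, rows separated by tabs and rows separated by runs of spaces both parse, while forcing delimiter=',' on the same text raises a ValueError. The variable names here are illustrative only.

```python
import io
import numpy as np

# One tab-separated row and two space-separated rows (illustrative data).
text = "12\t122\t34\n12234   54  23\n23  34  23\n"

# delimiter=None (the default) splits each line on whitespace.
data = np.loadtxt(io.StringIO(text))
print(data.shape)

# With a comma delimiter, each whole line is a single field,
# which cannot be converted to a float.
try:
    np.loadtxt(io.StringIO(text), delimiter=",")
    comma_failed = False
except ValueError as exc:
    comma_failed = True
    print("ValueError:", exc)
```

If the file is known to be strictly tab-separated, delimiter='\t' also works, as suggested in the comments; the default is simply the more forgiving choice.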
