2
votes

I want to train an SVM to classify samples. I have a csv file that has 3 columns with headers (feature 1, feature 2, class label) and 20 rows (= number of samples).

To quote the scikit-learn documentation: "As other classifiers, SVC, NuSVC and LinearSVC take as input two arrays: an array X of size [n_samples, n_features] holding the training samples, and an array y of class labels (strings or integers), size [n_samples]."

I understand that I need two arrays (one 2D and one 1D) to feed data into the SVM. However, I don't understand how to obtain those arrays from the csv file. I have tried the following code:

import numpy as np
data = np.loadtxt('test.csv', delimiter=',')
print data

However it is showing an error "ValueError: could not convert string to float: ��ࡱ�"

There are no column headers in the csv. Am I making any mistake in calling the function np.loadtxt or should something else be used?

Update: Here's what my .csv file looks like.

12  122 34
12234   54  23
23  34  23
Would be useful to see the first few lines of your csv. - EdChum
Hi, I've updated the question with a few lines of the csv. - AviB
I see no comma delimiter in your csv. Remove the delimiter param, so data = np.loadtxt('test.csv') should work. - EdChum
It is also possible that the delimiter is a tab (look at how the values are aligned). If that is the case, try delimiter='\t'. - Warren Weckesser
@EdChum yes, now I notice the absence of commas. - AviB

2 Answers

0
votes

You passed the parameter delimiter=',', but your csv is not comma-separated.

So the following works:

In [378]:

data = np.loadtxt(path_to_data)
data
Out[378]:
array([[  1.20000000e+01,   1.22000000e+02,   3.40000000e+01],
       [  1.22340000e+04,   5.40000000e+01,   2.30000000e+01],
       [  2.30000000e+01,   3.40000000e+01,   2.30000000e+01]])

The docs show that the delimiter defaults to None, in which case loadtxt treats any whitespace as the delimiter:

delimiter : str, optional The string used to separate values. By default, this is any whitespace.
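
To tie this back to the original goal of getting X and y, here is a minimal sketch. It assumes the first two columns are the features and the last column is the class label, and it reads the sample rows from an in-memory string (the names space_separated, tab_separated, X and y are just for illustration). It also shows that a tab-separated variant loads to the same array when delimiter='\t' is passed.

```python
import io
import numpy as np

# Sample rows copied from the question; a real file would be read by path.
space_separated = "12  122 34\n12234   54  23\n23  34  23\n"
# Hypothetical tab-separated version of the same rows.
tab_separated = "12\t122\t34\n12234\t54\t23\n23\t34\t23\n"

# Default delimiter (None) splits on any run of whitespace.
data = np.loadtxt(io.StringIO(space_separated))
# An explicit tab delimiter works when the file is strictly tab-separated.
data_tab = np.loadtxt(io.StringIO(tab_separated), delimiter="\t")

# Assumption: columns 0 and 1 are features, column 2 is the class label.
X = data[:, :2]             # shape (n_samples, n_features)
y = data[:, 2].astype(int)  # shape (n_samples,)

print(X.shape, y.shape)
```

If scikit-learn is installed, arrays shaped like these should be what SVC().fit(X, y) expects, per the documentation quoted in the question.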

0
votes

The issue was with the csv file rather than the loadtxt() function. The format I saved in did not produce a proper .csv file (I don't know why; maybe I didn't save it as csv at all). There is a way to verify whether the file was saved in the right format: open the .csv file in Notepad. If the values have commas between them, it was saved properly and loadtxt() with delimiter=',' will work. If it shows gibberish, create the file again and re-check.
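
The same check can be done from Python instead of Notepad. This is only a sketch: the question does not say what format the file was actually saved in, but gibberish at the start of a ".csv" is typical of a binary spreadsheet. Legacy Excel (OLE2) files begin with the bytes D0 CF 11 E0 A1 B1 1A E1, and zip-based files such as .xlsx begin with "PK\x03\x04". The helper name sniff_file and the demo file names are made up for illustration.

```python
import os
import tempfile

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy Excel / OLE2 container
ZIP_MAGIC = b"PK\x03\x04"                       # zip container, e.g. .xlsx

def sniff_file(path, n=64):
    """Classify a file by its first bytes (hypothetical helper)."""
    with open(path, "rb") as fh:
        head = fh.read(n)
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    try:
        head.decode("ascii")
    except UnicodeDecodeError:
        return "non-ascii"
    return "text"

# Demo with two throwaway files: a real csv and a fake binary "csv".
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.csv")
    bad = os.path.join(tmp, "bad.csv")
    with open(good, "wb") as fh:
        fh.write(b"12,122,34\n")
    with open(bad, "wb") as fh:
        fh.write(OLE2_MAGIC + b"\x00" * 8)
    good_kind = sniff_file(good)
    bad_kind = sniff_file(bad)

print(good_kind, bad_kind)
```

Anything other than "text" suggests the file should be re-exported as plain csv before calling loadtxt().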