1
votes

I am new to this, so any help is appreciated. This code was given to me by my prof when I asked for an example; I had hoped for a working model...

from numpy import loadtxt
import numpy as np
from sklearn import svm
from sklearn.metrics import accuracy_score, f1_score
from sklearn.feature_selection import SelectPercentile, f_classif

# Read data

data = loadtxt('running.txt')
label = loadtxt('walking.txt')
X = data
y = label

# Define walking status as 0, running status as 1

print('Class labels:', np.unique(y))

# Random pick 50% data as test data and leave the rest as train data

from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)

# Use sklearn to select 50% features

selector = SelectPercentile(f_classif, 50)
selector.fit(X_train, y_train)
X_train_transformed = selector.transform(X_train)
X_test_transformed = selector.transform(X_test)

# Apply support vector machine algorithm

clf = svm.SVC(kernel="rbf", C=1)
clf.fit(X_train_transformed, y_train)

 

SVC(C=1, cache_size=200, class_weight=None, coef0=0.0, decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',max_iter=-1,probability=False, random_state=None, shrinking=True,tol=0.001, verbose=False)

 

pred=clf.predict(X_test_transformed)
print("Accuracy is %.4f and the f1-score is %.4f " %
(accuracy_score(pred, y_test), f1_score(y_test, pred)))

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\praym\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 714, in runfile
    execfile(filename, namespace)
  File "C:\Users\praym\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 89, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/praym/OneDrive/School/Information Structres/Assignment4.py", line 18, in <module>
    selector.fit(X_train, y_train)
  File "C:\Users\praym\Anaconda3\lib\site-packages\sklearn\feature_selection\univariate_selection.py", line 322, in fit
    X, y = check_X_y(X, y, ['csr', 'csc'])
  File "C:\Users\praym\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 515, in check_X_y
    y = column_or_1d(y, warn=True)
  File "C:\Users\praym\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 551, in column_or_1d
    raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape (10, 90)

2
You didn't identify the line number where the error showed up! The first step in learning to program something is to read the responses on your terminal well and identify the exact problem. This trick is probably enough to debug the issue by yourself. You won't need us for this problem. - Rahul Murmuria
From my editor: Traceback (most recent call last): File "<stdin>", line 1, in <module> File "C:\Users\praym\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 714, in runfile execfile(filename, namespace) File "C:\Users\praym\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 89, in execfile exec(compile(f.read(), filename, 'exec'), namespace). And there is more, so I don't understand the error. - Paul Raymond

2 Answers

3
votes

I will submit this as an answer, because it directly addresses your real problem.

In general computer programming terminology, the error you got is called a stack trace. There is a Wikipedia page on stack traces, but I will try to explain it in simpler terms here.

The error has a heading "Traceback" because that is what it is doing: tracing back the error. You can see in your Python script that nearly every line is some sort of API call, whether it is loadtxt or print or fit. If an error occurred when you made the call to loadtxt, the Traceback shows you exactly what went wrong inside the loadtxt call. That function may be calling other functions within the API, and hence you see a "trace". When you write more complicated Python code with many functions and classes, you might end up seeing functions that made calls to other functions, all written by you. Therefore,

  1. Always read the Traceback bottom up (it tells you in your output that the "most recent call is last"). You need to get the line number along with the name of the python file where the error occurred.

The line number will take you to the point in the code that actually caused the error. Usually, you only need the bottom one or two calls to solve general problems. If you wrote your own custom API, then the entire trace might become more useful. However, the file name and line number alone are not enough to effectively debug a program.

  2. Next you need to understand what exactly the error is. In your case you see a ValueError. This generally means that a function received an argument of the right type but with an inappropriate value. The sentence following the exception type gives you more detail on what exactly caused this ValueError.

For more details about each of the exception types and their meanings, read the Python documentation on built-in exceptions. You can also learn more about how to handle such exceptions from the Python tutorial chapter on errors and exceptions.

  3. Usually, knowing the line number of the bottommost call and the type of exception is enough for you to understand what you did wrong. However, if you are sure that your use of the variable in that line is correct, then you must delve deeper into the stack trace and look at the call second from the bottom. For that call you will again see a file name and a line number.

By repeating these steps, you will be able to effectively debug your own programs. Note that debugging is not only a method for removing errors from your programs. It is the ability to step through your code, identify what each line is doing, and compare that to what it is supposed to be doing. It is the very foundation of what is called computer programming. If you do this right, you may still have questions to ask, but your questions will improve. That is when Stack Overflow comes in (note that the name of this website is itself a play on the concept of a stack trace).
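The steps above can be sketched in a few lines. This is only an illustration, not your assignment code: `check_labels`, `fit_model` and `describe_error` are hypothetical names standing in for a chain of calls, and the standard-library `traceback` module is used to pick out the bottommost frame, which is the one you would read first.

```python
import traceback


def check_labels(y):
    # Hypothetical validation step: labels must be a flat list, not a list of rows.
    if any(isinstance(item, list) for item in y):
        raise ValueError("bad input shape: labels must be 1-D")
    return y


def fit_model(X, y):
    # Hypothetical stand-in for a library call such as selector.fit(X, y).
    return check_labels(y)


def describe_error(X, y):
    """Run fit_model; on ValueError return (type name, message, innermost function)."""
    try:
        fit_model(X, y)
    except ValueError as exc:
        frames = traceback.extract_tb(exc.__traceback__)
        innermost = frames[-1]  # the most recent call is last
        return type(exc).__name__, str(exc), innermost.name
    return None


result = describe_error([[1, 2], [3, 4]], [[0, 1], [1, 0]])
print(result)
```

The last frame names the function that actually raised, and the exception type plus its message say what was wrong with the input; the frames above it show how the program got there.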


EDIT: In your stack trace, your error is here:

File "C:/Users/praym/OneDrive/School/Information Structres/Assignment4.py", line 18, in selector.fit(X_train, y_train).

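It appears that one or both of your input variables X_train and y_train do not have a shape that the fit function accepts. The last frames of the trace point at y: the call column_or_1d(y, warn=True) is the one that fails, and the message reports a shape of (10, 90), whereas the labels are expected to be one-dimensional.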
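A quick way to confirm this kind of problem is to print the shapes before calling fit. The sketch below uses synthetic arrays in place of your files, and `labels_look_valid` is a hypothetical helper written for this illustration, not a scikit-learn function.

```python
import numpy as np

# Synthetic stand-ins: 10 samples with 90 features each.
X_train = np.zeros((10, 90))
y_bad = np.zeros((10, 90))   # 2-D, like the shape in the error message
y_good = np.zeros(10)        # 1-D, one label per sample


def labels_look_valid(X, y):
    """Return True if y is 1-D and has one entry per row of X."""
    X = np.asarray(X)
    y = np.asarray(y)
    return y.ndim == 1 and y.shape[0] == X.shape[0]


print(X_train.shape, y_bad.shape, labels_look_valid(X_train, y_bad))
print(X_train.shape, y_good.shape, labels_look_valid(X_train, y_good))
```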


EDIT: If you load the files the way you have, then you cannot get the right X_train and y_train variables. You seem to have two types of data, one for walking and one for running. They are both data. Each entry in the walking data should have a label 'walking' and each entry in the running data should have label 'running'.

Now, this is fundamental to data mining. You need to know what data and label mean.
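A minimal sketch of what that could look like, assuming each file holds one sample per row and one feature per column. The random arrays below are stand-ins for the contents of walking.txt and running.txt (their (10, 90) shape is taken from the error message), and the 0/1 labels follow the convention stated in the question.

```python
import numpy as np

# Synthetic stand-ins for loadtxt('walking.txt') and loadtxt('running.txt')
# (assumed layout: one sample per row, one feature per column).
walking = np.random.rand(10, 90)
running = np.random.rand(10, 90)

# Stack both recordings into a single feature matrix.
X = np.vstack([walking, running])

# One label per row: 0 for walking, 1 for running.
y = np.concatenate([np.zeros(len(walking), dtype=int),
                    np.ones(len(running), dtype=int)])

print(X.shape, y.shape, np.unique(y))
```

Here X carries the measurements from both files, and y carries exactly one class label for each row of X, which is the pairing that fit expects.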

3
votes

With 90 features, you are most likely using a one-hot encoder to get that many features (dummy variables). Before fitting your models, try:

y_train = np.argmax(y_train, axis=1)

This converts each one-hot row into a single class index, so y_train becomes one-dimensional and can be passed to your fit functions.
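As a small illustration of what that line does, here is a sketch with made-up one-hot labels (4 samples, 3 classes); the array is hypothetical and not taken from your data.

```python
import numpy as np

# Hypothetical one-hot labels: 4 samples, 3 classes.
y_one_hot = np.array([
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
])

# For each row, take the column index of the largest entry.
y_flat = np.argmax(y_one_hot, axis=1)
print(y_flat)        # [0 2 1 2]
print(y_flat.shape)  # (4,)
```

This only makes sense if y_train really is one-hot encoded. If the (10, 90) array actually holds feature measurements, argmax will still return a 1-D array, but the values will not be meaningful class labels.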