
I am trying to run the following code. By the way, I am new to both Python and sklearn.

import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression


# data import and preparation
trainData = pd.read_csv('train.csv')
train = trainData.values
testData = pd.read_csv('test.csv')
test = testData.values
X = np.c_[train[:, 0], train[:, 2], train[:, 6:7],  train[:, 9]]
X = np.nan_to_num(X)
y = train[:, 1]
Xtest = np.c_[test[:, 0:1], test[:, 5:6],  test[:, 8]]
Xtest = np.nan_to_num(Xtest)


# model
lr = LogisticRegression()
lr.fit(X, y)

where y is an np.ndarray of 0s and 1s.

I receive the following:

File "C:\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py", line 1174, in fit
    check_classification_targets(y)
File "C:\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py", line 172, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'unknown'

From the sklearn documentation: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.fit

y : array-like, shape (n_samples,) Target values (class labels in classification, real numbers in regression)

What is my error?

Update:

y is array([0.0, 1.0, 1.0, ..., 0.0, 1.0, 0.0], dtype=object), and its shape is (891,).
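The update shows dtype=object even though the values look numeric. One likely explanation, assuming train.csv mixes text and numeric columns (the file itself is not shown): DataFrame.values has to pick a single dtype for the whole frame, and with any text column present that dtype is object, so every column sliced out of it is object too. The small frame below is a hypothetical stand-in; its column names and values are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for train.csv: numeric columns plus one text column
# (column names and values are assumptions for illustration)
train_data = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Name": ["A", "B", "C", "D"],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

# .values needs one common dtype for all columns; the text column forces object
train = train_data.values
y_positional = train[:, 1]
print(y_positional.dtype)  # object

# Selecting the column by name keeps that column's own integer dtype
y_by_name = train_data["Survived"].to_numpy()
print(y_by_name.dtype)
```

Selecting columns by name, as suggested in the comments, sidesteps the object dtype for the labels entirely.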

Please provide a glimpse of the data and the imports. Why do you use NumPy? You can also just select the columns of the DataFrame by name. By the way, why does the test file have a different structure than the train file? This seems odd. – Quickbeam2k1
This error arises if scikit-learn is not able to tell what type of problem you want to solve (by looking at the y data, it determines whether the target is binary, multiclass, continuous, etc.). Specifically, what type of data is in your y? Post it here, or, as @Quickbeam2k1 said, it would be more helpful to post samples of the complete data. – Vivek Kumar
I had the same problem, despite using NumPy arrays. I think the y data is the problem because the values are floats (1.0). Use lr.fit(X, y.astype(int)). I tried lr.fit(X, y.astype(float)) but got the same error. I was trying to fit a Gaussian Naive Bayes model. – Theodor Paulus
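The check described in the comments can be inspected directly. check_classification_targets, the function in the traceback, classifies y with sklearn.utils.multiclass.type_of_target and rejects anything that is not a classification type. A quick sketch with a few toy label arrays (the values are illustrative only; exact behavior may vary across scikit-learn versions):

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Toy label arrays (illustrative values only)
examples = {
    "int 0/1": np.array([0, 1, 1, 0]),
    "float 0.0/1.0": np.array([0.0, 1.0, 1.0, 0.0]),
    "three classes": np.array([0, 1, 2, 1]),
    "real numbers": np.array([0.5, 1.2, 3.3]),
    "object 0.0/1.0": np.array([0.0, 1.0, 1.0, 0.0], dtype=object),
}

# check_classification_targets accepts only classification types such as
# "binary" and "multiclass"; "continuous" and "unknown" are rejected
results = {name: type_of_target(y) for name, y in examples.items()}
for name, kind in results.items():
    print(f"{name:>15} -> {kind}")
```

The object array is reported as "unknown", the label type in the error message, even though the same values stored as a plain float array are reported as "binary". That points at the object dtype rather than the float values as the trigger.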

2 Answers


Your y has dtype object, so sklearn cannot recognize its type. Add the line y = y.astype('int') right after the line y = train[:, 1].
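A minimal sketch of this fix on toy data (the X values are made up; y mimics the object-dtype array from the question): fitting on the object array raises the same ValueError, and fitting after astype('int') succeeds. Newer scikit-learn versions word the message slightly differently, but it still starts with "Unknown label type".

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy features (made-up values) and labels stored as an object array,
# like y = train[:, 1] in the question
X = np.array([[1.0, 3.0], [2.0, 1.0], [3.0, 2.0], [4.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 0.0], dtype=object)

lr = LogisticRegression()

# Fitting on object-dtype labels fails in check_classification_targets
error_message = None
try:
    lr.fit(X, y)
except ValueError as err:
    error_message = str(err)
print("Before conversion:", error_message)

# The fix: cast the labels to a proper integer dtype before fitting
y = y.astype('int')
lr.fit(X, y)
print("Learned classes:", lr.classes_)
```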


Adding to Miriam's answer: I got a similar error, but in my case the individual elements of y_pred were of type np.int32 and the individual elements of y were of type int. I solved it by doing:

for i,x in enumerate(y_pred):
    y_pred[i]=x.astype('int')
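A vectorized alternative to the loop, assuming y_pred and y both hold integer labels (the values below are made up): convert each with np.asarray(..., dtype=int) so they share NumPy's default integer dtype. Note that if y_pred is already a NumPy int32 array, assigning converted elements back into it in a loop does not change its dtype; converting the whole array does.

```python
import numpy as np

# Hypothetical labels: y_pred holds np.int32 values, y holds plain Python ints
y_pred = np.array([0, 1, 1, 0], dtype=np.int32)
y = [0, 1, 0, 0]

# Convert both in one call each; afterwards they share one integer dtype
y_pred = np.asarray(y_pred, dtype=int)
y = np.asarray(y, dtype=int)

print(y_pred.dtype, y.dtype)
```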