I am trying to run the following code. By the way, I am new to both Python and sklearn.
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
# data import and preparation
trainData = pd.read_csv('train.csv')
train = trainData.values
testData = pd.read_csv('test.csv')
test = testData.values
X = np.c_[train[:, 0], train[:, 2], train[:, 6:7], train[:, 9]]
X = np.nan_to_num(X)
y = train[:, 1]
Xtest = np.c_[test[:, 0:1], test[:, 5:6], test[:, 8]]
Xtest = np.nan_to_num(Xtest)
# model
lr = LogisticRegression()
lr.fit(X, y)
where y is an np.ndarray of 0s and 1s.
I receive the following error:
File "C:\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py", line 1174, in fit
    check_classification_targets(y)
File "C:\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py", line 172, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'unknown'
From the sklearn documentation: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.fit
y : array-like, shape (n_samples,) Target values (class labels in classification, real numbers in regression)
What is my error?
Update:
y is array([0.0, 1.0, 1.0, ..., 0.0, 1.0, 0.0], dtype=object) and its shape is (891,)
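One plausible explanation for the object dtype, assuming train.csv mixes numeric and text columns: calling `.values` on such a DataFrame yields a single object-dtype array, and slicing it keeps that dtype. The sketch below uses a small hypothetical frame (the column names `Id`, `Label`, `Name` are made up for illustration and are not the real file's columns):

```python
import pandas as pd

# Hypothetical stand-in for train.csv: numeric and string columns together.
frame = pd.DataFrame({
    "Id": [1, 2, 3],
    "Label": [0, 1, 1],
    "Name": ["a", "b", "c"],
})

values = frame.values          # mixed column types give one object-dtype array
y_obj = values[:, 1]           # slicing the array keeps the object dtype
y_num = frame["Label"].values  # selecting the column first keeps its integer dtype

print(values.dtype, y_obj.dtype, y_num.dtype)
```

If this is what happens here, selecting the label column from the DataFrame before converting to an array would avoid the object dtype.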
scikit-learn is not able to tell what type of problem you want to solve (looking at the y data, it will return binary, multiclass, continuous, etc.). Specifically, what type of data is in your y? Post it here, or as @Quickbeam2k1 said, it would be more helpful if samples of the complete data are posted. – Vivek Kumar
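The check the comment describes can be reproduced with `type_of_target` from `sklearn.utils.multiclass`, which is the module that raises the error in the traceback. This is a minimal sketch on made-up data, assuming the labels really are only 0 and 1 so that casting to int is safe; `X_demo` and the short `y_obj` array are hypothetical stand-ins for the real features and labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.multiclass import type_of_target

# Labels stored with dtype=object, as in the update to the question.
y_obj = np.array([0.0, 1.0, 1.0, 0.0, 1.0, 0.0], dtype=object)
print(type_of_target(y_obj))  # 'unknown'

# Casting to a numeric dtype lets scikit-learn recognise two classes.
y_int = y_obj.astype(int)
print(type_of_target(y_int))  # 'binary'

# Hypothetical single-feature matrix, only to show that fit() now accepts y.
X_demo = np.array([[0.0], [1.0], [1.2], [0.1], [0.9], [0.2]])
lr = LogisticRegression()
lr.fit(X_demo, y_int)
```

Applied to the code in the question, the equivalent step would be something like `y = train[:, 1].astype(int)` before calling `lr.fit(X, y)`.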