1
votes

I am trying to fit a multiple linear regression with sklearn.

features_2 = ['chronic_disease_binary', 'outcome']

X = df.loc[:, features_2].values
Y = df.loc[:, ['age']].values
# X = pd.get_dummies(X,drop_first=True)
#
X_train_lm, X_test_lm, y_train_lm, y_test_lm = create_dataset_test(X, Y)
X_train_lm = X_train_lm.reshape((2596, -1))
lm = linear_model.LinearRegression()
model = lm.fit(X_train_lm, y_train_lm)
y_pred_lm = lm.predict(X_test_lm)

I get this error when I try to make predictions on X_test:

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 1)

  • My X_train has this form:
[[-0.77046461  1.29791815]
 [-0.77046461 -0.77046461]
 [-0.77046461  1.29791815]
 ...
 [-0.77046461 -0.77046461]
 [-0.77046461  1.29791815]
 [-0.77046461 -0.77046461]]
  • And my y_train looks like this:
[[59.]
 [54.]
 [40.]
 ...
 [24.]
 [33.]
 [41.]]

  • The data I am predicting on (X_test) has this form:
[[-0.76666002]
 [ 1.30435914]
 [-0.76666002]
 ...
 [-0.76666002]
 [-0.76666002]
 [-0.76666002]]

2
What does X_test_lm.shape give you? - Ami Tavory
@AmiTavory it gives me (1300, 1) - Jo98
See my answer! As I said, you have a mismatch in the dimensions. - seralouk

2 Answers

1
votes

Dimension mismatch.

The dimensions are incompatible: X_test_lm has N samples (rows) but only 1 feature (column), whereas X_train has 2 features.


Details:

Your X_train is:

[[-0.77046461  1.29791815]
 [-0.77046461 -0.77046461]
 [-0.77046461  1.29791815]
 ...
 [-0.77046461 -0.77046461]
 [-0.77046461  1.29791815]
 [-0.77046461 -0.77046461]]

so the model is trained on N (number of rows) samples with 2 (number of columns) features/variables.

Then, when you ask the model to predict on:

[[-0.76666002]
 [ 1.30435914]
 [-0.76666002]
 ...
 [-0.76666002]
 [-0.76666002]
 [-0.76666002]]

you have incompatible dimensions, since X_test_lm again has N (number of rows) samples but this time only 1 (number of columns) feature/variable.

However, the model's predict function expects an input array with shape [N,2], so you get:

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 1)

As you said, X_test_lm.shape is (1300, 1), so the model is asked to predict values for 1300 samples that have only one feature. That is what triggers the error: the model was trained on X_train with shape [N,2], not [N,1].
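
To make this concrete, here is a minimal sketch that reproduces the mismatch on synthetic data and checks the column count before predicting. The array sizes and the safe_predict helper are assumptions made up for illustration; they are not from your code.

```python
import numpy as np
from sklearn import linear_model

rng = np.random.default_rng(0)

# Synthetic stand-ins for the real data (assumed shapes, not the asker's values)
X_train = rng.normal(size=(20, 2))     # 20 samples, 2 features
y_train = rng.normal(size=(20, 1))
X_test_bad = rng.normal(size=(10, 1))  # only 1 feature, like the failing X_test_lm
X_test_ok = rng.normal(size=(10, 2))   # same 2 features as training

lm = linear_model.LinearRegression().fit(X_train, y_train)

# coef_ has one entry per training feature along its last axis
n_features = lm.coef_.shape[-1]

def safe_predict(model, X):
    """Hypothetical helper: predict only if X has the fitted column count."""
    expected = model.coef_.shape[-1]
    if X.shape[1] != expected:
        raise ValueError(f"expected {expected} features, got {X.shape[1]}")
    return model.predict(X)

pred_ok = safe_predict(lm, X_test_ok)

# Predicting on the 1-column array raises ValueError
# (the exact message depends on the sklearn version)
try:
    lm.predict(X_test_bad)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

Printing X_train_lm.shape and X_test_lm.shape right after the split is usually the quickest way to see where the second column goes missing.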


1
votes

Since X_test_lm.shape is (1300, 1), the test data has only 1 column, not 2 like the training data. The coefficient (beta) vector fitted on the training data expects a matrix with 2 columns, which is what raises the error.
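
The same failure can be seen with plain NumPy, without sklearn. This is only a sketch: the coefficient values are made up, and the arrays just mimic the column counts in the question.

```python
import numpy as np

X_train_like = np.ones((5, 2))   # 2 columns, as in training
X_test_like = np.ones((5, 1))    # 1 column, as in the failing prediction
beta = np.array([[0.5], [2.0]])  # made-up coefficients, shape (2, 1)

ok = X_train_like @ beta         # (5, 2) @ (2, 1) -> (5, 1)

try:
    X_test_like @ beta           # (5, 1) @ (2, 1): inner sizes 1 and 2 differ
    failed = False
except ValueError as exc:
    failed = True
    message = str(exc)           # a matmul core-dimension mismatch message
```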

You should check the definition of create_dataset_test to see how you got to this state.
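
Since create_dataset_test is not shown, the following is only a sketch of one way to split the data so that both halves keep the same columns. It assumes the function is meant to do a plain row-wise train/test split; split_dataset, the test_size value, and the synthetic X and Y are hypothetical stand-ins.

```python
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for df.loc[:, features_2].values and df.loc[:, ['age']].values
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(30, 2)).astype(float)   # two binary features
Y = rng.integers(20, 80, size=(30, 1)).astype(float)

def split_dataset(X, Y, test_size=0.33, seed=0):
    """Hypothetical replacement for create_dataset_test: splits rows only."""
    return train_test_split(X, Y, test_size=test_size, random_state=seed)

X_train_lm, X_test_lm, y_train_lm, y_test_lm = split_dataset(X, Y)

# No reshape is needed: both halves inherit the two columns of X
assert X_train_lm.shape[1] == X_test_lm.shape[1] == 2

lm = linear_model.LinearRegression().fit(X_train_lm, y_train_lm)
y_pred_lm = lm.predict(X_test_lm)
```

Note that a hard-coded reshape such as X_train_lm.reshape((2596, -1)) will happily change the number of columns to fit the requested row count, so it can hide a shape problem on the training side rather than fix it.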