0
votes

I need to evaluate an ML model on another dataset, but I don't fully understand what that means. I have an idea, but I am not sure. Let's say we have:

  • X_train, X_test, y_train, y_test split from X, Y for the first model
  • X_train_2, X_test_2, y_train_2, y_test_2 split from X2, Y2 for the second model

After training both models with model.fit, how do I test them on the other dataset? Is it:

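from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

svm = SVC()

# train on the first dataset
svm.fit(X, Y)

# predict on the test split of the second dataset
y_pred = svm.predict(X_test_2)

# evaluate accuracy against the matching labels of the second dataset
print(accuracy_score(y_test_2, y_pred))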

It seems simple, but I am really confused; I would appreciate an explanation.

2 Answers

2
votes

Testing on another dataset, say X2, y2, does not mean you need to split this second dataset into training and test subsets, as you did for your original X and y. Once you have fitted your model, say svm, on X as you show, you simply predict on X2 and compare the predictions with the labels in y2:

# predict on the 2nd dataset X2
y_pred = svm.predict(X2)

# evaluate accuracy
print(accuracy_score(y2, y_pred))
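
For completeness, here is a minimal end-to-end sketch of the same idea that runs on its own. The data is synthetic and only a stand-in (an assumption, since the real X, Y, X2, y2 are not shown in the question): one generated set is sliced into a "first" and a "second" dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

# Synthetic stand-in data (assumption for illustration only).
X_all, y_all = make_classification(n_samples=400, n_features=5, random_state=0)

# Pretend the first 300 rows are dataset 1 and the last 100 rows are dataset 2.
X, Y = X_all[:300], y_all[:300]
X2, y2 = X_all[300:], y_all[300:]

# Fit once, on the first dataset only.
svm = SVC()
svm.fit(X, Y)

# Predict on the whole second dataset; no train/test split is needed here.
y_pred = svm.predict(X2)

# Compare the predictions with the true labels of the second dataset.
acc = accuracy_score(y2, y_pred)
print(f"accuracy on second dataset: {acc:.3f}")
```

With real data you would replace the slicing with however you load your two datasets; the fit, predict and score steps stay the same.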
1
votes

You are on the right path, but a couple of things to keep in mind:

  • Once your model has been trained with model.fit, you can use that same fitted model to make predictions on the second dataset with model.predict.
  • The second dataset must have the same features (same columns, in the same order) and the same target variable as the first dataset. Otherwise the predictions are meaningless.
  • You do not have two models. You have trained one model on one dataset, and you are then using that same model to make predictions for the second dataset.
  • You do not need to divide the second dataset into X_train and X_test, as the model has already been trained. All you need is X2, which holds all the features for all the rows of the second dataset, and Y2, which holds the values you want to predict.
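
The point about matching features can be checked in code before predicting. The sketch below is only one possible way to do it: check_same_features is a hypothetical helper (not part of scikit-learn) that compares the column count of the new data with n_features_in_, an attribute scikit-learn estimators set during fit. It only catches a different number of columns; it cannot tell whether the columns mean the same thing or are in the same order. The random data is a stand-in.

```python
import numpy as np
from sklearn.svm import SVC


def check_same_features(model, X_new):
    """Hypothetical helper: raise if X_new has a different column count than the training data."""
    X_new = np.asarray(X_new)
    expected = model.n_features_in_  # set by scikit-learn estimators during fit
    if X_new.ndim != 2 or X_new.shape[1] != expected:
        raise ValueError(f"expected {expected} features, got shape {X_new.shape}")
    return X_new


# Random stand-in data (assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Y = (X[:, 0] > 0).astype(int)

svm = SVC().fit(X, Y)

X2_ok = rng.normal(size=(10, 3))   # same number of features as X
X2_bad = rng.normal(size=(10, 4))  # one extra column

# Compatible data passes the check and can be predicted on.
y_pred = svm.predict(check_same_features(svm, X2_ok))

# Incompatible data is rejected before it reaches predict.
try:
    check_same_features(svm, X2_bad)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```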

Example:

  • Dataset 1: X_train, X_test, y_train, y_test split from X, Y for training the model

  • Dataset 2: X2, Y2

     from sklearn.svm import SVC
     from sklearn.metrics import accuracy_score

     svm = SVC()

     # train on the training split of the first dataset
     svm.fit(X_train, y_train)

     # predict on the second dataset X2
     y_pred = svm.predict(X2)

     # evaluate accuracy of the predictions for the second dataset
     print(accuracy_score(Y2, y_pred))
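
Since dataset 1 is split here, it can be useful to report both scores side by side: accuracy on the held-out X_test from dataset 1, and accuracy on the whole of dataset 2. A rough sketch, again with synthetic stand-in data (an assumption; the 400/100 slicing and the 25% test size are arbitrary choices for the illustration):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (assumption for illustration only).
X_all, y_all = make_classification(n_samples=500, n_features=5, random_state=0)
X, Y = X_all[:400], y_all[:400]      # dataset 1
X2, Y2 = X_all[400:], y_all[400:]    # dataset 2

# Only dataset 1 is split, because only dataset 1 is used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

svm = SVC()
svm.fit(X_train, y_train)

# Score on the held-out part of dataset 1 and on all of dataset 2.
acc_test = accuracy_score(y_test, svm.predict(X_test))
acc_second = accuracy_score(Y2, svm.predict(X2))

print(f"held-out accuracy (dataset 1): {acc_test:.3f}")
print(f"accuracy on dataset 2:         {acc_second:.3f}")
```

If the second number is much lower than the first on real data, that usually suggests the two datasets differ in some way the model did not see during training.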