You are correct: you simply call your model's predict method and pass in the new, unseen data. It also depends on what you mean by new data. Are you referring to data whose outcome you do not know (i.e. you do not know the weight value), or is this data being used to test the performance of your model?
For new data (to predict on):
Your approach is correct. You can see all the predictions by simply printing the y_pred variable.
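As a minimal sketch of this case, the snippet below fits a model and predicts on rows it has never seen. The question's model is not shown here, so LinearRegression is an assumption, and the numbers and the new_calories name are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data: one feature column (calories eaten) and the target (weight)
calories_eaten = np.array([[1800], [2000], [2200], [2500], [2800], [3000]])
weight = np.array([60.0, 64.0, 68.0, 74.0, 80.0, 84.0])

# Assumption: a plain linear regression stands in for your own model
model = LinearRegression()
model.fit(calories_eaten, weight)

# New, unseen rows must have the same column layout as the training features
new_calories = np.array([[1900], [2600]])
y_pred = model.predict(new_calories)

# One predicted weight per new row
print(y_pred)
```

The only requirement on the new data is that it is 2D (rows by features) and has the same columns, in the same order, as the data used for fitting.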
If you know the respective weight values and want to evaluate the model:
Make sure that you have two separate data sets: x_test (containing the features) and y_test (containing the labels). Generate the predictions as you are already doing with the y_pred variable, then measure their quality with a performance metric. The most common one is the root mean squared error (RMSE), and you simply pass y_test and y_pred as arguments. The sklearn.metrics module lists all the regression performance metrics supplied by sklearn.
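A small sketch of that evaluation step, using made-up y_test and y_pred values. It takes the square root of mean_squared_error by hand so that it does not depend on which sklearn version you have installed.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up values: true weights and the model's predictions for the same rows
y_test = np.array([70.0, 82.0, 65.0, 90.0])
y_pred = np.array([72.0, 80.0, 66.0, 87.0])

# Mean squared error, then its square root (RMSE), in the same units as weight
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)

# Mean absolute error is another common regression metric
mae = mean_absolute_error(y_test, y_pred)

print(f"RMSE: {rmse:.3f}, MAE: {mae:.3f}")
```

Lower values are better for both metrics, and both are easiest to interpret relative to the typical size of the weight values.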
If you do not know the weight values of the 10 new data points:
Use train_test_split to split your initial data set into two parts: training and testing. You would end up with four data sets: x_train, x_test, y_train and y_test.
from sklearn.model_selection import train_test_split
# random_state can be any number (it makes the split reproducible); test_size=0.25 holds out 25% for testing
# note the return order: x_train, x_test, y_train, y_test
x_train, x_test, y_train, y_test = train_test_split(calories_eaten, weight, test_size=0.25, random_state=42)
Train the model by fitting it on x_train and y_train. Then evaluate its performance on unseen data by predicting on x_test and comparing these predictions with the actual values in y_test. This gives you an idea of how well the model performs. You can then predict the weight values for the 10 new data points in the same way.
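Putting these steps together, the sketch below runs the whole workflow on synthetic data. The data, the LinearRegression model and the new_calories name are all assumptions for illustration; swap in your own data and model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: weight grows roughly linearly with calories
rng = np.random.default_rng(0)
calories_eaten = rng.uniform(1500, 3500, size=(100, 1))
weight = 30 + 0.02 * calories_eaten[:, 0] + rng.normal(0, 2, size=100)

# train_test_split returns the pieces in this order: x_train, x_test, y_train, y_test
x_train, x_test, y_train, y_test = train_test_split(
    calories_eaten, weight, test_size=0.25, random_state=42
)

# Fit on the training part only (LinearRegression is an assumed model choice)
model = LinearRegression()
model.fit(x_train, y_train)

# Evaluate on the held-out test part
y_pred = model.predict(x_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"test RMSE: {rmse:.2f}")

# Finally, predict weights for 10 new points whose true weight is unknown
new_calories = rng.uniform(1500, 3500, size=(10, 1))
new_weights = model.predict(new_calories)
print(new_weights)
```

The test RMSE gives a rough expectation of how far off the predictions for the 10 new points may be, since their true weights are not available for checking.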
It is also worth reading further on the topic as a beginner. This is a simple tutorial to follow.