I am working through a tutorial by Dhaval Patel that builds a linear regression model to predict a vehicle's sale price from its age and mileage. The model works well, but I am not sure how to pass in a single input to get a predicted sale price. I am new to all of this but really want to learn!
Below is the basic Python script that outputs the predicted sale prices:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
df = pd.read_csv("carprices.csv")
print(df)
X = df[['Mileage','Age(yrs)']]
y = df['Sell Price($)']
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)
print(X_train)
print(X_test)
clf = LinearRegression()
clf.fit(X_train, y_train)
clf.predict(X_test)
print(y_test)
print(clf.score(X_test, y_test))
# carprices.csv data structure
   Mileage  Age(yrs)  Sell Price($)
0    69000         6          18000
1    35000         3          34000
2    57000         5          26100
etc..
======= Output ======
# array with predicted sale prices
array([25246.47327651, 16427.86889147, 27671.99607064, 25939.47978912])
#Output of test data
5 26750
14 19400
19 28200
2 26100
Name: Sell Price($), dtype: int64
0.7512386644573188
So basically it splits the CSV data into two parts, training and test, with the test data making up 20% of the data set. What I would like to do is pass in the age and mileage of a particular vehicle and have the model predict its sale price from that single input. Where would I add this input?
Link to github example - https://github.com/codebasics/py/blob/master/ML/6_train_test_split/train_test_split.ipynb
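The single-input prediction asked about above can be sketched roughly as follows. This is a minimal sketch, not the tutorial's own code. It assumes the model is already fitted and that the new input is passed as a one-row DataFrame with the same column names and order used for training (`Mileage`, then `Age(yrs)`). The name `new_car` and its values (45,000 miles, 4 years) are made up for illustration, and the training data here is only the three sample rows shown in the question, so that the snippet runs without `carprices.csv`.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Only the three sample rows shown in the question; the real script
# would load the full data set with pd.read_csv("carprices.csv").
df = pd.DataFrame({
    "Mileage": [69000, 35000, 57000],
    "Age(yrs)": [6, 3, 5],
    "Sell Price($)": [18000, 34000, 26100],
})

X = df[["Mileage", "Age(yrs)"]]
y = df["Sell Price($)"]

clf = LinearRegression()
clf.fit(X, y)

# Hypothetical vehicle (values made up for illustration):
# 45,000 miles and 4 years old. predict() expects a 2D input,
# so a single vehicle is passed as a one-row DataFrame.
new_car = pd.DataFrame({"Mileage": [45000], "Age(yrs)": [4]})

predicted_price = clf.predict(new_car)  # array with one element
print(predicted_price[0])
```

In the original script the same two lines (`new_car = ...` and `clf.predict(new_car)`) would go anywhere after `clf.fit(X_train, y_train)`. A plain nested list such as `clf.predict([[45000, 4]])` should also work, though recent scikit-learn versions may warn about missing feature names when the model was fitted on a DataFrame.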