I am new to Machine Learning and to Python. I am trying to build a Random Forest model in order to predict cement strength.
There are two .csv files: train_data.csv and test_data.csv.
This is what I have done. I am trying to compute the r2_score of the model's predictions here.
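import pandas as pd
from sklearn import metrics
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("train_data(1).csv")
X = df.drop('strength', axis=1)
y = df['strength']
model = RandomForestRegressor()
model.fit(X, y)
X_test = pd.read_csv("test_data.csv")
y_pred = model.predict(X_test)
acc_R = metrics.r2_score(y, y_pred)  # y comes from the training file, y_pred from the test file
acc_R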
The problem here is that y and y_pred have different shapes, so I get this error:
ValueError: Found input variables with inconsistent numbers of samples: [721, 309]
How do I correct this? Can someone explain to me what I am doing wrong?
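For reference, here is a minimal sketch of one common way to get an R² score with matching lengths: hold out part of the training data and score the predictions against the held-out targets. It assumes test_data.csv has no strength column to compare against, which the question does not confirm. The synthetic DataFrame, its feature column names, and the 70/30 split are all hypothetical stand-ins for the real train_data.csv.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for train_data.csv: 721 rows, made-up feature columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((721, 3)), columns=["cement", "water", "age"])
df["strength"] = (
    30 * df["cement"] - 10 * df["water"] + 5 * df["age"]
    + rng.normal(0, 0.5, len(df))
)

X = df.drop("strength", axis=1)
y = df["strength"]

# Keep 30% of the labelled rows aside so there are true targets to score against.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = RandomForestRegressor(random_state=0)
model.fit(X_train, y_train)

# y_val and y_val_pred describe the same rows, so their lengths match.
y_val_pred = model.predict(X_val)
val_r2 = r2_score(y_val, y_val_pred)
print(len(y_val), len(y_val_pred), val_r2)
```

r2_score needs the true and predicted values for the same samples, which is why comparing the 721 training targets with the 309 test predictions fails. If test_data.csv does contain a strength column, the alternative would be to separate it from the features and pass it as the first argument instead of y.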