Random Forest In Python [Error in r2_score]

Question

I am new to Machine Learning and to Python. I am trying to build a Random Forest model in order to predict cement strength. There are two .csv files: train_data.csv and test_data.csv.

This is what I have done so far. I am trying to compute the R² score (r2_score) for the predictions.

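import pandas as pd
from sklearn import metrics
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("train_data(1).csv")
X = df.drop('strength', axis=1)
y = df['strength']
model = RandomForestRegressor()
model.fit(X, y)
X_test = pd.read_csv("test_data.csv")
y_pred = model.predict(X_test)
acc_R = metrics.r2_score(y, y_pred)
acc_R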

The problem is that y and y_pred have different shapes, so I get this error:

ValueError: Found input variables with inconsistent numbers of samples: [721, 309]

How do I correct this? Can someone explain to me what I am doing wrong?

Shamsul Masum · Accepted Answer · 2020-06-03T21:03:35

import pandas as pd
from sklearn import metrics
from sklearn.ensemble import RandomForestRegressor

# Fit on the training data only
df_train = pd.read_csv("train_data(1).csv")
X_train = df_train.drop('strength', axis=1)
y_train = df_train['strength']
model = RandomForestRegressor()
model.fit(X_train, y_train)

# Score on the test data; this only works if test_data.csv contains a 'strength' column
df_test = pd.read_csv("test_data.csv")
X_test = df_test.drop('strength', axis=1)
y_test = df_test['strength']
y_pred = model.predict(X_test)
acc_R = metrics.r2_score(y_test, y_pred)  # y_test and y_pred have the same number of rows
acc_R
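
To spell out why the original call fails: r2_score compares true targets and predictions row by row, so both arguments must come from the same rows. In the question, y holds the 721 training targets while y_pred holds predictions for the 309 test rows. If test_data.csv has no 'strength' column at all, one common option is to hold out part of the labelled training data and score on that instead. A minimal sketch, assuming synthetic data in place of the CSV files (the 'cement' and 'water' columns and the 30% split are illustrative assumptions, not taken from the question):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the training CSV; 'cement' and 'water' are hypothetical columns.
rng = np.random.default_rng(0)
df_train = pd.DataFrame({
    "cement": rng.uniform(100, 500, 721),
    "water": rng.uniform(120, 250, 721),
})
df_train["strength"] = (
    0.1 * df_train["cement"] - 0.05 * df_train["water"] + rng.normal(0, 1, 721)
)

X = df_train.drop("strength", axis=1)
y = df_train["strength"]

# Hold out 30% of the labelled rows so there are true targets to score against.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestRegressor(random_state=0)
model.fit(X_tr, y_tr)

# Predictions and targets now come from the same held-out rows.
y_val_pred = model.predict(X_val)
val_r2 = r2_score(y_val, y_val_pred)
print(val_r2)
```

The model can then be refit on all of the training data before predicting on the unlabelled test file; those predictions cannot be scored locally without the true 'strength' values.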
