
I have created a Keras regressor model to predict nitrate concentrations from several land-cover attributes. However, I am not sure how to interpret the following outcomes:

  • loss: 0.0517
  • mean_squared_error: 0.0517
  • mean_absolute_error: 0.1988
  • val_loss: 0.0357
  • val_mean_squared_error: 0.0357
  • val_mean_absolute_error: 0.1416

Are these values sufficient after 1000 epochs, and what do they say about the model (is it significant or not)? I am new to coding.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor
seed = 7
np.random.seed(seed)

# Load the data: the first 12 columns are land-cover attributes,
# the last column is the nitrate concentration target.
dataset = np.loadtxt("N_LANDCOV.csv", delimiter=",")
x = dataset[:, :-1]
y = dataset[:, -1].reshape(-1, 1)

# Fit a separate scaler for the features and the target: refitting a
# single scaler on y would make the x transform use y's min/max.
x_scaler = MinMaxScaler().fit(x)
y_scaler = MinMaxScaler().fit(y)
xscale = x_scaler.transform(x)
yscale = y_scaler.transform(y)

X_train, X_test, y_train, y_test = train_test_split(xscale, yscale)
model = Sequential()
model.add(Dense(28, input_dim=12, kernel_initializer='normal', activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='linear'))
model.summary()
model.compile(loss='mse', optimizer='adam', metrics=['mse','mae'])
history = model.fit(X_train, y_train, epochs=1000, batch_size=10, verbose=1, validation_split=0.33)

Epoch 1000/1000
30/30 [==============================] - 0s 702us/step - loss: 0.0517 - mean_squared_error: 0.0517 - mean_absolute_error: 0.1988 - val_loss: 0.0357 - val_mean_squared_error: 0.0357 - val_mean_absolute_error: 0.1416
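
Note that the X_test and y_test arrays created by train_test_split above are never used. A minimal sketch of checking generalization on that held-out split (values come back in the compile order: loss, MSE, MAE):

# Evaluate on the held-out test split, which the code above never touches.
test_loss, test_mse, test_mae = model.evaluate(X_test, y_test, verbose=0)
print(test_mse, test_mae)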

import os
# Workaround for the duplicate-OpenMP-runtime crash that some
# macOS/conda setups hit when two libiomp copies are loaded.
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'

print(history.history.keys())
plt.figure(figsize=(8, 6))  # set the figure size before plotting
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

[Figure: training and validation loss vs. epoch]

The errors are simply the mean squared/absolute difference between the real and model-predicted outputs. You have used a validation split, so the 'val' errors are the same values calculated on the validation subset. Considering you have normalized the data to between 0 and 1, the average errors do seem proportionally high. – PJRobot
Am I right in observing that only 30 data samples are used to train the network? (machinelearningmastery.com/…) – PJRobot
Thanks for your rapid reply. The data set has 12 attributes, each with 62 samples, excluding the one predictive attribute. I have used a validation split of 0.33, so around 30 samples end up in the training set if I am not mistaken. – Laurens Julius de Boer
What would you suggest to improve this model? – Laurens Julius de Boer
I would start by getting a lot more data, if that is possible. With 12 inputs (hence 12 input dimensions) there is a very large search space; your data samples are likely incredibly sparse. – PJRobot
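
To follow up on the point about proportionally high errors: a minimal sketch of mapping the scaled MAE back to original nitrate units, assuming the separate y_scaler from the corrected code above. MinMaxScaler is a linear map, so an absolute error scales by the target's observed range (an MSE would scale by the range squared):

# Convert the scaled validation MAE back to the target's original units.
target_range = (y_scaler.data_max_ - y_scaler.data_min_)[0]
mae_scaled = 0.1416  # val_mean_absolute_error from the last epoch
print(mae_scaled * target_range)  # average miss in nitrate units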

1 Answer


I would suggest you get more data for your model. As @PJRobot rightly said, the search space is too large for so few data samples.

You can also do some pre-processing to reduce the dimensionality of the analysis (if the available features allow it). But try getting more data before you apply a neural-network regressor. Also, try changing the metric to 'accuracy' if you cannot make sense of your current metrics (they are simply loss values).
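
For instance, PCA is one common way to compress the 12 land-cover attributes before fitting. A minimal sketch, where the 0.95 variance threshold is an arbitrary illustrative choice:

from sklearn.decomposition import PCA

# Keep only the components that explain 95% of the variance.
pca = PCA(n_components=0.95)
x_reduced = pca.fit_transform(xscale)
print(x_reduced.shape)  # set input_dim to x_reduced.shape[1] in the first Dense layer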

I used accuracy in my early days of deep learning to understand my models' capabilities (that was for classification). For regression, though, stick to error metrics, as they give more meaningful inferences.

Also try some hyper-parameter tuning for better results.
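
For example, the KerasRegressor wrapper already imported in the question pairs with scikit-learn's GridSearchCV. A minimal sketch, where the grid values are arbitrary placeholders:

from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor

def build_model(hidden_units=28):
    # A smaller variant of the question's architecture with a tunable width.
    model = Sequential()
    model.add(Dense(hidden_units, input_dim=12, activation='relu'))
    model.add(Dense(1, activation='linear'))
    model.compile(loss='mse', optimizer='adam', metrics=['mae'])
    return model

regressor = KerasRegressor(build_fn=build_model, verbose=0)
param_grid = {
    'hidden_units': [14, 28, 56],  # placeholder values
    'batch_size': [5, 10],
    'epochs': [200, 500],
}
# 3-fold CV is about as much as ~60 samples will support.
grid = GridSearchCV(regressor, param_grid, cv=3,
                    scoring='neg_mean_absolute_error')
grid_result = grid.fit(xscale, yscale.ravel())
print(grid_result.best_params_, grid_result.best_score_)

With so few samples, though, keep the grid small: every extra combination multiplies the number of model fits.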

Happy Learning.