
I am working on price prediction with the dataset provided in this link (the imports-85.data file).

Using horsepower, curb-weight, engine-size and highway-mpg as features, I tried to normalize them (because the cost was very high) and run the gradient descent algorithm with the following implementation.
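
For completeness, df and attrs are set up roughly like this (the column positions follow the imports-85.names description and '?' marks missing values; verify the indices against your copy of the file):

import numpy as np
import pandas as pd

cols = {13: 'curb-weight', 16: 'engine-size', 21: 'horsepower', 24: 'highway-mpg', 25: 'price'}
df = pd.read_csv('imports-85.data', header=None, usecols=list(cols), na_values='?')
df = df.rename(columns=cols).dropna()  # drop rows with missing values

attrs = ['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']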

Initialization

data = df[attrs].copy()  # copy, so the assignments below do not mutate df through a view

# Normalization (unit conversions and rescaling) -- applied before building X,
# otherwise the design matrix would be built from the unconverted values
data['curb-weight'] = (data['curb-weight'] * 0.453592) / 1000  # lb to kg, scaled by 1/1000 (e-1000)
data['highway-mpg'] = data['highway-mpg'] * 0.425144           # mpg to km per litre (kml)
data['engine-size'] = data['engine-size'] / 100                # scaled by 1/100 (e-100)
data['horsepower'] = data['horsepower'] / 100                  # scaled by 1/100 (e-100)

col_rename = {
    'curb-weight': 'curb-weight-kg(e-1000)',
    'highway-mpg': 'highway-kml',
    'engine-size': 'engine-size(e-100)',
    'horsepower': 'horsepower(e-100)'
}
data.rename(columns=col_rename, inplace=True)

m = len(data)                 # m: number of training examples
n = len(attrs)                # n: number of features
X = np.hstack((np.ones(shape=(m, 1)), np.array(data)))  # prepend a bias column of ones
T = np.zeros(n + 1)           # coefficients of x(0), x(1), ..., x(n)
norm_price = df.price / 1000  # scale the target as well
Y = np.array(norm_price)

Cost calculation

def calculateCost():
    global T, X, Y, m
    residuals = X.dot(T) - Y                   # error of the current hypothesis on each example
    return residuals.dot(residuals) / (2 * m)  # J(T) = (1/2m) * sum of squared errors
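
For reference, this is half the mean squared error, which can be checked directly (illustrative snippet):

residuals = X.dot(T) - Y
assert np.isclose(calculateCost(), np.mean(residuals ** 2) / 2)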

Gradient descent

def gradDescent(threshold, max_iter=10000, alpha=3e-8):
    global T, X, Y, m
    cost = calculateCost()
    cost_hist = [cost]
    for _ in range(max_iter):
        # simultaneous update of all coefficients: T := T - (alpha/m) * X^T (X.T - Y)
        T = T - (alpha / m) * X.transpose().dot(X.dot(T) - Y)
        cost = calculateCost()
        cost_hist.append(cost)
        if cost <= threshold:
            break
    return cost_hist  # returned even if the threshold was never reached
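
I call it like this (the threshold value here is illustrative, chosen for the scaled target):

cost_hist = gradDescent(threshold=120)
print(len(cost_hist), cost_hist[-1])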

I ran gradient descent with this implementation: Batch Gradient Descent

Without normalization, the final cost was 118634960.460199; with normalization, it was 118.634960460199. Note that the two values differ by exactly a factor of 10^6 = 1000^2, which matches dividing the price by 1000 before squaring the errors.

This leads me to a few questions:

  1. Is my normalization technique correct?
  2. After normalization the cost is on a different scale. How should I set the cost threshold after normalization?

1 Answer


I think you may be misunderstanding 'normalization' in the context of machine learning. From my reading of your code, your 'normalization' section is doing unit conversions, not normalization. Prior to gradient descent it is common to apply min-max scaling or standard scaling; see the scikit-learn user guide. These techniques bring every feature into a consistent range, so that changes in a single feature do not completely dominate the loss function. This question and this blog post have a longer discussion.
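
As a minimal sketch of what feature scaling (as opposed to unit conversion) looks like, assuming scikit-learn is available and reusing df and attrs from your question:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()                 # standardize: zero mean, unit variance per feature
scaled = scaler.fit_transform(df[attrs])  # returns a NumPy array, one column per feature

MinMaxScaler works the same way but maps each feature into the [0, 1] range instead. With features on a common scale, gradient descent tolerates a much larger learning rate and converges in far fewer iterations.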