3
votes

In linear regression, the cost function is:

https://i.stack.imgur.com/TPOVM.png
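Written out, a formula consistent with the code below is the usual squared-error cost. This is a reconstruction from the code, not a copy of the image, and it assumes a single feature so that h has the two terms shown:

```latex
\begin{align}
J(\theta) &= \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 \\
h_\theta(x) &= \theta^T x = \theta_0 + \theta_1 x_1
\end{align}
```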

The code in Octave is:

function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

H = X*theta; 
S = (H - y).^2;
J = 1 / (2*m) * sum(S);

% =========================================================================

end

Could someone tell me why sigma(h0(x(i))) is equal to the vectorized expression X*theta?

Thanks

2 Answers

1
votes

Could someone tell me why sigma(h0(x(i))) is equal to the vectorized expression X*theta?

That is not the case. At no point does this code compute sigma(h(x_i)) on its own. The variable H is not that sum; it is a column vector that stores the values

 `h(x_i)=dot_product(x_i,theta)` 

for all examples.

The formula that you give in LaTeX just says to sum (h(x_i)-y_i)^2 over all examples. What you want to avoid is computing h(x_i) for each example sequentially, because that would be time-consuming. From the definition of h(x), you know that

# I've written the more general case; the case `n==1` corresponds to your LaTeX formula
h(x_i)=[1 x_i1 ... x_in]*[theta_0 theta_1 ... theta_n]' 

The matrix X is of size m*(n+1), where m is the number of examples and each row is [1 x_i1 ... x_in]. So each row of the vector

H=X*theta #H is a vector of size m*1

will correspond to a single h(x_i).

Knowing this, you can see that

S=(H-y).^2 #S is a vector of size m*1

is a vector in which each element is one of the (h(x_i)-y_i)^2. So you just need to add them all up with sum(S) to get the value of the sigma from your LaTeX formula.
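To check this numerically, here is a minimal sketch that computes the cost once with an explicit loop and once with the vectorized expression. The data values and the names J_loop and J_vec are made up for illustration, and X is assumed to already contain the leading column of ones.

```octave
% Made-up data: m = 3 examples, one feature plus the intercept column.
X = [1 1; 1 2; 1 3];   % each row is [1 x_i1]
y = [1; 2; 4];
theta = [0; 1];
m = length(y);

% Sequential version: compute h(x_i) one example at a time.
J_loop = 0;
for i = 1:m
  h_i = X(i, :) * theta;              % dot product of row i with theta
  J_loop = J_loop + (h_i - y(i))^2;   % accumulate the squared error
end
J_loop = J_loop / (2*m);

% Vectorized version: all h(x_i) at once.
H = X * theta;                        % m x 1 vector of predictions
J_vec = sum((H - y).^2) / (2*m);

disp([J_loop, J_vec]);                % the two values should agree
```

Both versions do the same arithmetic; the vectorized one simply lets the matrix product handle the per-example dot products.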

1
votes

I have used Octave notation and syntax for writing matrices: a comma separates column items, a semicolon separates row items, and a single quote denotes the transpose.

The second line of the LaTeX expression in the question is valid with just one training example, where x is a '(f+1) x 1' matrix, i.e. a column vector. Specifically, x = [x0; x1; x2; x3; .... xf]

x0 is always '1'. Here 'f' is the number of features.

theta = [theta0; theta1; theta2; theta3; .... thetaf].

'theta' is a column vector or '(f+1) x 1' matrix. theta0 is the intercept term.

In this special case with one training example, theta' is a '1 x (f+1)' matrix and x is a '(f+1) x 1' matrix, so they can be multiplied to give the correct '1 x 1' hypothesis matrix, i.e. a real number.

So h = theta' * x, as in the second line of the LaTeX expression, is valid.
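As a small sketch of the single-example case, assuming f = 2 and made-up values for x and theta:

```octave
% Hypothetical single training example with f = 2 features.
x = [1; 2; 3];          % (f+1) x 1 column vector, x0 = 1
theta = [0.5; 1; -1];   % (f+1) x 1 column vector
h = theta' * x;         % (1 x 3) * (3 x 1) gives a 1 x 1 result
disp(size(h));
```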

But the expression m = length(y) indicates that there are multiple training examples. With 'm' training examples, X is an 'm x (f+1)' matrix.

To simplify, let there be two training examples each with 'f' features.

X = [ x1; x2].

(Please note that the 1 and 2 inside the brackets are not exponents but indexes of the training examples.)

Here, x1 = [ x01, x11, x21, x31, .... xf1 ] and x2 = [ x02, x12, x22, x32, .... xf2].

So X is a '2 x (f+1)' matrix.

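Now to answer the question: theta' is a '1 x (f+1)' matrix and X is a '2 x (f+1)' matrix. The product theta' * X from the LaTeX expression is therefore invalid, because the inner dimensions, (f+1) and 2, do not match in general (and when they happen to match, the product still does not give one prediction per example). The valid expression is X * theta.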

The expected hypothesis matrix in my example, 'h', should have two predicted values (two real numbers), one for each of the two training examples. 'h' is a '2 x 1' matrix or column vector.

The hypothesis can be obtained with the expression X * theta, which is valid and algebraically correct: multiplying a '2 x (f+1)' matrix by a '(f+1) x 1' matrix results in a '2 x 1' hypothesis matrix.
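The dimension argument can be tried out directly. The following is a sketch with f = 2 and made-up numbers; the variable names bad and failed are only for illustration.

```octave
% Two hypothetical training examples, f = 2 features each.
x1 = [1, 2, 3];         % 1 x (f+1) row, first entry is x0 = 1
x2 = [1, 4, 5];
X = [x1; x2];           % 2 x (f+1)
theta = [0.5; 1; -1];   % (f+1) x 1

h = X * theta;          % (2 x 3) * (3 x 1) gives 2 x 1
disp(size(h));

% theta' * X is (1 x 3) * (2 x 3): the inner dimensions differ,
% so Octave raises a nonconformant-arguments error.
failed = false;
try
  bad = theta' * X;
catch
  failed = true;
end
disp(failed);
```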