I am following a machine learning course on Coursera and I am doing the following exercise using Octave (MatLab should be the same).
The exercise is related to the calculation of the cost function for a gradient descent algoritm.
In the course slide I have that this is the cost function that I have to implement using Octave:
This is the formula from the course slide:
So J is a function of some THETA variables represented by the THETA matrix (in the previous second equation).
This is the correct MatLab\Octave implementation for the J(THETA) computation:
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
% J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
% parameter for linear regression to fit the data points in X and y
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
% You should set J to the cost.
J = (1/(2*m))*sum(((X*theta) - y).^2)
% =========================================================================
end
where:
X is a 2 column matrix of m rows having all the elements of the first column set to the value 1:
X =
1.0000 6.1101
1.0000 5.5277
1.0000 8.5186
...... ......
...... ......
...... ......
y is a vector of m elements (as X):
y =
17.59200
9.13020
13.66200
........
........
........
Finnally theta is a 2 columns vector having 0 asvalues like this:
theta = zeros(2, 1); % initialize fitting parameters
theta
theta =
0
0
Ok, coming back to my working solution:
J = (1/(2*m))*sum(((X*theta) - y).^2)
specifically to this matrix multiplication (multiplication between the matrix X and the vector theta): I know that it is a valid matrix multiplication because the number of column of X (2 columns) is equal to the number of rows of theta (2 rows) so it is a perfectly valid matrix multiplication.
My doubt that is driving me crazy (probably it is a trivial doubt) is related to the previous course slide context:
As you can see in the second equation used to calculated the current h_theta(x) value it is using the transposed theta vector and not the theta vector as done in the code.
Why ?!?!
I suspect that it depends only on how was created the theta vector. It was build in this way:
theta = zeros(2, 1); % initialize fitting parameters
that is generating a 2 line 1 column vector instead of a classic one line 2 column vector. So maybe I have not to transpose it. But I am absolutely not sure about this assertion.
Is my intuition correct or what am I missing?