I have used octave notation and syntax for writing matrices: 'comma' for separating column items, 'semicolon' for separating row items and 'single quote' for Transpose.
The second line of the Latex expression in the question, is valid with just one training example, x is a '(f+1) x 1' matrix or a column vector. Specifically x =
[x0; x1; x2; x3; .... xf]
x0 is always '1'. Here 'f' is the number of features.
theta =
[theta0; theta1; theta2; theta3; .... thetaf].
'theta' is a column vector or '(f+1) x 1' matrix. theta0 is the intercept term.
In this special case with one training example, the '1 x (f+1)' matrix formed by taking theta' and x could be multiplied to give the correct '1x1' hypothesis matrix or a real number.
h = theta' * x
as in the second line of the Latex expression is valid.
But the expression m = length(y)
indicates that there are multiple training examples. With 'm' training examples, X is a 'm x (f+1)' matrix.
To simplify, let there be two training examples each with 'f' features.
X = [ x1; x2].
(Please note 1 and 2 inside the brackets are not exponential terms but indexes for the training examples).
Here, x1 = [ x01, x11, x21, x31, .... xf1 ]
and
x2 = [ x02, x12, x22, x32, .... xf2].
So X is a '2 x (f+1)' matrix.
Now to answer the question, theta' is a '1 x (f+1)' matrix and X is a '2 x (f+1)' matrix. With this, the valid expression is X * theta
.
The expression in Latex theta' * X
becomes invalid.
The expected hypothesis matrix in my example, 'h', should have two predicted values (two real numbers), one for each of the two training examples. 'h' is a '2 x 1' matrix or column vector.
The hypothesis can be obtained by using the expression, X * theta
which is valid and algebraically correct. Multiplying a '2 x (f+1)' matrix with a '(f+1) x 1' matrix resulting in a '2 x 1' hypothesis matrix.