Here is the pseudocode for median of medians algorithm (slightly modified to suit your example). The pseudocode in wikipedia fails to portray the inner workings of the selectIdx
function call.
I've added comments to the code for explanation.
// L is the array on which median of medians needs to be found.
// k is the expected median position. E.g. first select call might look like:
// select (array, N/2), where 'array' is an array of numbers of length N
select(L,k)
{
if (L has 5 or fewer elements) {
sort L
return the element in the kth position
}
partition L into subsets S[i] of five elements each
(there will be n/5 subsets total).
for (i = 1 to n/5) do
x[i] = select(S[i],3)
M = select({x[i]}, n/10)
// The code to follow ensures that even if M turns out to be the
// smallest/largest value in the array, we'll get the kth smallest
// element in the array
// Partition array into three groups based on their value as
// compared to median M
partition L into L1<M, L2=M, L3>M
// Compare the expected median position k with length of first array L1
// Run recursive select over the array L1 if k is less than length
// of array L1
if (k <= length(L1))
return select(L1,k)
// Check if k falls in L3 array. Recurse accordingly
else if (k > length(L1)+length(L2))
return select(L3,k-length(L1)-length(L2))
// Simply return M since k falls in L2
else return M
}
Taking your example:
The median of medians function will be called over the entire array of 45 elements like (with k = 45/2 = 22
):
median = select({48 49 50 51 52 43 44 45 46 47 38 39 40 41 42 33 34 35 36 37 28 29 30 31 32 23 24 25 26 27 18 19 20 21 22 13 14 15 16 17 8 9 10 53 54}, 45/2)
The first time M = select({x[i]}, n/10)
is called, array {x[i]}
will contain the following numbers: 50 45 40 35 30 20 15 10
.
In this call, n = 45
, and hence the select function call will be M = select({50 45 40 35 30 20 15 10}, 4)
The second time M = select({x[i]}, n/10)
is called, array {x[i]}
will contain the following numbers: 40 20
.
In this call, n = 9
and hence the call will be M = select({40 20}, 0)
.
This select call will return and assign the value M = 20
.
Now, coming to the point where you had a doubt, we now partition the array L
around M = 20
with k = 4
.
Remember array L
here is: 50 45 40 35 30 20 15 10
.
The array will be partitioned into L1, L2
and L3
according to the rules L1 < M
, L2 = M
and L3 > M
. Hence:
L1: 10 15
L2: 20
L3: 30 35 40 45 50
Since k = 4
, it's greater than length(L1) + length(L2) = 3
. Hence, the search will be continued with the following recursive call now:
return select(L3,k-length(L1)-length(L2))
which translates to:
return select({30 35 40 45 50}, 1)
which will return 30 as a result. (since L has 5 or fewer elements, hence it'll return the element in kth i.e. 1st position in the sorted array, which is 30).
Now, M = 30
will be received in the first select
function call over the entire array of 45 elements, and the same partitioning logic which separates the array L
around M = 30
will apply to finally get the median of medians.
Phew! I hope I was verbose and clear enough to explain median of medians algorithm.