0
votes

Here is the problem from the Elements of Programming Interviews:

Let P be a set of n points in the plane. Each point has integer coordinates. Design an efficient algorithm for computing a line that contains the maximum number of points in P.

Here is the quote from the solution:

One idea would be to compute a hash code from the slope and the y-intercept of this line as an ordered pair of doubles. Because of finite precision arithmetic, we may have three points that are collinear map to distinct buckets. If the generated uniform [0, 1] random number lies into [0.3, 0.6) we return the number 6.

A more robust hash function treats the slope and the y-intercept as rationals. A rational is an ordered pair of integers: the numerator and the denominator. We need to bring the rational into a canonical form before applying the hash function. One canonical form is to make the denominator always nonnegative, and relatively prime to the numerator. Lines parallel to the y-axis are a special case. For such lines, we use the x-intercept in place of the y-intercept, and use 1/0 as the slope.

The part that I don't understand is this:

We need to bring the rational into a canonical form before applying the hash function.

Why do we need to bring the rational into a canonical form before applying the hash function?

1
because ... you want equal values to have same hashes?John Dvorak

1 Answers

1
votes

They are being (IMO) overly precise. They aren't treating rational numbers as, well, distinct numbers, but as ordered pairs of integers. In this formulation, (1,2) and (3,6) would be distinct rationals. The canonical form they are talking about would be what you implicitly assume, which is that (1,2) and (3,6) are the same rational number, namely (when written as a decimal) 0.5.

Making the denominator non-negative lets you treat (-1,2) and (1,-2) as the same number, and making the denominator relatively prime to the numerator is simply cancelling common factors to reduce (3,6) to (1,2)