3
votes

Suppose you have a1..an numbers and some queries [l, k] (1 < l, k < n). The problem is to find in [l, k] interval minimum distance between two equal numbers.

Examples: (interval l,k shown as |...|)

1 2 2 |1 0 1| 2 3 0 1 2 3

Answer 2 (101)

1 |2 2| 1 0 1 2 3 0 1 2 3

Answer 1 (22)

1 2 2 1 0 |1 2 3 0 3 2 3|

Answer 2 (303) or (323)

I have thought about segment tree, but it is hard to join results from each tree node, when query is shared between several nodes. I have tried some ways to join them, but it looks ugly. Can somebody give me a hint?


Clarification

Thanks for your answers. The problem is that there are a lot of queries, so o(n) is not good. I do not accidentally mentioned a segment tree. It performs [l, r] query for finding [l, r]SUM or [l, r]MIN in array with log(n) complexity. Can we do some preprocessing to fit in o(logn) here?

2
Is an offline solution feasible?kraskevich
@AlexHoppus I modified my answer to include more preprocessing. All the queries will run in O(1) time now.Millie Smith

2 Answers

1
votes

Call an interval minimal if its first number equals its last but each of the numbers in between appears exactly once in the interval. 11 and 101 are minimal, but 12021 and 10101 are not.

In linear time (assuming constant-time hashing), enumerate all of the minimal intervals. This can be done by keeping two indices, l and k, and a hash map that maps each symbol in between l and k to its index. Initially, l = 1 and k = 0. Repeatedly do the following. Increment k (if it's too large, we stop). If the symbol at the new value of k is in the map, then advance l to the map value, deleting stuff from the map as we go. Yield the interval [l, k] and increment l once more. In all cases, write k as the map value of the symbol.

Because of minimality, the minimal intervals are ordered the same way by their left and right endpoints. To answer a query, we look up the first interval that it could contain and the last and then issue a range-minimum query of the lengths of the range of intervals. The result is, in theory, an online algorithm that does linear-time preprocessing and answers queries in constant time, though for convenience you may not implement it that way.

1
votes

We can do it in O(nlog(n)) with a sort. First, mark all the elements in [l,k] with their original indices. Then, sort the elements in [l,k], first based on value, and second based on original index, both ascending.

Then you can loop over the sorted list, keeping a currentValue variable, and checking adjacent values that are the same for distance and setting minDistance if necessary. currentValue is updated when you reach a new value in the sorted list.

Suppose we have this [l,k] range from your second example:

1 2 3 0 3 2 3

We can mark them as

1(1) 2(2) 3(3) 0(4) 3(5) 2(6) 3(7)

and sort them as

0(4) 1(1) 2(2) 2(6) 3(3) 3(5) 3(7)

Looping over this, there are no ranges for 0 and 1. The minimum distance for 2s is 4, and the minimum distance for 3s is 2 ([3,5] or [3,7], depending on if you reset minDistance when the new minimum distance is equal to the current minimum distance).

Thus we get

[3,5] in [l,k] or [5,7] in [l,k]

EDIT

Since you mention some queries, you can preprocess the list in O(nlog(n)) time, and then only use O(n) time for each individual query. You would just ignore indices that are not in [l,k] while looping over the sorted list.

EDIT 2

This is addressing the clarification in the question, which now states that there will always be lots of queries to run. We can preprocess in O(n^2) time using dynamic programming and then run each query in O(1) time.

First, perform the preprocessing on the entire list that I described above. Then form links in O(n) time from the original list into the sorted list.

We can imagine that:

[l,k] = min([l+1,k], [l,k-1], /*some other sequence starting at l or ending at k*/)

We have one base case

[l,k] = infinity where l = k

If [l,k] is not min([l+1,k], [l,k-1]), then it either starts at l or ends at k. We can take each of these, look into the sorted list and look at the adjacent element in the correct direction and check the distances (making sure we're in bounds). We only have to check 2 elements, so it is a constant factor.

Using this algorithm, we can run the following

for l = n downto 1
    for k = l to n
        M[l,k] = min(M[l+1,k], M[l,k-1], sequence starting at l, sequence ending at k)

You can also store the solutions in the matrix (which is actually a pyramid). Then, when you are given a query [l,k], you just look it up in the matrix.