Order statistic on intervals

Question

Given an array of numbers a[0], a[1], ..., a[n-1], we get queries of the kind:

output k-th largest number in the range a[i], a[i+1], ..., a[j]

Can these queries be answered in polylogarithmic time (in n) per query? If not, is it possible to average results and still get a good amortized complexity?

EDIT: this can be solved using persistent segment trees http://blog.anudeep2011.com/persistent-segment-trees-explained-with-spoj-problems/

I don't understand the proposed solution in the blog post you linked, and I think it's wrong. I understand how the solution works with one-sided queries, but not two-sided queries. — Buck Shlegeris
@BuckShlegeris two sided is just a difference of one sided queries — iggy
No, it's not. In the array [100, 101, 102, 103], the second-largest element between indexes 2 and 4 is definitely not the second-largest element between 0 and 4 minus the second largest element between 0 and 2. — Buck Shlegeris
First of all, you can binary search the value (and check that its in the list in log(n) time). That part should be clear. Then you just need to test how many values are there less than the number you binary search for (this is where one sides vs two sided and differencing comes in). And finally a list can be represented as a tree, so the solution applies perfectly. — iggy

Evgeny Kluev Evgeny Kluev · Accepted Answer · 2014-10-10T12:55:47

Yes, these queries can be answered in polylog time if O(n log n) space is available.

Preprocess given array by constructing segment tree with depth log(n). So that leaf nodes are identical to source array, next-depth nodes contain sorted 2-element sub-arrays, next level consists of 4-element arrays produced by merging those 2-element arrays, etc. In other words, perform merge sort but keep results of each merge step in separate array. Here is an example:

root:   | 1   2   3   5   5   7   8   9 |
        | 1   2   5   8 | 3   5   7   9 |
        | 1   5 | 2   8 | 7   9 | 3   5 |
source: | 5 | 1 | 2 | 8 | 7 | 9 | 5 | 3 |

To answer a query, split given range (into at most 2*log(n) subranges). For example, range [0, 4] should be split into [0, 3] and [4], which gives two sorted arrays [1 2 5 8] and [7]. Now the problem is simplified to finding k-th element in several sorted arrays. The easiest way to solve it is nested binary search: first use binary search to choose some candidate element from every array starting from largest one; then use binary search in other (smaller) arrays to determine rank of this candidate element. This allows to get k-th element in O(log(n)^4) time. Probably some optimization (like fractional cascading) or some other algorithm could do this faster...

Order statistic on intervals

2 Answers