3
votes

Given an array, I want to find the largest subset of elements such that the smallest and largest elements of the subset are less than or equal to K apart. Specifically, I want the elements, not just the size. If there are multiple occurrences, any can be matched.

For example, in the array [14,15,17,20,23], if K was 3, the largest subset possible would be [14,15,17]. The same would go if 17 was replaced by 16. Also, multiple elements should be matched, such as [14,14,14,15,16,17,17]. The array is not necessarily sorted, but it is probably a good starting point to sort it. The elements are not necessarily integral and the subset not necessarily consecutive in the original array - I just want an occurrence of the largest possible subset.

To illustrate the desired result more clearly, a naïve approach would be to first sort the array, iterate over every element of the sorted array, and then create a new array containing the current element that is extended to contain every element after the current element <= K larger than it. (i.e. in the first above example, if the current element was 20, the array would be extended to [20,23] and then stop because the end of the array was reached. If the current element was 15, the array would be extended to [15,17] and then stop because 20 is more than 3 larger than 15.) This array would then be checked against a current maximum and, if it was larger, the current maximum would be replaced. The current maximum is then the largest subset. (This method is of complexity O(N^2), in the case that the largest subset is the array.)

I am aware of this naïve approach, and this question is asking for an optimised algorithm.

A solution in Python is preferable although I can run with a general algorithm.

5
You should use a customized suffix tree. - kasravnd
Are the values always integers? - samgak
@samgak No, they are not necessarily integers. - ChiCubed
@Kasramvd If you have time, please write an answer to this question implementing this customized suffix tree, and if it turns out to be a more optimised algorithm than any of the others that are presented here, I will accept it as correct. - ChiCubed

5 Answers

1
votes

I assume that we can not modify array by sorting it & we have to find out largest consecutive Subset, So my solution (in python 3.2) is :

arr = [14, 15, 17, 20, 23]
k = 3
f_start_index=0
f_end_index =0 
length = len(arr)
for i in range(length):
    min_value = arr[i]
    max_value = arr[i]
    start_index = i
    end_index = i
    for j in range((i+1),length):
        if (min_value != arr[j] and max_value != arr[j]) :
            if (min_value > arr[j]) :
                min_value = arr[j]
            elif (max_value < arr[j]) : 
                max_value = arr[j]
            if(max_value-min_value) > k :
                break
        end_index = j
    if (end_index-start_index) > (f_end_index-f_start_index):
        f_start_index = start_index
        f_end_index = end_index
    if(f_end_index-f_start_index>=(length-j+1)):  # for optimization
        break
for i in range(f_start_index,f_end_index+1):
    print(arr[i],end=" ")

It is not most efficient solution , but it will get your work done.

Tested against :

1.input:[14, 15, 17, 20, 23]

1.output:14 15 17

2.input:[14,14,14,15,16,17,17]

2.output:14 14 14 15 16 17 17

3.input:[23 ,20, 17 , 16 ,14]

3.output:17 16 14

4.input:[-2,-1,0,1,2,4]

4.output:-2 -1 0 1

For input number 4 there are two possible answers

  • -2 -1 0 1
  • -1 0 1 2 But my solution take first as if subset's length is same then it will print the subset which occurs first in array when we traverse array elements from position 0 to array length-1

But if we have to find largest subset in array which may or may not be consecutive then solution would be different.

1
votes

This seems very similar to your "naïve" approach, but it's O(n) excluding the sort so I don't think you can improve on your approach much. The optimization is to use indices and only create a second array once the answer is known:

def largest_less_than_k_apart(a, k):
    a.sort()
    upper_index = lower_index = max_length = max_upper_index = max_lower_index = 0
    while upper_index < len(a):
        while a[lower_index] < a[upper_index] - k:
            lower_index += 1
        if upper_index - lower_index + 1 > max_length:
            max_length = upper_index - lower_index + 1
            max_upper_index, max_lower_index = upper_index, lower_index
        upper_index += 1
    return a[max_lower_index:max_upper_index + 1]

a = [14,15,17,20,23]
print largest_less_than_k_apart(a, 3);

Output:

[14, 15, 17]

It does one pass through the sorted array, with the current index stored in upper_index and another index lower_index that lags behind as far as possible while still pointing to a value greater than or equal to K less than the value of the current element. The function keeps track of when the two indices are as far apart as possible and uses those indices to split the list and return the subset.

Duplicate elements are handled, because lower_index lags behind as far as possible (pointing to the earliest duplicate), whereas the difference of indices will be maximal when upper_index is pointing to the last duplicate of a given subset.

It's not valid to pass in a negative value for k.

-1
votes

Brute force approach:

arr = [14,14,14,15,16,17,17]
max_difference = 3
solution = []

for i, start in enumerate(arr):
    tmp = []
    largest = start
    smallest = start
    for j, end in enumerate(arr[i:]):
        if abs(end - largest) <= max_difference and abs(end - smallest) <= max_difference:
            tmp.append(end)
            if end > largest:
                largest = end
            if end < smallest:
                smallest = end
        else:
            break
    if len(tmp) > len(solution):
        solution = tmp

Try to optimize it! (Tip: the inner loop doesn't need to run as many times as it does here)

-1
votes

An inefficient algorithm (O(n^2)) for this would be very simple:

l = [14,15,17,20,23]
s = max((list(filter(lambda x: start<=x<=start+3, l)) for start in l), key=len)
print(s)
-1
votes

A speedy approach with complexity O(n*log(n)) for the sort and O(n) to search for the longest chain:

list_1 = [14, 15, 17, 20, 23]

k = 3

list_1.sort()
list_len = len(list_1)

min_idx = -1
max_idx = -1
idx1 = 0
idx2 = 0

while idx2 < list_len-1:
    idx2 += 1
    while list_1[idx2] - list_1[idx1] > k:
        idx1 += 1
    if idx2 - idx1 > max_idx - min_idx:
        min_idx, max_idx = idx1, idx2

print(list_1[min_idx:max_idx+1])