Smallest range of substring in string

Question

I have a string S (with words indexed from 0) and a query string Q. I wish to find the smallest range [L, R] in S that contains all the words in Q. There are no duplicate words in Q. How do I approach this?

For example,

Input:
S: what about the lazy brown fox that jumped over the other brown one which lazy dog ate the food of the fox
Q: lazy brown dog

Output: [11,15]

My code:

S = raw_input().strip().split(' ')
Q = raw_input().strip().split(' ')

count = [0 for x in range(len(Q))]
smallest_index = [0 for x in range(len(Q))]
largest_index = [0 for x in range(len(Q))]

for i in range(len(S)):
    for j in range(len(Q)):
        if S[i] == Q[j]:
            count[j] += 1
            if count[j] <= 1:
                smallest_index[j] = i
                largest_index[j] = i
            if count[j] > 1:
                largest_index[j] = i

largest_index.sort()
print "[%d," % largest_index[0],
print "%d]" % largest_index[len(Q)-1]

It's possible you got a downvote because you posted a broad question with no code. Your question is much better now, but you need to explain what's wrong with the code you posted. — PM 2Ring
My issue is with the time complexity. I am running two nested loops. What if S and Q are very large? How can I make it more efficient? — pikaraider
Fair enough. Why are you collecting smallest_index values but never using them? But apart from that I don't think that your algorithm will always find the correct minimal solution. The minimal range could be one that uses the largest indices, the smallest indices, or somewhere in the middle. — PM 2Ring

PM 2Ring · Accepted Answer · 2017-11-13T09:15:29

This code isn't particularly efficient, but it does work correctly. Perhaps someone will devise a better way of processing the position information than using itertools.product. In the meantime, you can use this code to test other algorithms against.

from itertools import product

def words_range(src, query):
    # Create a dict to store the word positions in src of each query word
    pos = {s: [] for s in query}
    for i, s in enumerate(src):
        if s in pos:
            pos[s].append(i)
    print(pos)

    # Find all the ranges that hold all the query words
    ranges = ((min(t), max(t)) for t in product(*pos.values()))
    # Find the smallest range
    return min(ranges, key=lambda t:t[1] - t[0])

# Test

src = '''what about the lazy brown fox that jumped over the other
brown one which lazy dog ate the food of the fox'''.split()
for i, s in enumerate(src):
    print(i, s)

query = 'lazy brown dog'.split()
print(words_range(src, query))

query = 'the lazy brown fox'.split()
print(words_range(src, query))

output

0 what
1 about
2 the
3 lazy
4 brown
5 fox
6 that
7 jumped
8 over
9 the
10 other
11 brown
12 one
13 which
14 lazy
15 dog
16 ate
17 the
18 food
19 of
20 the
21 fox
{'lazy': [3, 14], 'brown': [4, 11], 'dog': [15]}
(11, 15)
{'the': [2, 9, 17, 20], 'lazy': [3, 14], 'brown': [4, 11], 'fox': [5, 21]}
(2, 5)
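
One possible faster alternative, which is not part of the code above, is a sliding window over the word list. The sketch below assumes, as in the question, that the query has no duplicate words; the function name smallest_word_range is hypothetical. It extends the right edge of the window one word at a time and, whenever the window holds every query word, shrinks it from the left while recording the shortest span seen. Each index enters and leaves the window at most once, so the work should be roughly linear in len(src) rather than growing with the number of position combinations.

```python
from collections import Counter


def smallest_word_range(src, query):
    """Return (left, right) for the shortest span of src that contains
    every word in query, or None if no such span exists.

    Hypothetical helper; assumes query contains no duplicate words.
    """
    needed = set(query)
    if not needed:
        return None

    counts = Counter()  # occurrences of each query word inside the window
    covered = 0         # number of distinct query words inside the window
    best = None
    left = 0

    for right, word in enumerate(src):
        if word not in needed:
            continue
        counts[word] += 1
        if counts[word] == 1:
            covered += 1

        # While the window holds every query word, record it and then
        # try to shrink it from the left.
        while covered == len(needed):
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            dropped = src[left]
            if dropped in needed:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    covered -= 1
            left += 1

    return best


src = '''what about the lazy brown fox that jumped over the other
brown one which lazy dog ate the food of the fox'''.split()

print(smallest_word_range(src, 'lazy brown dog'.split()))      # (11, 15)
print(smallest_word_range(src, 'the lazy brown fox'.split()))  # (2, 5)
```

Both results agree with the output of words_range above, so that function can serve as the reference when testing this sketch on other inputs. When several spans tie for the shortest length, the two functions may not pick the same one.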
