
I have a string S (with words indexed from 0) and a query string Q. I want to find the smallest range [L, R] of word indices in S that contains all the words in Q. There are no duplicate words in Q. How do I approach this?

For example,

Input:
S: what about the lazy brown fox that jumped over the other brown one which lazy dog ate the food of the fox
Q: lazy brown dog

Output: [11,15]
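To pin down exactly what is being asked, here is a deliberately naive brute force. It is a sketch, not code from this thread, and `brute_force_range` is a hypothetical name. For every start index L it extends R until the window S[L..R] holds every word of Q. It runs in O(n²) time for n words, but it is easy to trust, so it can serve as a reference when testing faster versions.

```python
def brute_force_range(src, query):
    """Return the shortest (L, R) such that src[L:R+1] contains every query word.

    Naive O(n^2) reference: for each start L, grow R until all words are seen.
    Returns None if some query word never occurs in src.
    """
    need = set(query)
    best = None
    for left in range(len(src)):
        seen = set()
        for right in range(left, len(src)):
            if src[right] in need:
                seen.add(src[right])
            if seen == need:
                # Shortest window starting at `left`; keep it if it beats the best so far
                if best is None or right - left < best[1] - best[0]:
                    best = (left, right)
                break
    return best


S = ('what about the lazy brown fox that jumped over the other '
     'brown one which lazy dog ate the food of the fox').split()
print(brute_force_range(S, 'lazy brown dog'.split()))  # (11, 15)
```

Ties go to the earliest range, because a later window only replaces the best one when it is strictly shorter.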

My code:

S = input().strip().split(' ')
Q = input().strip().split(' ')

# For each query word Q[j]: how often it occurs in S, and its first and last index
count = [0] * len(Q)
smallest_index = [0] * len(Q)
largest_index = [0] * len(Q)

for i in range(len(S)):
    for j in range(len(Q)):
        if S[i] == Q[j]:
            count[j] += 1
            if count[j] <= 1:
                # First occurrence of Q[j]
                smallest_index[j] = i
                largest_index[j] = i
            if count[j] > 1:
                # Later occurrence: move the last index forward
                largest_index[j] = i

# Report the span covered by the last occurrence of each query word
largest_index.sort()
print("[%d, %d]" % (largest_index[0], largest_index[len(Q) - 1]))
Can you show the code you tried? - sachin dubey
Why the downvote? - pikaraider
It's possible you got a downvote because you posted a broad question with no code. Your question is much better now, but you need to explain what's wrong with the code you posted. - PM 2Ring
My issue is with the time complexity. I am running two nested loops. What if S and Q are very large? How can I make it more efficient? - pikaraider
Fair enough. Why are you collecting smallest_index values but never using them? Apart from that, I don't think your algorithm will always find the correct minimal solution. The minimal range could be one that uses the largest indices, the smallest indices, or somewhere in the middle. - PM 2Ring
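PM 2Ring's objection is easy to confirm on a tiny input. The sketch below uses a hypothetical helper, `last_occurrence_range`, written to mimic what the question's code reports when every query word occurs: the span between the smallest and largest of the *last* positions of the query words. It misses a shorter window near the start.

```python
def last_occurrence_range(src, query):
    """Mimic the question's strategy: span of the last position of each query word."""
    last = {}
    for i, word in enumerate(src):
        if word in query:
            last[word] = i  # later occurrences overwrite earlier ones
    positions = sorted(last.values())
    return positions[0], positions[-1]


src = 'a b x x x x a'.split()
print(last_occurrence_range(src, ['a', 'b']))  # (1, 6), although (0, 1) is shorter
```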

3 Answers


This code isn't particularly efficient, but it does work correctly. Perhaps someone will devise a better way of processing the position information than using product. In the meantime, you can use this code to test other algorithms against.

from itertools import product

def words_range(src, query):
    # Create a dict to store the word positions in src of each query word
    pos = {s: [] for s in query}
    for i, s in enumerate(src):
        if s in pos:
            pos[s].append(i)
    print(pos)

    # Find all the ranges that hold all the query words
    ranges = ((min(t), max(t)) for t in product(*pos.values()))
    # Find the smallest range
    return min(ranges, key=lambda t:t[1] - t[0])

# Test

src = '''what about the lazy brown fox that jumped over the other
brown one which lazy dog ate the food of the fox'''.split()
for i, s in enumerate(src):
    print(i, s)

query = 'lazy brown dog'.split()
print(words_range(src, query))

query = 'the lazy brown fox'.split()
print(words_range(src, query))

output

0 what
1 about
2 the
3 lazy
4 brown
5 fox
6 that
7 jumped
8 over
9 the
10 other
11 brown
12 one
13 which
14 lazy
15 dog
16 ate
17 the
18 food
19 of
20 the
21 fox
{'lazy': [3, 14], 'brown': [4, 11], 'dog': [15]}
(11, 15)
{'the': [2, 9, 17, 20], 'lazy': [3, 14], 'brown': [4, 11], 'fox': [5, 21]}
(2, 5)
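The cost of the `product` step is the number of position combinations, which is the product of the lengths of the position lists. For the second test query that is 4 * 2 * 2 * 2 = 32 tuples; with k query words that each occur m times it is m^k, so it grows exponentially with the query length. A quick check using the positions printed above (`math.prod` needs Python 3.8 or later):

```python
from math import prod

# Positions printed above for the query 'the lazy brown fox'
pos = {'the': [2, 9, 17, 20], 'lazy': [3, 14], 'brown': [4, 11], 'fox': [5, 21]}

# itertools.product(*pos.values()) yields one tuple per combination
n_tuples = prod(len(p) for p in pos.values())
print(n_tuples)  # 32
```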

This is a slightly more efficient version of PM 2Ring's solution, replacing the call to product with a loop:

def words_range(src, query):
    query = set(query)

    # Iterate over the input and keep track of where each
    # query word appeared most recently
    last_pos = {}
    ranges = []
    for i, word in enumerate(src):
        if word in query:
            last_pos[word] = i
            # Once every query word has been seen, the shortest range
            # ending at i starts at the earliest of those last positions
            if len(last_pos) == len(query):
                ranges.append((min(last_pos.values()), i))

    # Find the smallest range
    return min(ranges, key=lambda t: t[1] - t[0])

It's not quite linear time (the min(last_pos.values()) call inside the loop costs O(len(query)) each time it runs), but it's a step in the right direction. There's probably a way to get rid of the min call (that I can't think of right now), which would make this linear.
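One standard way to drop the min call is a two-pointer sliding window: extend the window to the right one word at a time, and whenever it contains every query word, shrink it from the left as far as possible while recording the best range. Each index enters and leaves the window at most once, so the whole pass does O(len(src)) dictionary operations. This is a sketch of that well-known technique, not code from any answer here; the name `words_range_linear` is hypothetical.

```python
from collections import Counter


def words_range_linear(src, query):
    """Return the smallest (L, R) such that src[L:R+1] holds every word in query.

    Two-pointer sliding window: `right` advances one word per iteration and
    `left` only ever moves forward, so each index is added and removed at most
    once. Returns None if some query word never occurs in src (or query is empty).
    """
    need = set(query)
    if not need:
        return None
    in_window = Counter()  # occurrences of each query word inside src[left:right+1]
    covered = 0            # number of distinct query words with a count above zero
    best = None
    left = 0
    for right, word in enumerate(src):
        if word in need:
            in_window[word] += 1
            if in_window[word] == 1:
                covered += 1
        # While the window holds every query word, record it and try to shrink it
        while covered == len(need):
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            dropped = src[left]
            if dropped in need:
                in_window[dropped] -= 1
                if in_window[dropped] == 0:
                    covered -= 1
            left += 1
    return best


S = ('what about the lazy brown fox that jumped over the other '
     'brown one which lazy dog ate the food of the fox').split()
print(words_range_linear(S, 'lazy brown dog'.split()))      # (11, 15)
print(words_range_linear(S, 'the lazy brown fox'.split()))  # (2, 5)
```

Ties are broken in favor of the earliest range, and None is returned instead of raising ValueError when some query word never appears.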


Here is another approach based on @PM 2Ring's answer:

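import itertools

S = 'what about the lazy brown fox that jumped over the other brown one which lazy dog ate the food of the fox'
Q = 'lazy brown dog'

words = S.split()
query = set(Q.split())  # whole query words, so matching is by word, not substring

# Positions in S of each query word
track = {}
for index, value in enumerate(words):
    if value in query:
        track.setdefault(value, []).append(index)

# One (start, end) range per way of picking a position for each query word
combination = [(min(item), max(item)) for item in itertools.product(*track.values())]

# Keep the ranges that really contain every query word, then take the shortest
result = min((end - start, (start, end)) for start, end in combination
             if query.issubset(words[start:end + 1]))
print(result[1])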

output:

(11, 15)
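A pitfall to watch for when Q is kept as a plain string: value in Q is then a substring test, not a word test, so a word of S such as `row` or `own` would be treated as a query word because it occurs inside `brown`. Checking against a set of the split query words matches whole words only:

```python
Q = 'lazy brown dog'

# `in` on a str is a substring test, so a word that is merely part of a
# query word still matches
print('row' in Q)              # True

# Splitting first gives whole-word matching; a set also makes the
# membership test O(1) on average
query_words = set(Q.split())
print('row' in query_words)    # False
print('brown' in query_words)  # True
```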