A Web Server can receive millions of user login request. A user can login multiple times. Design an optimal algorithm/data-structure in terms of time complexity to return total number of unique users between given time intervals.
For e.g : Count total number of unique users between interval t1 & t2 and t2 & t3. Also think about returning total count for overlapping intervals (t1 = 10am, t2 = 10:15am, t3 = 10:30am, return total number of users between 10:10am to 10:20am)
Below is what I am proposing, Would appreciate people's comments?
IMO combination of Hashmap & min heap would be good as an optimal solution.
Hashmap- will have key as user-id and value as node pointer to corresponding node in min heap. Min heap - last-logged-in time as key and value as the user-id. Root will be the user whose log-in time is the oldest. Also at the root store the count of total number of nodes in the min-heap so we can quickly return the count.
When a user logs-in. Lookup in the hashmap with user-id as key. a) If no match then insert the new user into hashmap & a new node for user into the min heap and increment the count stored with the root node of min heap. b) else it's an old user update it's last-logged-in value and do not increment the count in root node of min heap.
Whenever we want to find out the unique users logged between t2-t1 then a. Extract min(root) from the heap and check if current time - last-logged-in time > t2-t1 mins. If it is greater than delete the value from the hashmap & min_heap. b. Repeat the above step (a) until the min element of the heap satisfies current time - last-logged-in time <= t2-t1 mins c. Return the value of the count from root node of the min heap.
But I am not able to nail down the algorithm for overlapping intervals.