I've got a scenario where I need to query the datastore for some random users who have been active in the last X minutes.

Each of my User entities has a property called 'random'. When I want to find some random users, I generate random min and max values and query the datastore for users whose 'random' property falls within that range.

This is what I've got so far:

public static List<Entity> getRandomUsers(Key filterKey, String gender, String language, int maxResults) {
    ArrayList<Entity> nonDuplicateEntities = new ArrayList<>();

    HashSet<Entity> hashSet = new HashSet<>();
    int attempts = 0;
    while (nonDuplicateEntities.size() < maxResults) {
        attempts++;
        if (attempts >= 10) {
            return nonDuplicateEntities;
        }

        double ran1 = Math.random();
        double ran2 = Math.random();

        Filter randomMinFilter = new Query.FilterPredicate(Constants.KEY_RANDOM, Query.FilterOperator.GREATER_THAN_OR_EQUAL, Math.min(ran1, ran2));
        Filter randomMaxFilter = new Query.FilterPredicate(Constants.KEY_RANDOM, Query.FilterOperator.LESS_THAN_OR_EQUAL, Math.max(ran1, ran2));
        Filter languageFilter = new Query.FilterPredicate(Constants.KEY_LANGUAGE, Query.FilterOperator.EQUAL, language);

        Filter randomRangeFilter;
        if (gender == null || gender.equals(Constants.GENDER_ANY)) {
            randomRangeFilter = Query.CompositeFilterOperator.and(randomMinFilter, randomMaxFilter, languageFilter);
        } else {
            Filter genderFilter = new Query.FilterPredicate(Constants.KEY_GENDER, Query.FilterOperator.EQUAL, gender);
            randomRangeFilter = Query.CompositeFilterOperator.and(randomMinFilter, randomMaxFilter, genderFilter, languageFilter);
        }

        Query q = new Query(Constants.KEY_USER_CLASS).setFilter(randomRangeFilter);

        PreparedQuery pq = DatastoreServiceFactory.getDatastoreService().prepare(q);

        List<Entity> entities = pq.asList(FetchOptions.Builder.withLimit(maxResults - nonDuplicateEntities.size()));
        for (Entity entity : entities) {
            if (filterKey.equals(entity.getKey())) {
                continue;
            }
            if (hashSet.add(entity)) {
                nonDuplicateEntities.add(entity);
            }
            if (nonDuplicateEntities.size() == maxResults) {
                return nonDuplicateEntities;
            }
        }
    }

    return nonDuplicateEntities;
}

I now need just users who have been active recently.

Each User entity also has a 'last active' property, which I want to include in the query, e.g. last active > (now - 30 minutes).

This would mean having an inequality filter on two properties, which I can't do.

What is the most efficient way to do this?

I could get all user entities active in the last X minutes and then pick some random ones. Alternatively, I could leave my code as is and check 'last active' before adding each entity to the non-duplicate list, but this might involve lots of calls to the datastore.

Is there some other way I can do this just using the query?

How fixed is X in your random minutes, and what period are you interested in: a few minutes, 10 minutes, 50 minutes? - Tim Hoffman
@TimHoffman The minutes value won't be random, it will be a constant value anywhere between 10 and 30 - Simon
Once you have a list of n recently active users, you could randomly pick some out of the list in memory for display. Storing random values in a database ... let's just say that it may not be the best of all ideas. Indexes on random values and sorting by those is just a waste of money imho. - konqi
Though a keys-only query for users with recent activity, say the last 50, followed by randomly choosing a key and getting that single entity wouldn't be too expensive. In addition, you could cache the keys in memcache for short periods and reuse them for random picks until the cache expires. - Tim Hoffman
@Simon please add the link to the answer you're referring to so I can scold somebody ;-). Tim's last comment is probably the best solution. Also note that random values in the database will not give you the random result you would expect. The users would always be sorted in the same order, so if there was a user with a very low random value, he'd always dominate your recent activities (or whatever) list. Tim: May I suggest you make an answer out of your comment? - konqi

1 Answer

As requested in the comments above, here is one approach.

Assuming you have a "last active" property that stores a datetime stamp, you can perform a keys-only query where last active > "a datetime stamp of interest".

After retrieving the keys, make a random choice from the result set, then explicitly fetch the entity for the chosen key with a get operation. This limits the cost to small ops plus a get.
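
The random-choice step can be sketched in plain Java as below. This is only an illustration under some assumptions: `RandomKeyPicker` and `pickRandom` are hypothetical names, and the candidate list stands in for the keys returned by the keys-only query (on App Engine that would presumably be a `Query` with `setKeysOnly()` and a single inequality filter on the last-active property).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: picks random keys from an in-memory candidate list.
final class RandomKeyPicker {
    private RandomKeyPicker() {}

    /**
     * Returns up to {@code count} elements chosen uniformly at random from
     * {@code candidates}, without picking the same position twice.
     * The input list is not modified.
     */
    static <T> List<T> pickRandom(List<T> candidates, int count, Random rng) {
        if (count <= 0 || candidates.isEmpty()) {
            return new ArrayList<>();
        }
        // Shuffle a copy so the caller's (possibly cached) list keeps its order.
        List<T> copy = new ArrayList<>(candidates);
        Collections.shuffle(copy, rng);
        return new ArrayList<>(copy.subList(0, Math.min(count, copy.size())));
    }
}
```

With real `Key` objects, the picked keys could then be passed to `DatastoreService.get(Iterable<Key>)` in one batch; the requesting user's own key would need to be removed from the candidates first.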

I would then consider caching this set of keys in memcache with a defined expiry period, so that if you need another random choice within that period (say, 2 seconds later) you can reuse the set rather than re-querying. Accuracy doesn't appear to be too important given the random choice.

If you do adopt the caching strategy, you do have to deal with cache expiry and refreshing the cache.

A potential issue here is the dogpile effect, where multiple requests all miss the cache at the same time and each handler starts rebuilding it. In a lightly loaded system this may not be an issue; in a heavily loaded system with a lot of activity, you may want to keep the cache warm with a task. Just something to think about.
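
To make the expiry and refresh handling concrete, here is a minimal plain-Java sketch. It is a stand-in for memcache, not memcache itself: `ExpiringKeyCache`, its `loader` (which would run the keys-only query) and the injectable `clock` are all hypothetical names. The `synchronized` accessor means concurrent callers in one JVM trigger only a single reload, which softens the dogpile effect on one instance but does nothing across instances.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical in-process cache for a list of keys with a fixed time-to-live.
final class ExpiringKeyCache<T> {
    private final Supplier<List<T>> loader; // assumed to run the keys-only query
    private final long ttlMillis;           // how long a loaded list stays valid
    private final LongSupplier clock;       // injectable so expiry can be tested

    private List<T> cached;                 // null until the first load
    private long loadedAtMillis;

    ExpiringKeyCache(Supplier<List<T>> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached list, reloading it first if it is missing or expired. */
    synchronized List<T> get() {
        long now = clock.getAsLong();
        if (cached == null || now - loadedAtMillis >= ttlMillis) {
            // Only one thread at a time reaches the loader; others wait for the result.
            cached = Collections.unmodifiableList(new ArrayList<>(loader.get()));
            loadedAtMillis = now;
        }
        return cached;
    }
}
```

In production the clock would typically be `System::currentTimeMillis`. With the real memcache service, the equivalent expiry would be set when storing the keys, for example with `Expiration.byDeltaSeconds(...)`.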