4 votes

My service endpoint receives a list of metrics every minute along with their timestamps. If a metric passes certain conditions, we need to store it in a cache so that it can be accessed later. The access functions for this service are:

List<Metrics> GetAllInterestingMetrics5Mins();
List<Metrics> GetAllInterestingMetrics10Mins();
List<Metrics> GetAllInterestingMetrics30Mins();

My current solution is to use 3 Guava caches with time-based eviction set to 5, 10 & 30 minutes. When somebody calls one of the above functions, I return all the metrics from the relevant cache.
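Roughly, each of the 3 caches looks like this (the `Metrics` record and the getter name are illustrative placeholders for my real types):

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

class MetricCaches {
  // Placeholder for the real Metrics type.
  record Metrics(String name, double value, long timestampMillis) {}

  // One cache per window; Guava starts the eviction clock at put() time,
  // not at the metric's own timestamp -- which is exactly problem 1 below.
  final Cache<String, Metrics> cache5Min =
      CacheBuilder.newBuilder().expireAfterWrite(5, TimeUnit.MINUTES).build();

  List<Metrics> getAllInterestingMetrics5Mins() {
    cache5Min.cleanUp(); // flush pending evictions before snapshotting
    return new ArrayList<>(cache5Min.asMap().values());
  }
}
```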

There are 2 problems with this:

  1. Guava caches start timing for eviction based on when the value is put in the cache (or accessed, depending upon the setting). Now it's possible for a metric to be delayed, so its timestamp would be earlier than the time when the metric is put in the cache.
  2. I don't like that I have to create 3 caches when one cache with a 30-minute window should suffice; it increases the memory footprint and the complexity of cache handling.

Is there a way to solve these 2 problems in Guava or any other out-of-the-box caching solution?


3 Answers

2 votes

There is a particular difference between caching solutions like Guava and EHCache and what you are trying to implement. The sole purpose of these caches is to act in the same way that getter functions work. That is, caches are intended to retrieve a single element by its key and store it for further use, evicting it after it stops being used.

E.g.

@Cacheable
public Object getter(String key) {
...
}

That's why getting a whole set of objects from the cache feels a little like forcing the cache and the eviction policy to work differently from their original purpose.

What you need, instead of Guava cache (or other caching solutions), is a collection that can be evicted all at once by a timer function. Sadly, Guava doesn't provide that right now. You would still need a timer function provided by the application that would remove all existing elements from the cache.

So, my suggestion would be the following:

Even when it is possible for Guava to behave in the way you want it to, you will find out that you are not using the features that make Guava really valuable, and you are "forcing" it to behave differently. So I suggest you forget about the Guava implementation and consider using, for example, a specialization of the AbstractMap class, along with a timer function that will evict its contents every N seconds.

This way you will be able to have all your entries in a single cache and stop worrying about the discrepancies between the timestamp and the time the entry was added to the cache.
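To make the idea concrete, here is a minimal sketch of that map-plus-timer approach (all names are mine, nothing here is from Guava): the values carry the producer's timestamp, a scheduled task sweeps out entries older than 30 minutes, and "last N minutes" queries filter on the metric's own timestamp rather than the insert time.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

class TimedMetricStore {
  // Metric value plus the timestamp reported by the producer (not insert time).
  record StampedMetric(String name, double value, long timestampMillis) {}

  private final Map<String, StampedMetric> store = new ConcurrentHashMap<>();
  private final ScheduledExecutorService sweeper =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true); // don't keep the JVM alive just for the sweeper
        return t;
      });

  TimedMetricStore(long maxAgeMillis, long sweepPeriodMillis) {
    // Periodically evict everything whose own timestamp is too old.
    sweeper.scheduleAtFixedRate(
        () -> store.values().removeIf(
            m -> System.currentTimeMillis() - m.timestampMillis() > maxAgeMillis),
        sweepPeriodMillis, sweepPeriodMillis, TimeUnit.MILLISECONDS);
  }

  void put(StampedMetric m) { store.put(m.name(), m); }

  // Answer "last N minutes" from the metric's timestamp, so delayed
  // metrics age out at the right moment -- problem 1 from the question.
  List<StampedMetric> lastNMinutes(long n) {
    long cutoff = System.currentTimeMillis() - n * 60_000L;
    return store.values().stream()
        .filter(m -> m.timestampMillis() >= cutoff)
        .collect(Collectors.toList());
  }
}
```

One store with a 30-minute sweep then serves all three access functions, since the 5- and 10-minute views are just filters over the same data.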

1 vote

Regarding Topic 1:

Just a side note: please do not confuse expiry and eviction. Expiry means the entry may no longer be returned by the cache, and it may happen at a specified point in time or after a duration. Eviction is the action of freeing resources: the entry is removed from the cache. After expiry, eviction may happen at the same time or later.

None of the common cache products supports exact, aka "point in time", expiry. We need that use case very often in our applications, so I spent some effort on cache2k to support it.

Here is a blueprint for cache2k:

static class MetricsEntry {

  long nextUpdate;
  List<Metrics> metrics;

}

static class MyEntryExpiryCalculator implements EntryExpiryCalculator<Integer, MetricsEntry> {
  @Override
  public long calculateExpiryTime(Integer _key, MetricsEntry _value, long _fetchTime, CacheEntry _oldEntry) {
    return _value.nextUpdate;
  }
}

Cache createTheCache() {
  Cache<Integer, MetricsEntry> cache =
    CacheBuilder.newCache(Integer.class, MetricsEntry.class)
      .sharpExpiry(true)
      .entryExpiryCalculator(new MyEntryExpiryCalculator())
      .source(new MySource())
      .build();
  return cache;
}

If you have a time reference in the metrics objects, you can use that and omit the additional entry class. sharpExpiry(true) instructs cache2k to use exact expiry. If you leave this out, the expiry may be a few milliseconds off, but the access time is slightly faster.

Regarding Topic 2:

The straightforward approach would be to use the interval minutes as the cache key.

Here is a cache source (aka cache loader) that strictly returns the metrics of the previous interval:

static class MySource implements CacheSource<Integer, MetricsEntry> {
  @Override
  public MetricsEntry get(Integer interval)  {
    MetricsEntry e = new MetricsEntry();
    boolean crossedIntervalEnd;
    do {
      long now = System.currentTimeMillis();
      long intervalMillis = interval * 1000 * 60;
      long startOfInterval = now - now % intervalMillis; // align to the interval grid; now % intervalMillis alone is just the offset into the interval
      e.metrics = calculateMetrics(startOfInterval, interval);
      e.nextUpdate = startOfInterval + intervalMillis;
      now = System.currentTimeMillis();
      crossedIntervalEnd = now >= e.nextUpdate;
    } while (crossedIntervalEnd);
    return e;
  }
}

That would return the metrics for 10:00-10:05 if you make the request at, let's say, 10:07.

If you just want to calculate the metrics of the past interval on the fly, it is simpler:

static class MySource implements CacheSource<Integer, MetricsEntry> {
  @Override
  public MetricsEntry get(Integer interval)  {
    MetricsEntry e = new MetricsEntry();
    long intervalMillis = interval * 1000 * 60;
    long startOfInterval = System.currentTimeMillis();
    e.metrics = calculateMetrics(startOfInterval, interval);
    e.nextUpdate = startOfInterval + intervalMillis;
    return e;
  }
}

The use of the cache source has an advantage over put(). cache2k is blocking, so if multiple requests come in for one metric, only one metric calculation is started.

If you don't need expiry that is exact to the millisecond, you can use other caches, too. What you need to do is store the time it took to calculate the metrics within your cache value and then correct the expiry duration accordingly.
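For caches that only take a fixed expire-after-write duration, that correction can be sketched as a small helper (the name is mine, not from any library): given the interval start and length, compute how long the freshly calculated entry should stay cached so it expires at the next interval boundary, regardless of how long the calculation itself took.

```java
class IntervalExpiry {
  // Remaining time-to-live so the entry expires exactly at the interval
  // boundary: nextUpdate = startOfInterval + interval, clamped at zero
  // in case the calculation ran past the boundary.
  static long remainingTtlMillis(long startOfIntervalMillis,
                                 long intervalMillis,
                                 long nowMillis) {
    long nextUpdate = startOfIntervalMillis + intervalMillis;
    return Math.max(0L, nextUpdate - nowMillis);
  }
}
```

For example, an entry for the interval starting at t=600s with a 5-minute window, finished computing at t=700s, should live another 200 seconds.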

Have a good one!

1 vote

Have you considered using something like a Deque instead? Just put the metrics in the queue and when you want to retrieve metrics for the last N minutes, just start at the end with the most recent additions and take everything until you find one that's from > N minutes ago. You can evict entries that are too old from the other end in a similar way. (It's not clear to me from your question how the key/value aspect of Cache relates to your problem.)
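A minimal sketch of that deque idea (all names are illustrative; it assumes metrics arrive roughly in timestamp order, so the deque stays sorted):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class MetricWindow {
  record StampedMetric(String name, double value, long timestampMillis) {}

  // Newest entries at the tail, oldest at the head.
  private final Deque<StampedMetric> deque = new ArrayDeque<>();

  void add(StampedMetric m) { deque.addLast(m); }

  // Walk backwards from the newest entry; stop at the first one
  // older than N minutes.
  List<StampedMetric> lastNMinutes(long n, long nowMillis) {
    long cutoff = nowMillis - n * 60_000L;
    List<StampedMetric> out = new ArrayList<>();
    Iterator<StampedMetric> it = deque.descendingIterator();
    while (it.hasNext()) {
      StampedMetric m = it.next();
      if (m.timestampMillis() < cutoff) break;
      out.add(m);
    }
    return out;
  }

  // Evict entries that are too old from the other end, in a similar way.
  void evictOlderThan(long maxAgeMillis, long nowMillis) {
    long cutoff = nowMillis - maxAgeMillis;
    while (!deque.isEmpty() && deque.peekFirst().timestampMillis() < cutoff) {
      deque.removeFirst();
    }
  }
}
```

Both the lookup and the eviction only touch entries near the window boundary, so the cost is proportional to what is returned or removed, not to the whole deque.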