2
votes

I want to implement a Cache in Java that is supposed to cache tags for a given id. (0-N tags for one id) There are around 1000 unique tags in 100 million entities, but the actual number can vary by a few thousand. It does not not need to consider id/tag eviction.
It is expected for the cache to throw a OutOfMemoryError if more tags exist than we can cache in memory.
However, the design should ensure that it takes as little memory as possible to cache the tags.

The cache has one method 'getTags()' method that takes an id and returns the tags for the entity.

This method on worst case ( barring garbage collection) take a few 100 nano seconds. It can be called 1000's of time in a few milliseconds. The cache should be designed for multi threaded access with 1000's of requests to getTags in a few ms.

Please suggest a good data structure/Collection to use which can offer me such a performance.

2
Have a look at Guava caches - vanOekel
Please, describe how the cache is filled and/or updated. - leventov
@vanOekel Guava caches have eviction strategies, but poor performance and memory footprint. OP is asking for the opposite. - leventov

2 Answers

2
votes

For selecting a good cache, with good in-memory read performance, take a look at the benchmarks at the cache2k benchmark page. It compares EHCache, guava cache, cache2k, and Infinispan.

If you don't need eviction, why you need a cache then? Anyway, within cache2k it is possible to switch to eviction implementations that have very low overhead, like this:

Cache<String, String> c =
  CacheBuilder.newCache(String.class, String.class)
    .source(new CacheSource<String, String>() {
      @Override
      public String get(String o) {
        ... fill code ...
      }
    })
    .implementation(ClockCache.class)
    .build();

Another low overhead eviction is org.cache2k.impl.RandomCache, which just selects an eviction candidate by a round robin pointer that walks through the hashtable. The different algorithms are not exposed within the API module, so you need to have cache2k-core.jar within your compile scope.

Disclaimer: I work on cache2k...

0
votes

I have used ConcurrentHashMap in one of my projects and works well. I have not measured performance to the nano-second level, but suits my application.