I am trying to write an actor implementation in Java. My design requires a high-performance map data structure for looking up which thread a particular actor is scheduled on. The lookup is done with an int id. All actors have distinct ids. I have the following requirements:
i) Keys are primitive ints, not Integer classes.
ii) Values are also primitives. Values can only take a few distinct numbers, which are known before the data structure is instantiated. A value is just the id of a thread/core, so it could be a short. The number of threads is less than the number of cores on the machine, so it can't really reach a very high number.
iii) The map is written to by a single thread but read from multiple ones. I want my implementation to be lock-free and without any sharing (false or otherwise). Thus reads should not involve any writes to non-thread-local memory.
iv) The number of writes (by the single thread) will be vastly outnumbered by the reads from the multiple reader threads, which use the map for lookups.
v) The primary operations that are needed:
set(key, value) and delete(key, value), which are always called from the single writer thread. Most keys that are set are eventually deleted, so performance must not degrade after a lot of deletions. New keys (actor ids) will be generated using an incrementing integer and signify creation of an actor. Deletion of a key (the id of an actor) signifies that said actor has exited and will never revive. This also means that a key, once deleted, will never be set again. It's important that we don't accumulate dead space in the map, since deletions will occur (actors exit).
get(key), which is called from the reader threads.
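To make the required operations concrete, here is a hypothetical interface capturing them (all names, including the EMPTY sentinel, are invented for illustration, not part of any library):

```java
// Hypothetical interface for the actor-id -> thread-id map described above.
// set/delete are writer-thread-only; get/getStrict are safe from any reader.
interface IntToThreadIdMap {
    short EMPTY = -1;                      // sentinel meaning "no mapping"

    void set(int actorId, short threadId); // single writer thread only
    void delete(int actorId);              // single writer thread only
    short get(int actorId);                // any reader thread; may lag behind
    short getStrict(int actorId);          // fallback when get returns EMPTY
}
```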
vi) The operation get(key) needs to be eventually consistent, but with some caveats. Say the writer thread has changed the pair key1->value1 to key1->value2. It is not a problem if one of the readers performs get(key1) and still receives value1; eventually it should get value2. It's also fine if the pair key1->value1 has been deleted by the writer thread and a get(key1) on a reader thread still returns value1. In practice, this means something like Java's putOrderedObject/lazySet/getObjectVolatile, or C++11's std::memory_order_relaxed/std::memory_order_acquire/std::memory_order_release, could be incorporated. On the other hand, get(key1) should not return an empty value (say -1) if a value is indeed set. I don't mind having a getStrict(key1) operation which I can call if get(key1) returns an empty value, to meet this requirement.
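As a minimal sketch of the ordering described in (vi), assuming a bounded key range and using java.util.concurrent.atomic.AtomicIntegerArray: the writer publishes with lazySet (a cheap ordered store with no full fence), while readers use the plain get, which has volatile-read semantics. The class name SlotTable and the EMPTY sentinel are made up for this sketch:

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Sketch: direct-indexed table, single writer, many readers.
// EMPTY (-1) marks "no value set"; valid thread ids are >= 0.
final class SlotTable {
    static final int EMPTY = -1;
    private final AtomicIntegerArray slots;

    SlotTable(int capacity) {
        slots = new AtomicIntegerArray(capacity);
        for (int i = 0; i < capacity; i++) slots.set(i, EMPTY);
    }

    // Writer thread only: lazySet avoids the StoreLoad fence of a
    // volatile write; readers will observe the value eventually.
    void set(int key, int value) { slots.lazySet(key, value); }
    void delete(int key)         { slots.lazySet(key, EMPTY); }

    // Reader threads: a single volatile read, no writes to shared memory.
    int get(int key) { return slots.get(key); }

    // Note: get is already a volatile read, so a stronger getStrict
    // can't be built from reader-side reads alone; it would need
    // writer-side coordination (e.g. a version counter or fence).
    int getStrict(int key) { return slots.get(key); }
}
```

One trade-off to note: lazySet only delays visibility, it never lets a reader see a torn or out-of-thin-air value, which matches the "stale is fine, garbage is not" requirement.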
Reasons I am not straight away using a library are:
i) Java collections: they require wrapper classes, thus not meeting my goals (i) and (ii).
ii) Trove, FastUtil, etc.: they do have primitive maps but don't provide any facilities for concurrent access. They also do not optimize for the values lying in a sparse range - the number of cores, in my case.
iii) Java ConcurrentHashMap/ConcurrentSkipListMap: they require wrapper classes and do not optimize for the single-writer, multiple-reader use case.
I realize that these are a lot of requirements, so it's fine if an answer addresses some of the points while staying silent on others. Pointing me to source code or commenting on a design would be great. Any explanation of trade-offs would be an added bonus, since I am trying to learn how to fish.
If I boil my detailed requirements down to a few questions, they are probably:
i) How can I optimize for the single-writer/multiple-reader use case?
ii) How do I design the get(key) and getStrict(key) operations? Is this the right way of even thinking about it?
iii) How can I design my map to take advantage of the incrementing keys and sparse value range?
iv) How do I optimally handle the frequent deletions? Any resizing/rehashing will need to be immediately visible to the reader threads instead of being eventually visible.
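One design sketch that speaks to (iii) and (iv) together, under the stated assumptions (monotonically increasing keys, a key is never re-set after deletion): skip hashing entirely and index directly into fixed-size chunks addressed by key / CHUNK_SIZE. Because keys only move forward, a chunk whose entries are all dead can be unlinked wholesale, reclaiming space with no rehash for readers to observe mid-flight. The chunk array is an AtomicReferenceArray, so readers always see either null or a fully initialized chunk. All names (ChunkedIdMap, Chunk, EMPTY) are invented for this sketch:

```java
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch: actor-id -> thread-id map exploiting incrementing keys.
// Keys live in fixed-size chunks; a fully dead chunk is unlinked,
// so dead space is reclaimed without any rehashing.
final class ChunkedIdMap {
    static final int EMPTY = -1;
    static final int CHUNK_BITS = 10;              // 1024 entries per chunk
    static final int CHUNK_SIZE = 1 << CHUNK_BITS;

    private static final class Chunk {
        final AtomicIntegerArray values = new AtomicIntegerArray(CHUNK_SIZE);
        int live;                                  // touched by writer thread only
        Chunk() { for (int i = 0; i < CHUNK_SIZE; i++) values.set(i, EMPTY); }
    }

    private final AtomicReferenceArray<Chunk> chunks;

    ChunkedIdMap(int maxKeys) {
        chunks = new AtomicReferenceArray<>((maxKeys >> CHUNK_BITS) + 1);
    }

    // Writer thread only.
    void set(int key, int value) {
        int ci = key >>> CHUNK_BITS;
        Chunk c = chunks.get(ci);
        if (c == null) {
            c = new Chunk();
            chunks.set(ci, c);                     // volatile store publishes the chunk
        }
        int idx = key & (CHUNK_SIZE - 1);
        if (c.values.get(idx) == EMPTY) c.live++;
        c.values.lazySet(idx, value);
    }

    // Writer thread only. A key is never set again after deletion,
    // so once a chunk's live count hits zero it can be dropped.
    void delete(int key) {
        int ci = key >>> CHUNK_BITS;
        Chunk c = chunks.get(ci);
        if (c == null) return;
        int idx = key & (CHUNK_SIZE - 1);
        if (c.values.get(idx) != EMPTY) {
            c.values.lazySet(idx, EMPTY);
            if (--c.live == 0) chunks.lazySet(ci, null);
        }
    }

    // Reader threads: two volatile reads, no writes to shared memory.
    int get(int key) {
        Chunk c = chunks.get(key >>> CHUNK_BITS);
        return c == null ? EMPTY : c.values.get(key & (CHUNK_SIZE - 1));
    }
}
```

The trade-off versus a hash map: memory is proportional to the span of live keys rather than their count, which is acceptable here precisely because dead chunks are released as actors exit; a reader may briefly see EMPTY for a just-published key (the single-writer case the question's getStrict is meant to cover).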
Also any answers with C++/C++11 code are welcome. With some research, I should be able to convert most of the std::atomic operations into Java Unsafe ones.
std::shared_lock (C++14)? "High performance" doesn't necessarily correlate with "lock-free". Lock-freedom is about deterministic latency rather than throughput. - Kerrek SB