3
votes

I have the following problem:

  1. A leader server creates objects that have a start time and an end time; both are set when the object is created.

  2. The start time of the object is set to the current time on the leader node, and the end time is set to start time + delta time.

  3. A thread wakes up regularly and checks whether the end time of any object is less than the current time (hence the object has expired) - if so, the object is deleted.
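Roughly, the creation and expiry logic looks like this (a simplified sketch; the class and field names are just illustrative, not my real code):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Simplified sketch of steps 1-3; names are illustrative.
class ExpiryExample {
    record Obj(String id, Instant startTime, Instant endTime) {}

    static final List<Obj> objects = new CopyOnWriteArrayList<>();

    // Step 1 + 2: start time = leader's current time, end time = start + delta.
    static Obj create(String id, Duration delta) {
        Instant start = Instant.now();          // leader's wall clock
        Obj o = new Obj(id, start, start.plus(delta));
        objects.add(o);
        return o;
    }

    // Step 3: called periodically by a background thread.
    static void reapExpired() {
        Instant now = Instant.now();
        objects.removeIf(o -> o.endTime().isBefore(now));
    }
}
```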

All this works fine as long as things are running smoothly on the leader node. If the leader node goes down, one of the follower nodes becomes the new leader. (State is replicated between the leader and followers via the Raft algorithm.)

Now on the new leader, the time could be very different from that on the previous leader, so the computation in step 3 could be misleading.

One way to solve this problem is to keep the clocks of the nodes (leader and followers) in sync, as much as possible.

But I am wondering if there is any other way to resolve this problem of "expiry" with distributed nodes?

Further Information:

  1. Will be using RAFT protocol for message passing and state replication
  2. It will have known bounds on the delay of message between processes
  3. Leader and follower failures will be tolerated (as per RAFT protocol)
  4. Message loss is assumed not to occur (RAFT ensures this)
  5. The operation on objects is to check if they are alive. Objects will be enqueued by a client.
  6. There will be strong consistency among processes (RAFT provides this)
Does object expiration and deletion depend strictly on an absolute (synchronized) time? – Jason
No, it need not. What is important is that some time passes between creation and expiration. – coder_bro

2 Answers

1
votes

I've seen expiry done in two different ways. Both of these methods guarantee that time will not regress, as can happen when synchronizing clocks via NTP or otherwise using the system clock. In particular, both methods utilize the chip clock, which is strictly increasing (System.nanoTime in Javaland).

These methods are only for expiry: time does not regress, but it can appear to go slower.

First Method

The first method relies on your using a raft cluster (or a similar protocol): it works by broadcasting an ever-increasing clock from the leader to the replicas.

Each peer maintains what we'll call the cluster clock that runs at near real time. The leader periodically broadcasts the clock value via raft.

When a peer receives this clock value it records it, along with the current chip clock value. When the peer is elected leader it can determine the duration since the last clock value by comparing its current chip clock with the last recorded chip clock value.

Bonus 1: Instead of having a new transition type, the cluster clock value may be attached to every transition, and during quiet periods the leader makes no-op transitions just to move the clock forward. If you can, attach these to the raft heartbeat mechanism.

Bonus 2: I maintain some systems where time is increased for every transition, even within the same time quantum. In other words, every transition has a unique timestamp. For this to work without moving time forward too quickly your clock mechanism must have a granularity that can cover your expected transition rate. Milliseconds only allow for 1,000 tps, microseconds allow for 1,000,000 tps, etc.
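Bonus 2 can be implemented with a simple compare-and-bump (a sketch; the class name is made up): if two transitions land in the same microsecond, the second is nudged one tick forward.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hands out strictly increasing microsecond timestamps: if two transitions
// fall within the same time quantum, the later one is bumped forward by one.
class MonotonicStamper {
    private final AtomicLong last = new AtomicLong();

    long nextMicros() {
        long now = System.nanoTime() / 1_000;  // chip clock, in microseconds
        // Either take the real clock, or previous + 1, whichever is larger.
        return last.updateAndGet(prev -> Math.max(prev + 1, now));
    }
}
```

Note the trade-off mentioned above: with microsecond granularity this stays close to real time up to 1,000,000 tps; beyond that, the stamper runs ahead of the chip clock.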

Second Method

Each peer merely records its own chip clock when it receives an object and stores that value alongside the object. This guarantees that a peer will never expire an object before the leader does, because the leader records its timestamp and then sends the object over the network. This creates a strict happens-before relationship.
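In sketch form (names assumed; the replication path is elided), each peer stamps the object on receipt and judges expiry only by locally elapsed time:

```java
// Second method: each peer stamps the object with its OWN chip clock on
// receipt, so expiry is judged by locally elapsed time, never by wall clocks.
class LocalExpiry {
    record Entry(Object payload, long receivedNanos, long ttlNanos) {}

    // Called on each peer when the replicated object arrives.
    static Entry onReceive(Object payload, long ttlNanos) {
        return new Entry(payload, System.nanoTime(), ttlNanos);
    }

    // Expired once the TTL has elapsed on this peer's chip clock.
    static boolean isExpired(Entry e) {
        return System.nanoTime() - e.receivedNanos() >= e.ttlNanos();
    }
}
```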

This second method is susceptible, however, to server restarts. Many chips and processing environments (e.g. the JVM) will reset the chip clock to an arbitrary value on startup. The first method does not have this problem, but is more expensive.

0
votes

If you know your nodes are synchronized to some absolute time, within some epsilon, the easy solution is probably to just bake the epsilon into your garbage collection scheme. With NTP, the epsilon is typically somewhere around 1 ms; with a protocol like PTP, it would be well below 1 ms.
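Concretely, baking the epsilon in just means padding the expiry check (a sketch; the 1 ms value is an assumption standing in for your measured worst-case skew):

```java
import java.time.Duration;
import java.time.Instant;

// Pad the expiry check by the worst-case clock skew between nodes, so an
// object is only collected once it is expired even under that skew.
class PaddedExpiry {
    static final Duration EPSILON = Duration.ofMillis(1); // assumed NTP-level skew

    static boolean canCollect(Instant endTime, Instant now) {
        return now.isAfter(endTime.plus(EPSILON));
    }
}
```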

Absolute time doesn't really exist in distributed systems, though. Depending on it can become a bottleneck, since it implies that all the nodes need to communicate. One way of avoiding it, and synchronization in general, is to keep a relative sequence of events using a vector clock or an interval tree clock. This avoids the need to synchronize on absolute time as state. Since the sequences describe related events, the implication is that only nodes with related events need to communicate.

So, with garbage collection, objects could be marked stale using node sequence numbers. Then, instead of a garbage collector thread checking liveness, the object could either be collected as the sequence number increments, or just marked stale and collected asynchronously.
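A minimal sketch of this idea (all names assumed, and lifetime here is measured in events rather than seconds, which is one possible reading of the scheme): each object records the node's sequence number at creation, and collection happens as the sequence advances.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: liveness is tracked by how far the node's event sequence has
// advanced since the object was enqueued, not by any wall clock.
class SequenceGc {
    record Obj(String id, long bornAtSeq, long lifetimeEvents) {}

    private long seq;
    private final List<Obj> objects = new ArrayList<>();

    Obj enqueue(String id, long lifetimeEvents) {
        Obj o = new Obj(id, seq, lifetimeEvents);
        objects.add(o);
        return o;
    }

    // Each event increments the sequence and collects anything now stale.
    void onEvent() {
        seq++;
        objects.removeIf(o -> seq - o.bornAtSeq() >= o.lifetimeEvents());
    }

    int live() { return objects.size(); }
}
```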