1
votes

I'm looking for an API to synchronize two different JCR repositories.

  • The synchronization will run frequently (e.g. every hour).
  • Only specified subtrees must be synchronized.
  • One repository is the master and the other is the slave.
  • The slave repository is read-only and must be accessible in emergencies.

Is there any API to do such a synchronization operation?

Any suggestion is appreciated.

2 Answers

2
votes

I can think of several ways to do this using just the JCR API in any JCR implementation:

  1. Create and register an event listener on the master repository that monitors events on the specific subtrees of interest, and record those events in some persistent form (e.g., a queue, the file system, or a third repository, whichever works best in your environment). Then periodically process the recorded events and "replay" them by manipulating the nodes in the slave repository.
  2. Create and register an event listener on the master repository that monitors events that happen on the specific subtrees of interest, and then immediately connect to the slave repository and "replay" these events.
  3. Periodically connect to the master repository and use the journaling feature (if supported) to obtain what has changed in the master repository since the last time this was done, and then connect to the slave repository and "replay" those events that apply to the specific subtrees of interest.
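
To make the "record, filter by subtree, replay" idea common to all three options concrete, here is a minimal sketch in plain Java. It is deliberately not the JCR API: `SubtreeReplaySketch`, `Change`, `ChangeType`, `inSubtree` and `replay` are hypothetical names, and the slave is modelled as a simple map from absolute path to value. In a real setup the log would presumably be filled from a listener registered through `ObservationManager.addEventListener(...)` or read from `ObservationManager.getEventJournal()` (where the implementation supports journaling), and the replay step would modify nodes through a `Session` on the slave.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of "record events, then replay them on the slave".
// Everything here is a hypothetical illustration; none of it is the JCR API.
class SubtreeReplaySketch {

    enum ChangeType { SET, REMOVE }

    // One recorded change: what happened, at which absolute path, with what value.
    static final class Change {
        final ChangeType type;
        final String path;
        final String value; // ignored for REMOVE

        Change(ChangeType type, String path, String value) {
            this.type = type;
            this.path = path;
            this.value = value;
        }
    }

    // True if 'path' is the subtree root itself or lies somewhere below it.
    static boolean inSubtree(String path, String root) {
        String prefix = root.endsWith("/") ? root : root + "/";
        return path.equals(root) || path.startsWith(prefix);
    }

    // Applies the recorded changes that fall under one of 'roots' to the slave,
    // in recorded order, and returns how many changes were applied.
    static int replay(List<Change> log, List<String> roots, Map<String, String> slave) {
        int applied = 0;
        for (Change change : log) {
            boolean wanted = false;
            for (String root : roots) {
                if (inSubtree(change.path, root)) {
                    wanted = true;
                    break;
                }
            }
            if (!wanted) {
                continue; // change is outside the synchronized subtrees
            }
            if (change.type == ChangeType.SET) {
                slave.put(change.path, change.value);
            } else {
                // Removing a path also removes everything beneath it.
                slave.keySet().removeIf(p -> inSubtree(p, change.path));
            }
            applied++;
        }
        return applied;
    }

    public static void main(String[] args) {
        List<Change> log = new ArrayList<>();
        log.add(new Change(ChangeType.SET, "/content/site/a", "1"));
        log.add(new Change(ChangeType.SET, "/content/site/a/b", "2"));
        log.add(new Change(ChangeType.SET, "/tmp/scratch", "x")); // outside the synced subtree
        log.add(new Change(ChangeType.REMOVE, "/content/site/a/b", null));

        Map<String, String> slave = new TreeMap<>();
        int applied = replay(log, Arrays.asList("/content/site"), slave);
        System.out.println(applied + " changes applied, slave = " + slave);
    }
}
```

Note that the subtree check compares against the root plus a trailing slash, so a sibling such as `/content/sitemap` is not mistaken for part of `/content/site`. Replaying in recorded order matters as well: a removal followed by a re-add must not be applied the other way around.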

Another option might be to keep the master and slave repositories completely in sync by clustering them. Jackrabbit and ModeShape can both do this, but they do it quite differently because clustering is not defined in the JCR specification.

For example, with ModeShape (disclosure: I'm the project lead) you can create small clusters of just 2 processes or larger clusters with many processes. You can choose up front whether each process in the cluster has a complete copy of all of the content (i.e., "replicated" and "invalidation" modes) or just some of the content (i.e., "distributed" mode). See the documentation for details. These clusters can also span multiple sites, helping to increase fault tolerance. ModeShape is elastic, so you can simply add more processes to the cluster at any time, and you can even remove them. The best part is that client applications still just use the JCR API yet see the whole repository content just as they would a non-clustered repository.

1
votes

The (brand new and still unreleased) Apache Sling replication module does that out of the box. It requires running Sling on top of your repositories, but that's fairly lightweight and brings lots of useful functionality for JCR applications.