JCR basic concepts

Question

I have recently been working with Magnolia CMS, which happens to use JCR.

One of the problems I have faced is JCR data corruption, and I found that I had very little knowledge of how to troubleshoot the situation.

My understanding of JCR is as follows:

JCR is a specification, there are several implementations
Jackrabbit is one JCR implementation
Jackrabbit may store the information using the file system directly or using a database like MySQL

Now my questions are:

How can a JCR repository be backed up and restored?
Is there any particular tool that can be used to check the integrity of a given JCR repository and try to fix it? I have been playing a little bit with Toromiro.
Is there any particular resource or tutorial that I should read to gain a full and proper understanding of the JCR technology?

Update:

I have some other questions:

If a given JCR implementation stores the content in a database, can I expect ALL the content to be stored in that database, or could some content (e.g. images) be stored directly on the file system rather than in the database?
Currently we have a JCR repo that is accessed by three different web servers. It is my understanding that the JCR spec accounts for this situation and protects the repo against inconsistent content caused by concurrent write access. Is this correct?
To be specific, the problem we experienced consisted of node A containing a reference to node B while node B was not accessible. Using a Groovy script, we managed to delete node B (which seemed to be in an inconsistent state). However, how could we find all the references to node B (maybe node C referenced it too, not only node A)? What the hell could have caused the JCR repo to become corrupt? By the way, we also tried the forceConsistencyCheck, autorepair and enableConsistencyCheck flags, but they did not fix the problem.

Thanks

Randall Hauch · Accepted Answer · 2014-02-18T16:55:46

Your understanding of JCR is correct: it is a specification that has been implemented by multiple projects (including Jackrabbit, ModeShape, Alfresco, eXo, etc.). In fact, there are multiple versions of JCR (1.0, 2.0 and very soon 2.1), and not all implementations support all JCR versions.

(Full disclosure: I'm the founder and lead of ModeShape.)

There is no standard or universal way to back up a JCR repository, but several of the implementations offer their own mechanisms. For example, if everything is stored in a DBMS, then you can use the DBMS's backup and restore features. Jackrabbit has its own backup mechanism, as does ModeShape.

What kind of integrity checking are you doing, and how does Toromiro do that? JCR implementations should not allow any content to be saved that would violate the defined constraints (e.g., node type definitions with property and child node definitions), and they limit (to various degrees) how these node definitions can be changed.

I'm not aware of any great JCR books or online resources, but have a look at the Jackrabbit docs and the ModeShape docs.
