I have a legacy project that uses Apache Jackrabbit (JCR) version 2.0 as its main storage (it is somewhat outdated, but I can't change it for now).
I have to clean the storage of unused nodes and versions, so I'm iterating over the whole storage tree and testing, for each node/version, whether it should be deleted.
I have a javax.jcr.Session object.
The remove API is invoked in a for loop by:
import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.version.Version;
import javax.jcr.version.VersionHistory;
import javax.jcr.version.VersionIterator;
import javax.jcr.version.VersionManager;

VersionManager vm = session.getWorkspace().getVersionManager();
Node root = session.getRootNode();
// Iterate with hasNext(): getSize() may return -1 when the size is unknown.
for (NodeIterator nodeIterator = root.getNodes(); nodeIterator.hasNext(); ) {
    Node node = nodeIterator.nextNode();
    VersionHistory versionHistory = vm.getVersionHistory(node.getPath());
    for (VersionIterator versionIterator = versionHistory.getAllVersions(); versionIterator.hasNext(); ) {
        Version version = versionIterator.nextVersion();
        // shouldDelete is my own check for unused versions.
        if (shouldDelete(node, version)) {
            versionHistory.removeVersion(version.getName());
        }
    }
}
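As a side note on the loop above, one shape that may be more robust is a two-pass cleanup: first collect the names of the versions to delete, then remove them, so nothing is removed while the version iterator is still being consumed. The sketch below is only an illustration under stated assumptions. `History` and `InMemoryHistory` are hypothetical stand-ins for the parts of `javax.jcr.version.VersionHistory` used here, so that the example runs without a repository; `clean` and the predicate are made-up names as well. It also skips `jcr:rootVersion`, which `getAllVersions()` returns and which, as far as I know, cannot be removed. It says nothing about whether this is faster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class VersionCleanupSketch {

    /** Name of the root version in every JCR version history. */
    public static final String ROOT_VERSION = "jcr:rootVersion";

    /** Hypothetical stand-in for the VersionHistory methods used in the loop. */
    public interface History {
        Iterator<String> allVersionNames();      // plays the role of getAllVersions()
        void removeVersion(String versionName);  // plays the role of removeVersion(String)
    }

    /** In-memory History (an assumption for the demo, not a Jackrabbit class). */
    public static class InMemoryHistory implements History {
        private final Set<String> names = new LinkedHashSet<>();

        public InMemoryHistory(String... versionNames) {
            names.addAll(Arrays.asList(versionNames));
        }

        @Override
        public Iterator<String> allVersionNames() {
            return names.iterator();
        }

        @Override
        public void removeVersion(String versionName) {
            if (ROOT_VERSION.equals(versionName)) {
                throw new IllegalStateException("root version cannot be removed");
            }
            names.remove(versionName);
        }

        public List<String> remaining() {
            return new ArrayList<>(names);
        }
    }

    /** Two-pass cleanup: collect first, remove afterwards. Returns the removed names. */
    public static List<String> clean(History history, Predicate<String> shouldDelete) {
        // Pass 1: walk the iterator with hasNext() and only record candidates.
        List<String> toRemove = new ArrayList<>();
        Iterator<String> it = history.allVersionNames();
        while (it.hasNext()) {
            String name = it.next();
            if (!ROOT_VERSION.equals(name) && shouldDelete.test(name)) {
                toRemove.add(name);
            }
        }
        // Pass 2: the iterator is exhausted, so removing is safe now.
        for (String name : toRemove) {
            history.removeVersion(name);
        }
        return toRemove;
    }

    public static void main(String[] args) {
        InMemoryHistory history = new InMemoryHistory(ROOT_VERSION, "1.0", "1.1", "1.2");
        // Made-up rule for the demo: keep only version 1.1.
        List<String> removed = clean(history, name -> !name.equals("1.1"));
        System.out.println("removed=" + removed + " remaining=" + history.remaining());
    }
}
```

With a real repository, the same two passes would be written against `VersionHistory.getAllVersions()` and `VersionHistory.removeVersion(String)`, with my own `shouldDelete(node, version)` check in place of the predicate.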
The problem is that the removeVersion API is very slow.
My first question is whether there is any faster way to do this, given that only one thread is working on the storage while I perform the cleanup.
I've explored the Javadoc and found what looks like a way to perform batch operations, which would fit my case. For example:
VersionManager vm = session.getWorkspace().getVersionManager();
// getRepositoryService, getSessionInfo and getItemId are placeholders:
// I don't know how to implement them (see the questions below).
RepositoryService rs = getRepositoryService(session);
SessionInfo si = getSessionInfo(session);
ItemId mainId = null; // placeholder: I don't know what to pass here
Batch batch = rs.createBatch(si, mainId);
Node root = session.getRootNode();
// Iterate with hasNext(): getSize() may return -1 when the size is unknown.
for (NodeIterator nodeIterator = root.getNodes(); nodeIterator.hasNext(); ) {
    Node node = nodeIterator.nextNode();
    VersionHistory versionHistory = vm.getVersionHistory(node.getPath());
    for (VersionIterator versionIterator = versionHistory.getAllVersions(); versionIterator.hasNext(); ) {
        Version version = versionIterator.nextVersion();
        if (shouldDelete(node, version)) {
            // Queue the removal instead of calling removeVersion directly.
            ItemId id = getItemId(node, version);
            batch.remove(id);
            // versionHistory.removeVersion(version.getName());
        }
    }
}
// TODO: how to execute batch?
I have some questions about this batch API:
- How can I get a RepositoryService from my Session?
- How can I get a SessionInfo from my Session?
- What is the meaning of the ItemId passed when a Batch object is created? What kind of value should I pass?
- How can I get an ItemId from a node and its version?
- Once I've built my Batch object with all its removes, how can I execute it on my session?