I have a legacy project that uses Apache Jackrabbit (JCR) version 2.0 as its main storage (a bit outdated, but I can't change it for now).

I have to clean unused nodes and versions out of the storage, so I iterate over the whole storage tree and test, for each node/version, whether it should be deleted.

I have a javax.jcr.Session object.

The remove API is invoked in a loop like this:

import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.version.Version;
import javax.jcr.version.VersionHistory;
import javax.jcr.version.VersionIterator;
import javax.jcr.version.VersionManager;

VersionManager vm = session.getWorkspace().getVersionManager();

Node root = session.getRootNode();
NodeIterator nodeIterator = root.getNodes();

// Iterate with hasNext(): getSize() may return -1 when the size is unknown
while (nodeIterator.hasNext()) {
   Node node = nodeIterator.nextNode();

   VersionHistory versionHistory = vm.getVersionHistory(node.getPath());
   VersionIterator versionIterator = versionHistory.getAllVersions();
   while (versionIterator.hasNext()) {
      Version version = versionIterator.nextVersion();

      // shouldDelete(...) is my own check for unused versions
      if (shouldDelete(node, version)) {
         versionHistory.removeVersion(version.getName());
      }
   }
}

The problem is that the removeVersion API is very slow.

My first question is whether there is any faster way to do this, considering that only one thread works on the storage while I perform the cleaning.

I've explored the Javadoc and found what looks like an API for batch operations, which would fit my case. For example:

import org.apache.jackrabbit.spi.Batch;
import org.apache.jackrabbit.spi.ItemId;
import org.apache.jackrabbit.spi.RepositoryService;
import org.apache.jackrabbit.spi.SessionInfo;

VersionManager vm = session.getWorkspace().getVersionManager();

// getRepositoryService, getSessionInfo and getItemId are placeholders:
// I don't know how to implement them (see the questions below)
RepositoryService rs = getRepositoryService(session);
SessionInfo si = getSessionInfo(session);
ItemId mainId = null;
Batch batch = rs.createBatch(si, mainId);

Node root = session.getRootNode();
NodeIterator nodeIterator = root.getNodes();

// Iterate with hasNext(): getSize() may return -1 when the size is unknown
while (nodeIterator.hasNext()) {
   Node node = nodeIterator.nextNode();

   VersionHistory versionHistory = vm.getVersionHistory(node.getPath());
   VersionIterator versionIterator = versionHistory.getAllVersions();
   while (versionIterator.hasNext()) {
      Version version = versionIterator.nextVersion();

      if (shouldDelete(node, version)) {
         ItemId id = getItemId(node, version);
         batch.remove(id);

         //versionHistory.removeVersion(version.getName());
      }
   }
}

// TODO: how to execute batch?

I have some questions about this batch API:

  • How can I get a RepositoryService from my Session?
  • How can I get a SessionInfo from my Session?
  • What is the meaning of the ItemId passed when a Batch object is created? What kind of value should I pass?
  • How can I get an ItemId from a node and its version?
  • Once I've built my Batch object with all its remove calls, how can I execute it against my session?

1 Answer

AFAIU, Batch and SessionInfo are interfaces in the Jackrabbit SPI, a layer below the JCR API that is mainly used for remoting (for example over WebDAV).

I don't think it'll help you here.
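
A side note on iterating with the plain JCR API: the javax.jcr.RangeIterator contract allows getSize() to return -1 when the size is unknown, so a loop bounded by getSize() can end up visiting nothing. Below is a minimal, self-contained sketch of the difference. LazyIterator and SizeUnknownDemo are hypothetical stand-ins written for this illustration, not Jackrabbit classes, and whether the iterators in your repository actually report -1 is an assumption you would have to check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SizeUnknownDemo {

    // Hypothetical stand-in for a lazily evaluated JCR iterator. Like
    // javax.jcr.RangeIterator, getSize() reports -1 when the size is unknown.
    static class LazyIterator implements Iterator<String> {
        private final Iterator<String> delegate;

        LazyIterator(List<String> items) {
            this.delegate = items.iterator();
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public String next() {
            return delegate.next();
        }

        // Assumption for this sketch: the size is never known up front.
        public long getSize() {
            return -1;
        }
    }

    // Loop bounded by getSize(): visits nothing when getSize() returns -1.
    static List<String> visitBySize(LazyIterator it) {
        List<String> visited = new ArrayList<>();
        for (int i = 0; i < it.getSize(); i++) {
            visited.add(it.next());
        }
        return visited;
    }

    // Loop driven by hasNext(): visits every element regardless of getSize().
    static List<String> visitByHasNext(LazyIterator it) {
        List<String> visited = new ArrayList<>();
        while (it.hasNext()) {
            visited.add(it.next());
        }
        return visited;
    }

    public static void main(String[] args) {
        List<String> versions = Arrays.asList("1.0", "1.1", "1.2");
        System.out.println(visitBySize(new LazyIterator(versions)));    // prints []
        System.out.println(visitByHasNext(new LazyIterator(versions))); // prints [1.0, 1.1, 1.2]
    }
}
```

This does not make removeVersion any faster. It only guards against a cleanup run that silently skips nodes or versions.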