I am about to get started with clustering a jackrabbit repository run by hippocms in the community version. I got it up and running but some parts of configuration I don't understand.
I understood the concept of clustering jackrabbit this way: You have e.g. two instances with two local repositories which get synched by a rocket-scienced journal via a shared database, but every node is using it's local repository.
After reading the following pages I ended up with the following configuration.
Links:
- http://wiki.apache.org/jackrabbit/Clustering
- http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit-core/src/test/resources/org/apache/jackrabbit/core/cluster/repository-h2.xml?view=co&content-type=text/plain
- http://greymeister.net/blog/2011/11/28/jackrabbit-clustering-primer.html
Info: sharedRepositoryDS points to a shared database repositoryDS points to the local database (on each node)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Repository PUBLIC
"-//The Apache Software Foundation//DTD Jackrabbit 1.5//EN"
"http://jackrabbit.apache.org/dtd/repository-1.5.dtd">
<Repository>
<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
<param name="url" value="java:comp/env/jdbc/sharedRepositoryDS"/>
<param name="driver" value="javax.naming.InitialContext"/>
<param name="schemaObjectPrefix" value="repository_"/>
<param name="schema" value="mysql"/>
</FileSystem>
<Security appName="Jackrabbit">
<SecurityManager
class="org.hippoecm.repository.security.SecurityManager"/>
<AccessManager
class="org.hippoecm.repository.security.HippoAccessManager"/>
<LoginModule
class="org.hippoecm.repository.security.HippoLoginModule"/>
</Security>
<Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default"/>
<Workspace name="${wsp.name}">
<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
<param name="url" value="java:comp/env/jdbc/repositoryDS"/>
<param name="driver" value="javax.naming.InitialContext"/>
<param name="schemaObjectPrefix" value="${wsp.name}_"/>
<param name="schema" value="mysql"/>
</FileSystem>
<PersistenceManager
class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
<param name="driver" value="javax.naming.InitialContext"/>
<param name="url" value="java:comp/env/jdbc/sharedRepositoryDS"/>
<param name="schemaObjectPrefix" value="${wsp.name}_"/>
<param name="externalBLOBs" value="true"/>
<param name="consistencyCheck" value="false"/>
<param name="consistencyFix" value="false"/>
</PersistenceManager>
<SearchIndex class="org.hippoecm.repository.FacetedNavigationEngineImpl">
<param name="indexingConfiguration" value="indexing_configuration.xml"/>
<param name="indexingConfigurationClass"
value="org.hippoecm.repository.query.lucene.ServicingIndexingConfigurationImpl"/>
<param name="path" value="${wsp.home}/index"/>
<param name="useCompoundFile" value="true"/>
<param name="minMergeDocs" value="1000"/>
<param name="volatileIdleTime" value="10"/>
<param name="maxMergeDocs" value="1000000000"/>
<param name="mergeFactor" value="5"/>
<param name="maxFieldLength" value="10000"/>
<param name="bufferSize" value="1000"/>
<param name="cacheSize" value="100000"/>
<param name="enableConsistencyCheck" value="true"/>
<param name="autoRepair" value="true"/>
<param name="analyzer"
value="org.hippoecm.repository.query.lucene.StandardHippoAnalyzer"/>
<param name="queryClass" value="org.apache.jackrabbit.core.query.QueryImpl"/>
<param name="respectDocumentOrder" value="false"/>
<param name="resultFetchSize" value="100"/>
<param name="extractorPoolSize" value="0"/>
<param name="extractorTimeout" value="100"/>
<param name="extractorBackLogSize" value="100"/>
<param name="excerptProviderClass"
value="org.apache.jackrabbit.core.query.lucene.DefaultHTMLExcerpt"/>
<!-- supportHighlighting value is ignored, see REPO-711 -->
<param name="supportHighlighting" value="false"/>
<param name="supportSimilarityOnStrings" value="true"/>
<param name="supportSimilarityOnBinaries" value="false"/>
<param name="slowAlwaysExactSizedQueryResult" value="false"/>
<param name="onWorkspaceInconsistency" value="log"/>
<!-- optional cache parameters for faceted engine. The default size
when not configured is 1000 for both parameters -->
<!-- param name="docIdSetCacheSize" value="1000"/>
<param name="facetValueCountMapCacheSize" value="1000"/-->
</SearchIndex>
<ISMLocking
class="org.apache.jackrabbit.core.state.FineGrainedISMLocking"/>
</Workspace>
<Versioning rootPath="${rep.home}/version">
<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
<param name="url" value="java:comp/env/jdbc/repositoryDS"/>
<param name="driver" value="javax.naming.InitialContext"/>
<param name="schemaObjectPrefix" value="version_"/>
<param name="schema" value="mysql"/>
</FileSystem>
<PersistenceManager
class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
<param name="driver" value="javax.naming.InitialContext"/>
<param name="url" value="java:comp/env/jdbc/sharedRepositoryDS"/>
<param name="schemaObjectPrefix" value="version_"/>
<param name="externalBLOBs" value="true"/>
<param name="consistencyCheck" value="false"/>
<param name="consistencyFix" value="false"/>
</PersistenceManager>
<ISMLocking
class="org.apache.jackrabbit.core.state.FineGrainedISMLocking"/>
</Versioning>
<DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
<param name="url" value="java:comp/env/jdbc/sharedRepositoryDS"/>
<param name="driver" value="javax.naming.InitialContext"/>
<param name="databaseType" value="mysql"/>
<param name="minRecordLength" value="1024"/>
<param name="maxConnections" value="5"/>
<param name="copyWhenReading" value="true"/>
</DataStore>
<Cluster id="node1" syncDelay="2000">
<Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
<param name="revision" value="${rep.home}/revision.log" />
<param name="driver" value="javax.naming.InitialContext"/>
<param name="url" value="java:comp/env/jdbc/sharedRepositoryDS"/>
<param name="databaseType" value="mysql"/>
<param name="schemaObjectPrefix" value="journal_"/>
</Journal>
</Cluster>
</Repository>
Questions:
- Is the configuration correct?
- What is the repository.FileSystem ?
- Whats the difference to the repository.Workspace.FileSystem ?
- The PersistenceManager is responsible for writing the data, but why should it write into the shared database? (I want to get rid of this bottleneck right?)
Database Tables This is what the local node database looks like (for my taste a little too little):
This is what the shared database looks like: