I am about to start clustering a Jackrabbit repository run by Hippo CMS (community edition). I got it up and running, but there are some parts of the configuration I don't understand.

I understood the concept of clustering Jackrabbit this way: you have, for example, two instances, each with its own local repository, and these are kept in sync by a rocket-science journal in a shared database, but every node still uses its local repository.

After reading the following pages I ended up with the following configuration.

Links:

Info: sharedRepositoryDS points to a shared database; repositoryDS points to the local database (on each node).
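
For reference, the two JNDI names could be bound roughly as in the sketch below. It assumes a Tomcat container, which the question does not state; the host names, database names, and credentials are placeholders, and only the two resource names come from the configuration in this post.

```xml
<!-- Hypothetical Tomcat context.xml fragment (assumed container);
     hosts, database names and credentials are placeholders. -->
<Context>
    <!-- Shared database: the same URL on every cluster node -->
    <Resource name="jdbc/sharedRepositoryDS"
              auth="Container"
              type="javax.sql.DataSource"
              driverClassName="com.mysql.jdbc.Driver"
              url="jdbc:mysql://shared-db.example.com:3306/shared_repository"
              username="repo_user"
              password="change-me"/>

    <!-- Local database: each node points at its own instance -->
    <Resource name="jdbc/repositoryDS"
              auth="Container"
              type="javax.sql.DataSource"
              driverClassName="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/local_repository"
              username="repo_user"
              password="change-me"/>
</Context>
```

The repository configuration that uses these two names follows.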

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Repository PUBLIC
        "-//The Apache Software Foundation//DTD Jackrabbit 1.5//EN"
        "http://jackrabbit.apache.org/dtd/repository-1.5.dtd">

<Repository>

    <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
        <param name="url" value="java:comp/env/jdbc/sharedRepositoryDS"/>
        <param name="driver" value="javax.naming.InitialContext"/>
        <param name="schemaObjectPrefix" value="repository_"/>
        <param name="schema" value="mysql"/>
    </FileSystem>

    <Security appName="Jackrabbit">
        <SecurityManager
                class="org.hippoecm.repository.security.SecurityManager"/>
        <AccessManager
                class="org.hippoecm.repository.security.HippoAccessManager"/>
        <LoginModule
                class="org.hippoecm.repository.security.HippoLoginModule"/>
    </Security>

    <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default"/>

    <Workspace name="${wsp.name}">
        <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
            <param name="url" value="java:comp/env/jdbc/repositoryDS"/>
            <param name="driver" value="javax.naming.InitialContext"/>
            <param name="schemaObjectPrefix" value="${wsp.name}_"/>
            <param name="schema" value="mysql"/>
        </FileSystem>

        <PersistenceManager
                class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
            <param name="driver" value="javax.naming.InitialContext"/>
            <param name="url" value="java:comp/env/jdbc/sharedRepositoryDS"/>
            <param name="schemaObjectPrefix" value="${wsp.name}_"/>
            <param name="externalBLOBs" value="true"/>
            <param name="consistencyCheck" value="false"/>
            <param name="consistencyFix" value="false"/>
        </PersistenceManager>

        <SearchIndex class="org.hippoecm.repository.FacetedNavigationEngineImpl">
            <param name="indexingConfiguration" value="indexing_configuration.xml"/>
            <param name="indexingConfigurationClass"
                   value="org.hippoecm.repository.query.lucene.ServicingIndexingConfigurationImpl"/>
            <param name="path" value="${wsp.home}/index"/>
            <param name="useCompoundFile" value="true"/>
            <param name="minMergeDocs" value="1000"/>
            <param name="volatileIdleTime" value="10"/>
            <param name="maxMergeDocs" value="1000000000"/>
            <param name="mergeFactor" value="5"/>
            <param name="maxFieldLength" value="10000"/>
            <param name="bufferSize" value="1000"/>
            <param name="cacheSize" value="100000"/>
            <param name="enableConsistencyCheck" value="true"/>
            <param name="autoRepair" value="true"/>
            <param name="analyzer"
                   value="org.hippoecm.repository.query.lucene.StandardHippoAnalyzer"/>
            <param name="queryClass" value="org.apache.jackrabbit.core.query.QueryImpl"/>
            <param name="respectDocumentOrder" value="false"/>
            <param name="resultFetchSize" value="100"/>
            <param name="extractorPoolSize" value="0"/>
            <param name="extractorTimeout" value="100"/>
            <param name="extractorBackLogSize" value="100"/>
            <param name="excerptProviderClass"
                   value="org.apache.jackrabbit.core.query.lucene.DefaultHTMLExcerpt"/>
            <!-- supportHighlighting value is ignored, see REPO-711 -->
            <param name="supportHighlighting" value="false"/>

            <param name="supportSimilarityOnStrings" value="true"/>
            <param name="supportSimilarityOnBinaries" value="false"/>
            <param name="slowAlwaysExactSizedQueryResult" value="false"/>

            <param name="onWorkspaceInconsistency" value="log"/>
            <!-- optional cache parameters for faceted engine. The default size
            when not configured is 1000 for both parameters -->
            <!-- param name="docIdSetCacheSize" value="1000"/>
            <param name="facetValueCountMapCacheSize" value="1000"/-->
        </SearchIndex>

        <ISMLocking
                class="org.apache.jackrabbit.core.state.FineGrainedISMLocking"/>
    </Workspace>

    <Versioning rootPath="${rep.home}/version">
        <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
            <param name="url" value="java:comp/env/jdbc/repositoryDS"/>
            <param name="driver" value="javax.naming.InitialContext"/>
            <param name="schemaObjectPrefix" value="version_"/>
            <param name="schema" value="mysql"/>
        </FileSystem>

        <PersistenceManager
                class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
            <param name="driver" value="javax.naming.InitialContext"/>
            <param name="url" value="java:comp/env/jdbc/sharedRepositoryDS"/>
            <param name="schemaObjectPrefix" value="version_"/>
            <param name="externalBLOBs" value="true"/>
            <param name="consistencyCheck" value="false"/>
            <param name="consistencyFix" value="false"/>
        </PersistenceManager>
        <ISMLocking
                class="org.apache.jackrabbit.core.state.FineGrainedISMLocking"/>
    </Versioning>

    <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
        <param name="url" value="java:comp/env/jdbc/sharedRepositoryDS"/>
        <param name="driver" value="javax.naming.InitialContext"/>
        <param name="databaseType" value="mysql"/>
        <param name="minRecordLength" value="1024"/>
        <param name="maxConnections" value="5"/>
        <param name="copyWhenReading" value="true"/>
    </DataStore>

    <Cluster id="node1" syncDelay="2000">
        <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
            <param name="revision" value="${rep.home}/revision.log" />
            <param name="driver" value="javax.naming.InitialContext"/>
            <param name="url" value="java:comp/env/jdbc/sharedRepositoryDS"/>
            <param name="databaseType" value="mysql"/>
            <param name="schemaObjectPrefix" value="journal_"/>
        </Journal>
    </Cluster>
</Repository> 
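
As a sketch of what would differ on another node: Jackrabbit expects every cluster node to have its own unique cluster id, so a second instance would presumably reuse the file above with only the `Cluster` element changed. The id `node2` is an assumed name, not something taken from an existing setup.

```xml
<!-- Sketch for a hypothetical second node; "node2" is an assumed id.
     Everything else is copied from the node1 configuration above. -->
<Cluster id="node2" syncDelay="2000">
    <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
        <!-- Local file holding this node's own journal revision -->
        <param name="revision" value="${rep.home}/revision.log"/>
        <param name="driver" value="javax.naming.InitialContext"/>
        <!-- The journal itself lives in the shared database -->
        <param name="url" value="java:comp/env/jdbc/sharedRepositoryDS"/>
        <param name="databaseType" value="mysql"/>
        <param name="schemaObjectPrefix" value="journal_"/>
    </Journal>
</Cluster>
```

The journal URL stays on the shared data source on both nodes, since the journal is the piece the nodes have in common.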

Questions:

  1. Is the configuration correct?
  2. What is the repository.FileSystem?
  3. What's the difference from the repository.Workspace.FileSystem?
  4. The PersistenceManager is responsible for writing the data, but why should it write into the shared database? (I want to get rid of this bottleneck, right?)

Database tables

This is what the local node database looks like (a little too sparse for my taste): [screenshot not available]

This is what the shared database looks like: [screenshot not available]

After talking to some people who use the enterprise licence, it turns out the configuration I posted here is only a little different from the one Hippo uses in enterprise projects. So I guess using my configuration puts you on a valid path. - cloudnaut

1 Answer


Clustering support for Hippo is part of the enterprise edition. For detailed answers to your specific case, it's best to reach us at [email protected].