
I have a three-node (server) Apache Ignite cluster with one client. I am using disk-based persistent storage. I created a cache with 10M records. At some point the cluster crashed, so I wanted to restart it. This is what I am running into:

  1. When I restart the server nodes, they throw the exception copied below.
  2. The client blocks and does nothing. I do not see any exception; it just appears to hang after printing the messages shown below.
  3. I have included the default-config.xml here.

Any help in resolving this issue will be greatly appreciated. Thank you.

Server side exception

SEVERE: Failed to initialize cache. Will try to rollback cache start routine. [cacheName=geo10]
class org.apache.ignite.IgniteCheckedException: Failed to verify store file (invalid page size) [expectedPageSize=4096, filePageSize=2048]
        at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.checkFile(FilePageStore.java:185)
        at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.init(FilePageStore.java:392)
        at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.read(FilePageStore.java:291)
        at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:288)
        at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:273)
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:569)
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:487)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.getOrAllocateCacheMetas(GridCacheOffheapManager.java:515)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.initDataStructures(GridCacheOffheapManager.java:86)
        at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.start(IgniteCacheOffheapManagerImpl.java:139)
        at org.apache.ignite.internal.processors.cache.CacheGroupContext.start(CacheGroupContext.java:868)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.startCacheGroup(GridCacheProcessor.java:1935)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheStart(GridCacheProcessor.java:1860)
        at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onCacheChangeRequest(CacheAffinitySharedManager.java:748)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest(GridDhtPartitionsExchangeFuture.java:773)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:574)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1901)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
        at java.lang.Thread.run(Thread.java:745)

Sep 10, 2017 2:42:46 PM org.apache.ignite.logger.java.JavaLogger error
SEVERE: Failed to perform final activation steps [nodeId=2077e165-e8a2-4989-934c-c24c5c0bea80, client=false, topVer=AffinityTopologyVersion [topVer=1, minorTopVer=1]]
java.lang.NullPointerException
        at org.apache.ignite.internal.processors.service.GridServiceProcessor.onKernalStart0(GridServiceProcessor.java:240)
        at org.apache.ignite.internal.processors.service.GridServiceProcessor.onActivate(GridServiceProcessor.java:370)
        at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor$5.run(GridClusterStateProcessor.java:576)
        at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6664)
        at org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:817)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

class org.apache.ignite.IgniteException: null
        at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:957)
        at org.apache.ignite.internal.IgniteKernal.active(IgniteKernal.java:3427)
        at com.accure.ignite.IgniteStarter.main(IgniteStarter.java:24)
Caused by: class org.apache.ignite.IgniteCheckedException: null
        at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor$GridChangeGlobalStateFuture.onAllReceived(GridClusterStateProcessor.java:816)
        at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor$GridChangeGlobalStateFuture.onResponse(GridClusterStateProcessor.java:809)
        at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.processChangeGlobalStateResponse(GridClusterStateProcessor.java:673)
        at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.sendChangeGlobalStateResponse(GridClusterStateProcessor.java:639)
        at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.access$2200(GridClusterStateProcessor.java:72)
        at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor$5.run(GridClusterStateProcessor.java:597)
        at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6664)
        at org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:817)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
        Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to perform final activation steps
                at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor$5.run(GridClusterStateProcessor.java:589)
                ... 6 more
        Caused by: java.lang.NullPointerException
                at org.apache.ignite.internal.processors.service.GridServiceProcessor.onKernalStart0(GridServiceProcessor.java:240)
                at org.apache.ignite.internal.processors.service.GridServiceProcessor.onActivate(GridServiceProcessor.java:370)
                at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor$5.run(GridClusterStateProcessor.java:576)
                ... 6 more
[14:43:18] Topology snapshot [ver=2, servers=1, clients=1, CPUs=8, heap=18.0GB]
Sep 10, 2017 2:43:18 PM org.apache.ignite.logger.java.JavaLogger error
SEVERE: Error when executing service: null
java.lang.NullPointerException
        at org.apache.ignite.internal.processors.service.GridServiceProcessor.serviceEntries(GridServiceProcessor.java:1289)
        at org.apache.ignite.internal.processors.service.GridServiceProcessor.access$2000(GridServiceProcessor.java:119)
        at org.apache.ignite.internal.processors.service.GridServiceProcessor$TopologyListener$1.run0(GridServiceProcessor.java:1578)
        at org.apache.ignite.internal.processors.service.GridServiceProcessor$DepRunnable.run(GridServiceProcessor.java:1806)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Client Side Exception

[14:43:15] Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
[14:43:16] Security status [authentication=off, tls/ssl=off]
[14:43:16] REST protocols do not start on client node. To start the protocols on client node set '-DIGNITE_REST_START_ON_CLIENT=true' system property.

default-config.xml

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd">
    <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
        <!-- Enabling Apache Ignite Persistent Store. -->
        <property name="persistentStoreConfiguration">
            <bean class="org.apache.ignite.configuration.PersistentStoreConfiguration"/>
        </property>

        <property name="binaryConfiguration">
            <bean class="org.apache.ignite.configuration.BinaryConfiguration">
                <property name="compactFooter" value="false"/>
            </bean>
        </property>

        <property name="memoryConfiguration">
            <bean class="org.apache.ignite.configuration.MemoryConfiguration">
                <!-- Setting the page size to 4 KB -->
                <property name="pageSize" value="#{4 * 1024}"/>
            </bean>
        </property>

        <!-- Explicitly configure TCP discovery SPI to provide a list of initial nodes. -->
        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <property name="ipFinder">
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
                        <property name="addresses">
                            <list>
                                <!-- In distributed environment, replace with actual host IP address. -->
                                <value>127.0.0.1:55500..55502</value>
                            </list>
                        </property>
                    </bean>
                </property>
            </bean>
        </property>
    </bean>
</beans>

After I changed default-config.xml to use pageSize=2KB, the server still does not start and shows the following exception. Here is the stack trace:

SEVERE: Failed to reinitialize local partitions (preloading will be stopped): GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=0], nodeId=4a2cb984, evt=NODE_JOINED]
class org.apache.ignite.IgniteCheckedException: WAL history is too short [descs=[org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileDescriptor@1d9, org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileDescriptor@1da, org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileDescriptor@1db, org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileDescriptor@1dc, org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileDescriptor@1dd, org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileDescriptor@1de, org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileDescriptor@1df, org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileDescriptor@1e0, org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileDescriptor@1e1, org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileDescriptor@1e2, org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileDescriptor@1e3, org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileDescriptor@1e4], start=FileWALPointer [idx=0, fileOffset=0, len=0, forceFlush=false]]
Is it Ignite 2.0 or 2.1? - Michael
Michael, this is Ignite 2.1 - Sam
Did you change the page size (<property name="pageSize" value="#{4 * 1024}"/>) after the restart? The exception says: org.apache.ignite.IgniteCheckedException: Failed to verify store file (invalid page size) [expectedPageSize=4096, filePageSize=2048]. It looks like you first ran the server with the default value and then changed it to 4096. - Michael
Can you attach a full listing of the ${IGNITE_HOME}/work/db directory? The "WAL history is too short" exception may be thrown if you accidentally cleared the cp subdirectory or switched the WAL mode. Also, it may be easier to get an answer if you post this question on the Ignite user mailing list. - Alexey
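One way to produce the directory listing requested in the comment above, including empty subdirectories such as cp, is a small stdlib-only Java sketch. The class name WorkDbListing is hypothetical. It assumes the default ${IGNITE_HOME}/work/db location (or that the directory is passed as the first argument) and only reads the directory tree; it does not modify anything.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: recursive, read-only listing of a directory such as ${IGNITE_HOME}/work/db.
final class WorkDbListing {

    // Returns one line per entry below root, sorted by path.
    // Directories end with the path separator, so an empty cp directory is still visible;
    // regular files are followed by a tab and their size in bytes.
    public static List<String> list(Path root) throws IOException {
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(root)) {
            entries = walk.filter(p -> !p.equals(root)).sorted().collect(Collectors.toList());
        }
        List<String> lines = new ArrayList<>();
        for (Path p : entries) {
            String rel = root.relativize(p).toString();
            lines.add(Files.isDirectory(p) ? rel + File.separator : rel + "\t" + Files.size(p));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String home = System.getenv("IGNITE_HOME");
        if (args.length == 0 && home == null) {
            System.err.println("Set IGNITE_HOME or pass the db directory as the first argument");
            return;
        }
        // Assumption: the default work directory layout ${IGNITE_HOME}/work/db.
        Path db = args.length > 0 ? Paths.get(args[0]) : Paths.get(home, "work", "db");
        for (String line : list(db)) {
            System.out.println(line);
        }
    }
}
```

Sorting by path keeps each subdirectory next to its contents, which makes a missing or empty cp directory easy to spot in the output.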

1 Answer


It looks like you first started the node with the default pageSize and later changed it to:

<property name="pageSize" value="#{4 * 1024}"/>

So now Ignite cannot read the storage files: it expects 4 KB pages, while the existing store files were written with 2 KB pages.

Try setting it back to 2 KB.
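The diagnosis above can be illustrated with a simplified model of the comparison behind the first exception: the page size from the configuration must equal the page size the store files were written with. PageSizeCheck is a hypothetical class, not Ignite's FilePageStore.checkFile; the two values come from the exception message (expectedPageSize=4096, which is #{4 * 1024}, and filePageSize=2048).

```java
// Hypothetical model of the page-size check; not Ignite's actual implementation.
final class PageSizeCheck {

    // Throws if the configured page size differs from the one the store file was written with.
    public static void verify(int configuredPageSize, int filePageSize) {
        if (configuredPageSize != filePageSize) {
            throw new IllegalStateException("invalid page size [expectedPageSize="
                + configuredPageSize + ", filePageSize=" + filePageSize
                + "]: set pageSize to " + filePageSize + " to reuse the existing store files");
        }
    }

    public static void main(String[] args) {
        int configured = 4 * 1024; // #{4 * 1024} from default-config.xml
        int onDisk = 2 * 1024;     // filePageSize reported in the exception
        try {
            verify(configured, onDisk);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        // With pageSize set back to 2 KB the sizes match and no exception is thrown.
        verify(onDisk, onDisk);
        System.out.println("page size matches: " + onDisk);
    }
}
```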