2
votes

I am working on a MarkLogic tiered storage POC with HDFS as the storage layer for one of the tiers. I have been trying to create a forest whose data directory is a Hadoop file system directory.

I have one Hadoop cluster and one MarkLogic cluster. I downloaded the configuration files from Hadoop and copied them to the /usr/Hadoop directory. I also downloaded the required jar files as described in the documentation here: https://docs.marklogic.com/guide/performance/disk-storage#id_27091

I placed these in the /usr/Hadoop directory as well, with the proper lib structure. I am using MarkLogic 7.0-4.3 and Cloudera Hadoop Distribution 5.3.1 for HDFS.

I get the error below when I try to create the forest.

2015-03-12 19:17:20.087 Error: Automount Foresthadoop: SVC-HDFSNOT:
HDFS not available for 'hdfs://{namdenode-hostname}:8020/tmp': unknown error

I tried changing the log level to finest in the group configuration, and I also added trace events for the forest, but I cannot get any additional detail that points to the cause of the error.

Any help in this regard would be appreciated. Please let me know if there are any other ways to connect to HDFS as a forest directory.

1
What's the full version number of MarkLogic? Are those curly braces in hdfs://{namdenode-hostname}:8020/tmp literal, or did you sanitize the log message? - mblakele
I am using MarkLogic 7.0-4.3 and Cloudera Hadoop Distribution 5.3.1. I sanitized the log message by replacing the actual hostname with a placeholder in curly braces. - Sudheer Y

1 Answer

1
votes

The unknown error was caused by the Java installation I had. JAVA_HOME was pointing to the IBM version of Java, and while trying to connect to HDFS, MarkLogic kept logging errors about missing .io files in the IBM Java installation directory. We found those missing files and placed them in the appropriate directory, which finally resulted in the unknown error.

After we installed Oracle Java 7 and pointed JAVA_HOME to that location, MarkLogic worked with CDH 4.3.1, which is the Hadoop version certified by MarkLogic.
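Since the wrong JVM was the root cause here, a quick check of what JAVA_HOME actually points to may save time. The following is only a rough sketch: `java_version_text` and `guess_vendor` are hypothetical helper names, and the substrings matched in the version banner ("IBM", "HotSpot", "OpenJDK") are assumptions about typical `java -version` output, not an official vendor test.

```python
import os
import subprocess


def java_version_text(java_home):
    """Run `java -version` from java_home and return the banner it prints."""
    java = os.path.join(java_home, "bin", "java")
    result = subprocess.run(
        [java, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # The JVM normally writes the version banner to stderr.
    return result.stderr + result.stdout


def guess_vendor(version_text):
    """Rough vendor guess; the matched substrings are assumptions."""
    text = version_text.lower()
    if "ibm" in text:
        return "ibm"
    if "hotspot" in text:
        return "oracle-hotspot"
    if "openjdk" in text:
        return "openjdk"
    return "unknown"


if __name__ == "__main__":
    java_home = os.environ.get("JAVA_HOME")
    if not java_home:
        print("JAVA_HOME is not set")
    else:
        try:
            print(java_home, "->", guess_vendor(java_version_text(java_home)))
        except OSError as exc:
            print("could not run java under", java_home, ":", exc)
```

Note that the result only reflects the environment the script runs in. If MarkLogic is started with a different environment, JAVA_HOME there may differ from the one in an interactive shell.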

When I tried CDH 5.3.1 with the latest jars, MarkLogic kept giving the error below, even though the jar containing this class was present in the HDFS client.

2015-03-19 15:53:44.516 Alert: XDMP-FORESTERR: Error in initialization of forest Foresthadoop2: SVC-NOJCLASS: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.util.VersionInfo
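To double-check which jars actually carry the class named in that message, a small scan of the library directory can help. This is a minimal sketch, not part of MarkLogic or Hadoop tooling: `find_class_in_jars` is a hypothetical helper, and the default path /usr/Hadoop is simply the directory used in the question. In general, the JVM reports "Could not initialize class" when a class was located but its static initialization failed earlier, so finding the jar does not by itself rule out a version mismatch, and more than one hit may indicate mixed Hadoop versions on the classpath.

```python
import sys
import zipfile
from pathlib import Path


def find_class_in_jars(lib_dir, class_name):
    """Return the paths of jars under lib_dir that contain class_name."""
    # A class a.b.C is stored in a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar in sorted(Path(lib_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    matches.append(str(jar))
        except zipfile.BadZipFile:
            # Skip files with a .jar suffix that are not valid archives.
            continue
    return matches


if __name__ == "__main__":
    # Assumed default: the directory used in the question.
    lib_dir = sys.argv[1] if len(sys.argv) > 1 else "/usr/Hadoop"
    for path in find_class_in_jars(lib_dir, "org.apache.hadoop.util.VersionInfo"):
        print(path)
```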

When I approached the MarkLogic support team, they confirmed that CDH 5.3.1 is not yet certified by MarkLogic and that it is on their product roadmap.

The conclusion for now is CDH 5.3.1 will not work with MarkLogic.