
I am having a headache working with Jackrabbit JCR in Java. It's about writing an XPath expression to search for entities in my repository.

Here is a brief synopsis of the kind of data that is stored. We have three node classes: "Entry" extends another node class named "BaseEntry", which in turn extends one called "BaseNode". The Entry class represents a node in our JCR system and has a set of properties (mapped as attributes in the corresponding class); it also inherits the properties mapped in its superclasses. The "BaseEntry" class aggregates zero or more attachments, which represent raw files (.doc, .xls, .pdf, .txt, etc. :D) associated with an entry.

These are the relevant parts of the class definitions and the properties of interest:

"Entry" Class

@Node(jcrType = "entry",  extend = BaseEntry.class)
public class Entry extends BaseEntry {

  ... // nothing really important here
}

"BaseEntry" Class

@Node(jcrType = "baseEntry", extend = BaseNode.class, isAbstract = true)
public abstract class BaseEntry extends BaseNode {

  @Collection (jcrType = "attachment",
      collectionConverter = NTCollectionConverterImpl.class)
  protected List<Attachment> attachments = new ArrayList<Attachment>();

  ...

}

"BaseNode" Class

@Node(jcrType = "baseNode", isAbstract = true)
public abstract class BaseNode {

  @Field(jcrName = "name", id = true)
  protected String name;

  @Field(jcrName = "creationDate")
  protected Date creationDate;

  ...
}

"Attachment" Class

@Node(jcrType = "attachment", discriminator = true)
public class Attachment extends BaseNode implements Comparable<Attachment> {

  /** The attachment's content resource. It cannot be null. */
  @Bean(jcrType = "skl:resource", autoUpdate = false)
  private Resource content;
  ...
}

"Resource" Class

@Node(jcrType = "skl:resource", discriminator = true)
public class Resource extends BaseNode {

  /** Resource's MIME type. It cannot be null or empty. */
  @Field(jcrName="jcr:mimeType", jcrDefaultValue = "")
  private String mimeType;

  /** Resource's size (bytes). */
  @Field(jcrName="skl:size")
  private long size;

  /** Resource's content data as stream. It cannot be null. */
  @Field(jcrName="jcr:data")
  private InputStream data;

  ...
}

The custom_nodes.xml file has the following definitions for those node types:

<!-- Base node type definition -->
  <nodeType name="docs:baseNode"
            isMixin="false"
            hasOrderableChildNodes="false" >
    <supertypes>
      <supertype>nt:hierarchyNode</supertype>
    </supertypes>
    <propertyDefinition name="docs:name"
                        requiredType="String"
                        autoCreated="false"
                        mandatory="true"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false" />
    <propertyDefinition name="docs:searchPath"
                        requiredType="String"
                        autoCreated="false"
                        mandatory="false"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false" />
    <propertyDefinition name="docs:creationDate"
                        requiredType="Date"
                        autoCreated="false"
                        mandatory="true"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false" />
    <propertyDefinition name="docs:lastModified"
                        requiredType="Date"
                        autoCreated="false"
                        mandatory="true"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false" />
    <childNodeDefinition name="*"
                         defaultPrimaryType="docs:baseNode"
                         autoCreated="false"
                         mandatory="false"
                         onParentVersion="COPY"
                         protected="false"
                         sameNameSiblings="false">
      <requiredPrimaryTypes>
        <requiredPrimaryType>docs:baseNode</requiredPrimaryType>
      </requiredPrimaryTypes>
    </childNodeDefinition>
  </nodeType>



  <!-- Resource node type definition -->
  <nodeType name="skl:resource"
            isMixin="false"
            hasOrderableChildNodes="false" >
    <supertypes>
      <supertype>docs:baseNode</supertype>
      <supertype>nt:resource</supertype>
    </supertypes>
    <propertyDefinition name="skl:size"
                        requiredType="Long"
                        autoCreated="false"
                        mandatory="true"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false" />
    <propertyDefinition name="skl:externalUri"
                        requiredType="String"
                        autoCreated="false"
                        mandatory="false"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false" />
  </nodeType>

  <!-- Attachment node type definition -->
  <nodeType name="skl:attachment"
            isMixin="false"
            hasOrderableChildNodes="false" >
    <supertypes>
      <supertype>docs:baseNode</supertype>
    </supertypes>
    <propertyDefinition name="skl:requestId"
                        requiredType="String"
                        autoCreated="false"
                        mandatory="false"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false" />
    <propertyDefinition name="skl:contentExternalUri"
                        requiredType="String"
                        autoCreated="false"
                        mandatory="false"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false" />
    <propertyDefinition name="skl:multiPagePreviewUrl"
                        requiredType="String"
                        autoCreated="false"
                        mandatory="false"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false" />
  </nodeType>

  <!-- Base Entry node type definition -->
  <nodeType name="skl:baseEntry"
            isMixin="false"
            hasOrderableChildNodes="false" >
    <supertypes>
      <supertype>docs:baseNode</supertype>
    </supertypes>
    <propertyDefinition name="skl:title"
                        requiredType="String"
                        autoCreated="false"
                        mandatory="true"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false" />
    <propertyDefinition name="skl:description"
                        requiredType="String"
                        autoCreated="false"
                        mandatory="false"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false" />
    <propertyDefinition name="skl:author"
                        requiredType="String"
                        autoCreated="false"
                        mandatory="false"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false" />
    <propertyDefinition name="skl:creator"
                        requiredType="String"
                        autoCreated="false"
                        mandatory="true"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false" />
    <propertyDefinition name="skl:creatorUnique"
                        requiredType="String"
                        autoCreated="false"
                        mandatory="true"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false" />
    <propertyDefinition name="skl:creatorMail"
                        requiredType="String"
                        autoCreated="false"
                        mandatory="true"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false" />
    <propertyDefinition name="skl:office"
                        requiredType="String"
                        autoCreated="false"
                        mandatory="true"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false" />
    <propertyDefinition name="skl:tags"
                        requiredType="String"
                        autoCreated="false"
                        mandatory="false"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="true" />
  </nodeType>

  <!-- SKL Entry node type definition -->
  <nodeType name="skl:entry"
            isMixin="false"
            hasOrderableChildNodes="false" >
    <supertypes>
      <supertype>docs:baseNode</supertype>
      <supertype>skl:baseEntry</supertype>
    </supertypes>
    <propertyDefinition name="skl:languageName"
                        requiredType="String"
                        autoCreated="false"
                        mandatory="true"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false" />
    <propertyDefinition name="skl:rating"
                        requiredType="Long"
                        autoCreated="false"
                        mandatory="true"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false" />
    <propertyDefinition name="skl:urls"
                        requiredType="String"
                        autoCreated="false"
                        mandatory="false"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="true" />
    <propertyDefinition name="skl:visitors"
                        requiredType="String"
                        autoCreated="false"
                        mandatory="false"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="true" />
    <propertyDefinition name="skl:datePublished"
                        requiredType="String"
                        autoCreated="false"
                        mandatory="false"
                        onParentVersion="COPY"
                        protected="false"
                        multiple="false" />
  </nodeType>

So I am looking to write an XPath statement that searches for entries whose attachments contain some text. Basically, the idea is to find entries that have files containing a specific text or keywords.

So far, I have tried something like this:

String xPathQuery = "<BASE PATH>//element(*, skl:entry) [jcr:contains(*//content,'*<keyword>*')]";
String xPathQuery = "<BASE PATH>//element(*, skl:entry) [jcr:contains(*//@jcr:data,'*<keyword>*')]";

but, as you can guess, neither of these works.

I hope some charitable soul can help me with this quest, which is not going so well :S. Thanks in advance to everyone who reads this!

Greetings!!

Víctor

Hi, when displaying node type information it's usually better to use the compact node type definition (.cnd) format, because it's easier to read. - Jeroen

1 Answer


The Jackrabbit FAQ explains your problem:

Why doesn't //*[jcr:contains(@jcr:data, 'foo')] return matches for binary content?

Extracted text from binary content is only indexed on the parent node of the @jcr:data property. Use jcr:contains() on the nt:resource node.

So in your case I would use something like:

String xPathQuery = "<BASE PATH>//element(*, skl:resource)[jcr:contains(.,'*<keyword>*')]";

or

String xPathQuery = "<BASE PATH>//element(*, skl:entry)//element(*, skl:resource)[jcr:contains(.,'*<keyword>*')]";
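Both queries return the matching skl:resource nodes, not the skl:entry nodes the question is really after, and several attachments of the same entry may match. A minimal sketch of mapping hits back to distinct entries is to walk up the ancestors of each hit until a node of type skl:entry is found; this avoids assuming exactly how many levels (attachment, then resource) the OCM collection converter puts between an entry and its resource. To keep the sketch compilable without the JCR jar, `RepoNode` is a hypothetical stand-in for `javax.jcr.Node`, and `EntryResolver`, `findEntryAncestor` and `entryPaths` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class EntryResolver {

    /**
     * Hypothetical stand-in for javax.jcr.Node so this sketch compiles without
     * the JCR API jar. The real methods throw RepositoryException, and the real
     * getParent() throws ItemNotFoundException at the root instead of returning null.
     */
    interface RepoNode {
        String getPath();
        boolean isNodeType(String nodeTypeName); // true for the type and its supertypes
        RepoNode getParent();                    // null at the root in this stand-in
    }

    /** Walks up from a node (inclusive) to the nearest skl:entry ancestor, or returns null. */
    static RepoNode findEntryAncestor(RepoNode node) {
        for (RepoNode current = node; current != null; current = current.getParent()) {
            if (current.isNodeType("skl:entry")) {
                return current;
            }
        }
        return null; // the resource is not stored under any entry
    }

    /** Maps resource hits to distinct entry paths, keeping the order of the first hit. */
    static List<String> entryPaths(Iterable<? extends RepoNode> resourceHits) {
        Set<String> paths = new LinkedHashSet<>();
        for (RepoNode hit : resourceHits) {
            RepoNode entry = findEntryAncestor(hit);
            if (entry != null) {
                paths.add(entry.getPath()); // duplicates from several attachments collapse here
            }
        }
        return new ArrayList<>(paths);
    }
}
```

With the real JCR API, the hits would come from `QueryResult.getNodes()` after running the XPath statement through the workspace `QueryManager`, and the same ancestor walk would use `javax.jcr.Node` directly.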

I would also strongly discourage using a * wildcard at the beginning of your jcr:contains() term. Lucene is an inverted index, so a leading wildcard forces it to scan every indexed term, which can be extremely inefficient.
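A minimal sketch of building the resource query as a Java string while following that advice: the keyword is used without a leading `*`, and any single quote in it is doubled so it cannot terminate the XPath string literal. `ResourceQueries` and `buildResourceQuery` are hypothetical names, and the base path is assumed to be an absolute JCR XPath path such as `/jcr:root/docs`. The sketch does not escape characters that carry meaning in the full-text syntax itself (for example a leading `-` or a `"`), so those would still need handling if keywords come from user input.

```java
/** Hypothetical helper that builds a full-text query on skl:resource nodes. */
final class ResourceQueries {

    private ResourceQueries() {
        // static helpers only
    }

    /**
     * Returns an XPath query selecting skl:resource nodes under basePath whose
     * extracted text contains the keyword. basePath is assumed to look like
     * "/jcr:root/docs" (an example path, not taken from the question).
     */
    static String buildResourceQuery(String basePath, String keyword) {
        // Inside an XPath string literal, a single quote is escaped by doubling it.
        String escaped = keyword.replace("'", "''");
        // No leading '*' wildcard: that would make Lucene scan every indexed term.
        return basePath + "//element(*, skl:resource)[jcr:contains(., '" + escaped + "')]";
    }
}
```

For example, `buildResourceQuery("/jcr:root/docs", "invoice")` produces `/jcr:root/docs//element(*, skl:resource)[jcr:contains(., 'invoice')]`.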