2
votes

I inserted 200,000 XML documents (approximately 1 GB total) into my database through an MLCP command. Now I want to run the search query below against that database (which uses the default index setup from the Admin API) to get all documents.

xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
 <options xmlns="http://marklogic.com/appservices/search">
  <search-option>unfiltered</search-option>
  <term>
    <term-option>case-insensitive</term-option>
  </term>
  <constraint name="Title">
      <range collation="http://marklogic.com/collation/" facet="true">
       <element ns="http://learning.com" name="title" />
      </range>
  </constraint>
  <constraint name="Keywords">
      <range collation="http://marklogic.com/collation/" facet="true">
       <element ns="http://learning.com" name="subjectKeyword" />
      </range>
  </constraint>
  <constraint name="Subjects">
      <range collation="http://marklogic.com/collation/" facet="true">
       <element ns="http://learning.com" name="subjectHeading" />
      </range>
  </constraint>
  <return-results>true</return-results>
  <return-query>true</return-query>
</options>
let $result := search:search("**", $options, 1, 20)
return $result

Range indexes:

    <range-element-index>
      <scalar-type>string</scalar-type>
      <namespace-uri>http://learning.com</namespace-uri>
      <localname>title</localname>
      <collation>http://marklogic.com/collation/</collation>
      <range-value-positions>false</range-value-positions>
      <invalid-values>ignore</invalid-values>
    </range-element-index>
    <range-element-index>
      <scalar-type>string</scalar-type>
      <namespace-uri>http://learning.com</namespace-uri>
      <localname>subjectKeyword</localname>
      <collation>http://marklogic.com/collation/</collation>
      <range-value-positions>false</range-value-positions>
      <invalid-values>ignore</invalid-values>
    </range-element-index>
    <range-element-index>
      <scalar-type>string</scalar-type>
      <namespace-uri>http://learning.com</namespace-uri>
      <localname>subjectHeading</localname>
      <collation>http://marklogic.com/collation/</collation>
      <range-value-positions>false</range-value-positions>
      <invalid-values>ignore</invalid-values>
    </range-element-index>

In each XML document, the subjectKeyword and title values look like this:

<lmm:subjectKeyword>anatomy, biology, illustration, cross, section, digestive, human, circulatory, body, small, neck, head, ear, torso, veins, teaching, model, deep, descending, heart, brain, muscles, lungs, diaphragm, c</lmm:subjectKeyword>
<lmm:title>CORTY_EQ07-014.eps</lmm:title>

But it takes a lot of time, and Query Console even reports "Too many elements to render" or "Parser Error: Cannot parse result. File Size too large".


3 Answers

4
votes

I'd also add that if you wanted to fetch all documents (which I wouldn't recommend on a non-trivial database), doing it directly rather than as a wildcarded search is going to be more efficient: fn:doc() (or, as Geert suggests, paginated: fn:doc()[1 to 20]).

3
votes

First of all, don't try to get all documents at once. It means MarkLogic has to go to disk for every document, then process and serialize each one, and, last but not least, the client side has to receive and display them all too. The latter is probably the bottleneck here. This is typically why user applications show search results 10 or 20 at a time. In other words: use pagination.
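As a sketch of that pagination, the third and fourth arguments of search:search are the 1-based start position and the page length, so you can walk through results one page at a time. The $page and $page-length names below are illustrative, and the options are trimmed to just the unfiltered setting:

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <search-option>unfiltered</search-option>
  </options>
let $page-length := 20
let $page := 1  (: increment this to fetch subsequent pages :)
let $start := ($page - 1) * $page-length + 1
(: an empty query string matches all documents :)
return search:search("", $options, $start, $page-length)
```

Each call returns one page of 20 results plus the total estimate, so the client never has to receive or render the full 200k-document result set.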

I also recommend running unfiltered for better performance.

HTH!

2
votes

Pagination is definitely key here, and I'm curious about your facets. From your example, I'm imagining "Title" is almost always unique across your 200k documents. And the lmm:subjectKeyword element seems like it needs a little post-processing to make it more useful as a facet: it's a string of comma-delimited values, which means subjectKeyword will almost always be unique too (I recommend putting each of these values into a separate element; that would be much more useful as a facet). And I'm guessing subjectHeading is mostly unique too.

Facets are generally useful when you have a bounded set of values - e.g. for laptops, bounded sets include manufacturer, monitor size, and buckets for price range. Once you get into hundreds of values, the utility of a facet decreases for a user - how many users really want to sort through hundreds or thousands of values to find what they want? And in your case, we're probably talking about tens of thousands of unique values, if not 200k unique values (particularly for "Title"). And - when you have that many unique values, facet resolution time is going to take longer.

So before exploring the facet resolution time - what problem are you trying to solve with these 3 facets?

Without knowing anything more, I'd post-process that subjectKeyword element into many elements, each with a single keyword in it, and then put a facet on that element. Ideally, you have dozens of keywords, maybe hundreds, and resolving that facet should be very fast.
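One way to sketch that post-processing in XQuery (the lmm prefix binding to http://learning.com is assumed from your sample data and index config; element names are illustrative):

```xquery
xquery version "1.0-ml";
declare namespace lmm = "http://learning.com";

(: replace each comma-delimited subjectKeyword element with a wrapper
   holding one element per individual keyword :)
for $kw in fn:doc()//lmm:subjectKeyword
return xdmp:node-replace($kw,
  <lmm:subjectKeywords>{
    for $k in fn:tokenize(fn:string($kw), ",")
    let $clean := fn:normalize-space($k)
    where $clean ne ""
    return <lmm:subjectKeyword>{$clean}</lmm:subjectKeyword>
  }</lmm:subjectKeywords>)
```

Note this is only a sketch: updating 200k documents in a single transaction won't scale, so in practice you'd batch the updates (for example with CoRB or spawned tasks), then point the range index and facet at the new single-keyword element.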