
We are performing a series of different SPARQL queries on a database containing ~5 million triples.

Our queries often fail with an XDMP-MEMCANCELED error, though not consistently; most of the time they return a correct result within a few seconds or less. Some queries occasionally seem to hang and drive the server to 100% CPU until the query times out.

We have tried increasing every memory-related setting we could find. The example query below runs fine on other triple stores/engines.

We are running MarkLogic 8.0-11 on an AWS instance with 8 GB of RAM and 16 GB of swap space.

Example of a fairly straightforward query that sometimes causes the error:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX t5_m: <http://url.com/T5/model#>
PREFIX t5_d: <http://url.com/T5/data#>
SELECT DISTINCT
?_app_id
( ?_err as ?_reason )
?_comment
?_severity
WHERE
{
BIND ( 3 as ?_severity)
BIND ( "Generated by HL7 v2 Conformance Profile of IHE PCD-01 message" as ?_comment )

FILTER( ?_app_id = 'APP_ID')
FILTER ( ?_ts  >= '2015-04-21T09:04:07.871' )
FILTER ( ?_ts  <= '2015-04-21T09:07:43.973' )

?ACK t5_m:hasMSH ?MSH .
?MSH t5_m:hasMSH.5 ?MSH_5 .
?MSH_5 t5_m:hasHD.1 ?HD_1 .
?HD_1 t5_m:hD.1Value ?_app_id .

?ACK t5_m:hasMSA ?MSA .
?MSA t5_m:hasMSA.2 ?MSA_2 .
?MSA_2 t5_m:mSA.2Value ?_msg_id .

?PCD_01_Message a t5_m:PCD_01_Message .
?PCD_01_Message t5_m:id ?_msg_id .
?PCD_01_Message t5_m:timeStamp ?_ts .

?ACK t5_m:hasERR ?ERR .
?ERR t5_m:hasERR.7 ?ERR_7 .
?ERR_7 t5_m:eRR.7Value ?_err .
}

Are there relevant configuration settings that we have missed, or is there something wrong with this query?

Yours truly, Alexander


1 Answer


When the total size of the hash join tables of all running SPARQL queries exceeds 50% of host memory, the SPARQL query using the most hash join memory is canceled with the XDMP-MEMCANCELED error. This can indicate several things:

  1. The host is overloaded by the number of simultaneously executing SPARQL queries. You could try adding more memory to the host (8 GB is very small) or load-balancing SPARQL queries across more MarkLogic hosts.

  2. The SPARQL query optimizer is picking poor query plans for your query. For SPARQL queries with a larger number of joins (your query, for instance, has 13 joins), you could try executing the sem:sparql() function with a higher optimization level, i.e. try passing the option "optimize=2".

  3. The query optimizer may have a bug or deficiency that leads it to choose a poor query plan. You should contact MarkLogic support to look into this; they can help you find out which query plan was chosen and which optimization parameters led to your problem. Given more information, they may be able to suggest a solution, or they may file a bug report so that a fix can be made.
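To make the cancellation rule from the start of this answer concrete, here is a minimal Python model of it. This is a sketch under stated assumptions, not MarkLogic code: it assumes a single host whose per-query hash join sizes are known, and `RunningQuery` and `query_to_cancel` are hypothetical names. The server applies this rule internally. The model shows why 8 GB is tight: only half of it, 4 GB, is shared by the hash joins of every concurrently running query.

```python
from dataclasses import dataclass
from typing import List, Optional

MB = 1024 ** 2
GB = 1024 ** 3


@dataclass
class RunningQuery:
    # Hypothetical model of a running SPARQL query, not a MarkLogic type.
    name: str
    hash_join_bytes: int  # memory currently held in this query's hash join tables


def query_to_cancel(queries: List[RunningQuery], host_memory_bytes: int,
                    fraction: float = 0.5) -> Optional[str]:
    """Return the name of the query that the modeled rule would cancel, or None.

    Modeled rule: if the summed hash join memory of all running queries exceeds
    `fraction` of host memory, cancel the query using the most hash join memory.
    """
    total = sum(q.hash_join_bytes for q in queries)
    if total <= fraction * host_memory_bytes:
        return None  # still under the limit, nothing is canceled
    return max(queries, key=lambda q: q.hash_join_bytes).name


# Three concurrent queries using 1 GB + 2 GB + 1.5 GB = 4.5 GB of hash join memory.
queries = [
    RunningQuery("q1", 1024 * MB),
    RunningQuery("q2", 2048 * MB),
    RunningQuery("q3", 1536 * MB),
]

print(query_to_cancel(queries, 8 * GB))   # q2   (4.5 GB > 4 GB limit)
print(query_to_cancel(queries, 16 * GB))  # None (4.5 GB <= 8 GB limit)
```

Note that in this model none of the three queries is large on its own. The cancellation is triggered by their combined load, which is why both more memory and spreading queries across hosts can help.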

John