
I am trying to figure out how Jena TDB handles SPARQL queries with multiple FROM clauses at the physical query plan level; in particular, how it executes a query over several graphs.

I have run some small experiments and looked at the query algebra; however, it is not clear to me how the FROM clauses affect the algebra. It looks like the FROM clauses are discarded in the algebra. I expect that the algebra is evaluated over the union of the graphs, but I would like to be sure.

I have the following quads:

<http://example.com/book2/> <http://example.com/price> "5"^^<http://www.w3.org/2001/XMLSchema#integer> <http://example.com/A> .
<http://example.com/book2/> <http://example.com/title> "Lord of the Rings" <http://example.com/B> .

and the following query:

SELECT (AVG(?price) as ?total)
FROM <http://example.com/A>
FROM <http://example.com/B>
WHERE {
    ?book <http://example.com/price> ?price .
    ?book <http://example.com/title> ?title .
}

./tdbquery --loc test --query test.sparql --explain

The query algebra looks as follows:

INFO  exec                 :: ALGEBRA
  (project (?total)
    (extend ((?total ?.0))
      (group () ((?.0 (avg ?price)))
        (bgp (triple ?book <http://example.com/price> ?price)))))

When I execute the query over the data I receive the expected result.


Answer


FROM (and FROM NAMED) aren't really part of the query, but indications of what the dataset to be queried ought to be. These clauses don't alter what the query will do, only what it operates on, so you don't see them in the algebra.
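A minimal Python sketch of that separation, assuming a toy model in which a store maps graph IRIs to sets of (s, p, o) triples. The names `build_dataset` and `store` are hypothetical and are not Jena API. The FROM and FROM NAMED IRIs only decide which triples make up the default graph and the named graphs. Per the SPARQL spec, the default graph is the RDF merge of the FROM graphs, which for triples without blank nodes is a plain set union. The operator tree itself does not depend on the FROM list.

```python
# Hypothetical toy model, not Jena API: graph IRI -> set of (s, p, o) triples.
# Literals are simplified to plain Python strings.
Triple = tuple[str, str, str]

EX = "http://example.com/"
store: dict[str, set[Triple]] = {
    EX + "A": {(EX + "book2/", EX + "price", "5")},
    EX + "B": {(EX + "book2/", EX + "title", "Lord of the Rings")},
}


def build_dataset(
    graphs: dict[str, set[Triple]],
    from_iris: list[str],
    from_named_iris: list[str],
) -> tuple[set[Triple], dict[str, set[Triple]]]:
    """Assemble (default graph, named graphs) from FROM / FROM NAMED IRIs.

    The default graph is the union of the FROM graphs; without blank nodes,
    the RDF merge required by the SPARQL spec is a plain set union.
    """
    default_graph: set[Triple] = set()
    for iri in from_iris:
        default_graph |= graphs.get(iri, set())
    named_graphs = {iri: set(graphs.get(iri, set())) for iri in from_named_iris}
    return default_graph, named_graphs


# The same operator tree would run against either of these datasets.
both, _ = build_dataset(store, [EX + "A", EX + "B"], [])
only_a, _ = build_dataset(store, [EX + "A"], [])
print(len(both), len(only_a))  # 2 1
```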

What a particular processor does with that information varies:

  • some processors will build the requested dataset (even downloading data)
  • but it is also common to provide a dataset explicitly in APIs (e.g. query(query_string, dataset)), in which case the processor will ignore the FROM and FROM NAMED clauses, since a dataset has already been provided.
  • a dataset might also be supplied in a SPARQL protocol request (via the default-graph-uri and named-graph-uri parameters), in which case, as with the API call, the processor will ignore the FROM and FROM NAMED clauses.
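The precedence rule in the last two points can be sketched as a small Python function. This is a simplified reading of the protocol rule. `effective_graph_iris` and its parameters are hypothetical names, and the request arguments stand in for a dataset supplied outside the query text, such as the protocol's default-graph-uri and named-graph-uri parameters.

```python
from typing import Optional


def effective_graph_iris(
    query_from: list[str],
    query_from_named: list[str],
    request_default: Optional[list[str]] = None,
    request_named: Optional[list[str]] = None,
) -> tuple[list[str], list[str]]:
    """Pick the graph IRIs that form the dataset for one execution.

    request_default / request_named model a dataset supplied outside the
    query (e.g. default-graph-uri / named-graph-uri in a protocol request).
    If either is present, the whole dataset comes from the request and the
    query's FROM / FROM NAMED clauses are ignored.
    """
    if request_default is not None or request_named is not None:
        return list(request_default or []), list(request_named or [])
    return list(query_from), list(query_from_named)


A, B = "http://example.com/A", "http://example.com/B"
print(effective_graph_iris([A, B], []))                       # FROM list used as-is
print(effective_graph_iris([A, B], [], request_default=[A]))  # request overrides FROM
```

An API call that passes a ready-made dataset object behaves the same way in spirit: the caller's dataset wins and the FROM clauses are not consulted.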

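Now, a TDB database is itself a dataset, but TDB has a special feature called 'dynamic datasets', which uses FROM and FROM NAMED to form, in effect, a sub-dataset, limiting the graphs queried to those mentioned in the FROM and FROM NAMED clauses. In your query, the default graph of that sub-dataset is the union (RDF merge) of <http://example.com/A> and <http://example.com/B>, which is why the pattern joining the price and the title matches even though the two triples live in different graphs.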
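To make the union semantics concrete, here is a small Python model that evaluates the question's query over the two quads. The names `default_graph_for`, `match_bgp` and `avg_price` are hypothetical. The model follows the SPARQL dataset semantics only and says nothing about TDB's physical plan or storage. With both FROM clauses, the price and title triples meet in one default graph, so the join finds book2 and the average is 5. With only `FROM <http://example.com/A>`, the title triple is not visible and the join is empty.

```python
# Hypothetical toy model of a quad store; not how TDB stores or executes anything.
EX = "http://example.com/"
Quad = tuple[str, str, object, str]  # (subject, predicate, object, graph)

QUADS: list[Quad] = [
    (EX + "book2/", EX + "price", 5, EX + "A"),
    (EX + "book2/", EX + "title", "Lord of the Rings", EX + "B"),
]


def default_graph_for(quads: list[Quad], from_iris: list[str]) -> set[tuple[str, str, object]]:
    """Default graph of the dynamic dataset: union of the graphs named in FROM."""
    wanted = set(from_iris)
    return {(s, p, o) for (s, p, o, g) in quads if g in wanted}


def match_bgp(graph: set[tuple[str, str, object]]) -> list[dict[str, object]]:
    """Solutions of { ?book ex:price ?price . ?book ex:title ?title } (nested-loop join)."""
    rows = []
    for (book, p1, price) in graph:
        if p1 != EX + "price":
            continue
        for (book2, p2, title) in graph:
            if p2 == EX + "title" and book2 == book:
                rows.append({"book": book, "price": price, "title": title})
    return rows


def avg_price(rows: list[dict[str, object]]) -> float:
    """AVG(?price) over the single implicit group; SPARQL 1.1 defines AVG of no values as 0."""
    if not rows:
        return 0
    return sum(r["price"] for r in rows) / len(rows)


print(avg_price(match_bgp(default_graph_for(QUADS, [EX + "A", EX + "B"]))))  # 5.0
print(avg_price(match_bgp(default_graph_for(QUADS, [EX + "A"]))))  # 0
```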