I have queried DBpedia both through the Virtuoso SPARQL endpoint and through Jena, but the results differ. My query is:

SELECT (COUNT(DISTINCT (?v)) AS ?num)
FROM <http://dbpedia.org>
WHERE {
  ?x  <http://dbpedia.org/property/deathPlace>  ?v .
  ?v  rdf:type                                  ?t .
  FILTER STRSTARTS( STR(?t), STR("http://dbpedia.org/ontology/Place") )
}

I execute the query in Jena with this function:

    public static ArrayList<String> query(String queryStr) {
        ArrayList<String> result = new ArrayList<>();
        queryStr = SPARQL_PREFIX + queryStr;
        Query query = QueryFactory.create(queryStr);

        // Remote execution.
        try (QueryExecution qexec = QueryExecutionFactory.sparqlService("http://dbpedia.org/sparql", query)) {
            // Set the DBpedia specific timeout.
            ((QueryEngineHTTP) qexec).addParam("timeout", "10000");

            // Execute.
            ResultSet rs = qexec.execSelect();
            while (rs.hasNext()) {
                result.add(rs.next().toString());
            }
        } catch (Exception e) {
            e.printStackTrace();
            System.err.println("============================================");
            System.err.println(queryStr);
            System.err.println("============================================");
        }
        return result;
    }

I've set the graph in the FROM clause, but the results are still different. When I execute the query on Virtuoso's SPARQL endpoint, the result is 21482, but the result returned by Jena is 9586.

Any idea what causes this?

I don't see any different results in the question. What results are you actually seeing? - Joshua Taylor
How are you executing the Jena query? Using "FROM" may not do what you expect it to do here. - Joshua Taylor
I added the details. - user3070752
@amirveyseh It might cause the problem if it's more expensive. It could be that DBpedia imposes different limits on remote queries vs. queries launched from its website. It could be that remote queries get less time, so you get fewer results in the allotted time, and so the result count is smaller. - Joshua Taylor
I meant using the class URI dbpedia.org/ontology/Place directly as a resource instead of the variable ?t and a string comparison on it, which is more expensive. - UninformedUser

1 Answer


As AKSW and Taylor mentioned in the comments, DBpedia seems to apply different limits to remote queries than to queries launched from its website. In this case, the string matching (an expensive operation) makes the query slower, so the count returned through Jena covers only part of the actual result.

To solve this, we can match the class URI directly instead of comparing its string form:

SELECT  (COUNT(DISTINCT (?v)) AS ?num)
  FROM  <http://dbpedia.org>
 WHERE 
   {
     ?x  <http://dbpedia.org/property/deathPlace>  ?v   .
     ?v  rdf:type  <http://dbpedia.org/ontology/Place>  .
   }
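
To check whether a count was cut short rather than complete, one option is to send the query over plain HTTP and inspect the response headers. The sketch below uses only the JDK's `java.net.http` client instead of Jena. It rests on an assumption that should be verified against the endpoint: that Virtuoso marks a query interrupted by the `timeout` parameter with an `X-SQL-State` response header. The class name `PartialResultCheck` and the helpers `buildQueryUri` and `looksPartial` are hypothetical names chosen for illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class PartialResultCheck {

    // Builds a GET URI for a SPARQL endpoint. "timeout" is the same
    // Virtuoso-specific parameter that the Jena code in the question sets.
    static URI buildQueryUri(String endpoint, String sparql, int timeoutMs) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return URI.create(endpoint + "?query=" + encoded + "&timeout=" + timeoutMs);
    }

    // Assumption: an interrupted query carries a non-empty X-SQL-State header;
    // a missing header is treated here as "result is complete".
    static boolean looksPartial(Optional<String> sqlState) {
        return sqlState.isPresent() && !sqlState.get().isBlank();
    }

    public static void main(String[] args) throws Exception {
        String sparql =
              "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> "
            + "SELECT (COUNT(DISTINCT ?v) AS ?num) FROM <http://dbpedia.org> WHERE { "
            + "?x <http://dbpedia.org/property/deathPlace> ?v . "
            + "?v rdf:type <http://dbpedia.org/ontology/Place> . }";

        HttpRequest request = HttpRequest
            .newBuilder(buildQueryUri("http://dbpedia.org/sparql", sparql, 10000))
            .header("Accept", "application/sparql-results+json")
            .GET()
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(response.body());
        if (looksPartial(response.headers().firstValue("X-SQL-State"))) {
            System.err.println("Warning: the endpoint reported an interrupted query; "
                + "the count may be incomplete.");
        }
    }
}
```

If the warning appears for the `STRSTARTS` version of the query but not for the direct-URI version, that would support the explanation that the lower count is a truncated result.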