I am using DBpedia Live to retrieve a list of persons by running a SPARQL query against the DBpedia Live endpoint. I page through the results with LIMIT and OFFSET, advancing the offset after each batch of 10000 records. When I run the query at OFFSET 580000, I get a 504 Gateway Time-out error.

Failing SPARQL query:

SELECT DISTINCT ?dbpedia_link str(?name) as ?label str(?label1) as ?label1 ?freebase_link WHERE {
    ?dbpedia_link rdfs:label ?label1 .
    ?dbpedia_link foaf:name ?name .
    {
        { ?dbpedia_link rdf:type dbpedia-owl:Person }
    }
    OPTIONAL {
        ?dbpedia_link owl:sameAs ?freebase_link .
        FILTER regex(?freebase_link, "^http://rdf.freebase.com") .
    }
    FILTER (lang(?label1) = 'en') .
    ?dbpedia_link dcterms:subject ?sub
}
LIMIT 1000
OFFSET 580000

Working SPARQL query:

SELECT DISTINCT ?dbpedia_link str(?name) as ?label str(?label1) as ?label1 ?freebase_link WHERE {
    ?dbpedia_link rdfs:label ?label1 .
    ?dbpedia_link foaf:name ?name .
    {
        { ?dbpedia_link rdf:type dbpedia-owl:Person }
    }
    OPTIONAL {
        ?dbpedia_link owl:sameAs ?freebase_link .
        FILTER regex(?freebase_link, "^http://rdf.freebase.com") .
    }
    FILTER (lang(?label1) = 'en') .
    ?dbpedia_link dcterms:subject ?sub
}
LIMIT 1000
OFFSET 50000

How can I overcome this problem?

In addition to jimkont's answer, note that LIMIT n and OFFSET m need an ORDER BY to be useful. Without a specified ordering, the endpoint can return the same n results over and over again. E.g., see my answer to "How to resolve the execution limits in Linkedmdb". - Joshua Taylor
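To illustrate the point about ordering, here is a minimal Python sketch that appends ORDER BY, LIMIT and OFFSET to a query body so that each page is a slice of one stable ordering. The helper names `paged_query` and `page_offsets` are hypothetical, and sorting on `?dbpedia_link` is only an assumption about which variable makes a sensible key.

```python
def paged_query(body: str, order_var: str, limit: int, offset: int) -> str:
    """Append ORDER BY, LIMIT and OFFSET to a SPARQL SELECT query body.

    `body` is the query up to and including the closing brace of WHERE.
    `order_var` is the variable name to sort on, without the leading '?'.
    """
    if limit <= 0 or offset < 0:
        raise ValueError("limit must be positive and offset non-negative")
    return (
        f"{body.rstrip()}\n"
        f"ORDER BY ?{order_var}\n"
        f"LIMIT {limit}\n"
        f"OFFSET {offset}"
    )


def page_offsets(total: int, limit: int) -> range:
    """Return the OFFSET of each page needed to cover `total` rows."""
    return range(0, total, limit)


# Example: build the first three pages of 1000 rows each.
BODY = "SELECT DISTINCT ?dbpedia_link WHERE { ?dbpedia_link rdf:type dbpedia-owl:Person }"
PAGES = [paged_query(BODY, "dbpedia_link", 1000, off) for off in page_offsets(3000, 1000)]
```

Sorting gives the server extra work, so this addresses repeated or skipped rows rather than the timeout itself.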

1 Answer

Put a delay between your requests. The live endpoint enforces a rate limit, and this is the error you get when you exceed it. It also has a short timeout, to make the service more available.
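A delay between requests could look roughly like the following Python sketch, which pauses after every page and waits longer before retrying when a 504 comes back. The names `run_query` and `fetch_all`, the one-second delay, and the retry count are all assumptions for illustration; the actual rate limit is not stated here, so tune the values accordingly.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request


def run_query(endpoint: str, query: str, timeout: float = 60.0) -> list:
    """Send one SPARQL query over HTTP GET and return the JSON result bindings."""
    url = f"{endpoint}?{urllib.parse.urlencode({'query': query})}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


def fetch_all(fetch_page, limit=1000, delay=1.0, max_retries=3, sleep=time.sleep):
    """Collect all pages, pausing between requests and backing off on HTTP 504.

    `fetch_page(limit, offset)` must return a list of rows for that page.
    `delay` (seconds) is an assumed value, not a documented rate limit.
    """
    rows, offset = [], 0
    while True:
        for attempt in range(max_retries + 1):
            try:
                page = fetch_page(limit, offset)
                break
            except urllib.error.HTTPError as err:
                # Give up on other errors, or once the retries are used up.
                if err.code != 504 or attempt == max_retries:
                    raise
                sleep(delay * 2 ** (attempt + 1))  # wait longer after each timeout
        rows.extend(page)
        if len(page) < limit:  # a short page means there is nothing left
            return rows
        offset += limit
        sleep(delay)  # regular pause between successive requests


# Usage (hypothetical; ENDPOINT is the SPARQL endpoint URL, and build_query
# returns the query text for a given limit and offset):
# rows = fetch_all(lambda lim, off: run_query(ENDPOINT, build_query(lim, off)))
```

Passing `fetch_page` and `sleep` in as arguments keeps the paging logic separate from the HTTP call, so the loop can be exercised without touching the live service.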

(Disclaimer: I am responsible for the service)