The following query loads contracts from my data set (a contract is between an organization and a partner).
    SELECT ?contract ?organisation ?partner
    WHERE {
      ?organisation a gr:BusinessEntity ;
                    rejstriky:contract ?contract .
      ?contract a rejstriky:Contract ;
                rejstriky:partner ?partner .
    }
    GROUP BY ?contract ?organisation ?partner
This query returns around 8000 contracts, and it does so almost immediately (it takes just a fraction of a second). Now I need to load labels/names for both the organization and the partner. There might be multiple names available for each; I just need one. This is my query:
    SELECT ?contract ?organisation ?partner
           (SAMPLE(?organisationNames) AS ?organisationName)
           (SAMPLE(?partnerNames) AS ?partnerName)
    WHERE {
      ?organisation a gr:BusinessEntity ;
                    rejstriky:contract ?contract .
      ?contract a rejstriky:Contract ;
                rejstriky:partner ?partner .
      ?organisation gr:legalName ?organisationNames .
      ?partner gr:legalName ?partnerNames .
    }
    GROUP BY ?contract ?organisation ?partner
This query suddenly takes several minutes to finish.
I ran some experiments and found that fetching all the names through separate SPARQL calls (in batches of 40 names per call) would take less than 2 minutes, which is significantly faster. Regardless, if I can generate those 8000 items within a fraction of a second, loading two labels for each item should not take that long.
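For illustration, one such batched lookup could look roughly like the sketch below. This is only an assumption about the shape of the call: the entity IRIs are hypothetical placeholders, and a real batch would list 40 of them in the VALUES block.

```sparql
PREFIX gr: <http://purl.org/goodrelations/v1#>

# One batch: fetch a single name for each listed entity.
# The IRIs below are hypothetical placeholders, not real data.
SELECT ?entity (SAMPLE(?name) AS ?label)
WHERE {
  VALUES ?entity {
    <http://example.org/entity/1>
    <http://example.org/entity/2>
  }
  ?entity gr:legalName ?name .
}
GROUP BY ?entity
```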
Do you have any ideas how to optimize my query? Note that I'm using Virtuoso.
Have you tried without the SAMPLE aggregate? That is, changing the SELECT list to ?contract ?organisation ?partner ?organisationNames ?partnerNames? You might also raise this on the Virtuoso Users mailing list or the OpenLink Support Forums, whose audiences include several members of the Virtuoso Development team... - TallTed
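The suggestion in this comment could be sketched as follows. This is a minimal sketch, not a tested fix: it assumes the gr: and rejstriky: prefixes are declared as in the original queries, and it also drops the GROUP BY clause, since SPARQL 1.1 does not allow ungrouped, unaggregated variables in the SELECT list of a grouped query.

```sparql
# Same pattern as the slow query, but without SAMPLE or GROUP BY.
# Returns one row per (organisation name, partner name) combination,
# so the client would have to pick a single name per contract.
SELECT ?contract ?organisation ?partner ?organisationNames ?partnerNames
WHERE {
  ?organisation a gr:BusinessEntity ;
                rejstriky:contract ?contract .
  ?contract a rejstriky:Contract ;
            rejstriky:partner ?partner .
  ?organisation gr:legalName ?organisationNames .
  ?partner gr:legalName ?partnerNames .
}
```

Comparing the run time of this version against the aggregated one would show whether the aggregation itself is the expensive step or whether the two extra name patterns already slow the query down.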