3
votes

How does one compare two RDF graphs with SPARQL? If I have graphs :a and :b, I want to find all the times :a appears in :b. I can query for all of :a's subjects, predicates, and objects, then programmatically build a pattern query that will match :a's pattern in :b. Is there a way to build the :a pattern query entirely in SPARQL, with no programmatic construction?

1
This is the second question where you've mentioned “SPARQL graphs”. There is no such thing; SPARQL is a query language for RDF graphs (i.e., sets of triples). If you have an RDF graph A, then it either is a subgraph of B or it isn't; there's not a sense in which it can appear multiple times. [There is a notion of RDF entailment, wherein blank nodes are variables that can be instantiated, so it could be that one graph entails another by multiple substitutions, but that's almost certainly not what you're asking, I think.] - Joshua Taylor
Thank you, corrected. I intend to find multiple instances of :a's structure in :b. - Bondolin
How do you have these graphs? You mention that you don't want to do this programmatically, but where are these graphs? Are they named graphs in a SPARQL endpoint? Files on a disk somewhere? You might be able to do something if the two graphs are named graphs on an endpoint. - Joshua Taylor
They are named graphs. Sorry for the lack of clarity. I am somewhat new with SPARQL and RDF. - Bondolin
No need to be sorry, it's just easier to provide more detailed answers when given more detailed information. Since the graphs are named graphs in SPARQL, we can query against them with SPARQL queries, which was the key to the answer I provided. - Joshua Taylor

1 Answer

7
votes

I set up a Jena Fuseki endpoint with two named graphs, http://a and http://b, which we'll call A and B. A contains one triple, and B contains two. A, viewed as a set of triples, is a subset of B, as the following query shows:

select * where { 
  graph ?g { ?s ?p ?o }
}

-----------------------------------------------------------
| s            | p            | o            | g          |
===========================================================
| <urn:uuid:b> | <urn:uuid:p> | <urn:uuid:b> | <http://b> |
| <urn:uuid:a> | <urn:uuid:p> | <urn:uuid:b> | <http://b> |
| <urn:uuid:a> | <urn:uuid:p> | <urn:uuid:b> | <http://a> |
-----------------------------------------------------------
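
How the two graphs were loaded isn't stated here. As a rough sketch for reproducing the same dataset, assuming the endpoint accepts SPARQL Update requests, something like the following should create both named graphs with the triples shown in the table above:

```sparql
# Assumption: the endpoint allows SPARQL Update; the IRIs are the ones from the table above.
insert data {
  # Graph A: one triple
  graph <http://a> {
    <urn:uuid:a> <urn:uuid:p> <urn:uuid:b> .
  }
  # Graph B: the triple from A plus one more
  graph <http://b> {
    <urn:uuid:a> <urn:uuid:p> <urn:uuid:b> .
    <urn:uuid:b> <urn:uuid:p> <urn:uuid:b> .
  }
}
```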

Now, we can ask for triples that appear in one graph and not in the other. To ask for triples in A that are not in B, we can use this query, which returns no results because every triple in A is also in B:

select * where { 
  graph <http://a> { ?s ?p ?o }
  FILTER NOT EXISTS { graph <http://b> { ?s ?p ?o } }
}

-------------
| s | p | o |
=============
-------------

We can also ask for triples that appear in B, but not in A. We expect and receive one triple.

select * where { 
  graph <http://b> { ?s ?p ?o }
  FILTER NOT EXISTS { graph <http://a> { ?s ?p ?o } }
}

----------------------------------------------
| s            | p            | o            |
==============================================
| <urn:uuid:b> | <urn:uuid:p> | <urn:uuid:b> |
----------------------------------------------

In general, if X contains no triples that are not also in Y, then X is a subset of Y. Using queries like the ones above, we can find the triples that are in one graph but not in the other.
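
The two difference queries above can also be folded into one. This is only a sketch of that idea: the `?onlyIn` variable is a name made up here to record which graph each leftover triple came from.

```sparql
select ?s ?p ?o ?onlyIn where {
  {
    # Triples in A that are missing from B
    graph <http://a> { ?s ?p ?o }
    FILTER NOT EXISTS { graph <http://b> { ?s ?p ?o } }
    bind(<http://a> as ?onlyIn)
  }
  union
  {
    # Triples in B that are missing from A
    graph <http://b> { ?s ?p ?o }
    FILTER NOT EXISTS { graph <http://a> { ?s ?p ?o } }
    bind(<http://b> as ?onlyIn)
  }
}
```

On the example data, this should return only the single triple that is in B but not in A. An empty result would mean the two graphs contain exactly the same triples.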

If we don't care about the particular triples, we can use an ASK query to check whether any exist, without finding out what they are. For instance,

ask { 
  graph <http://a> { ?s ?p ?o }
  FILTER NOT EXISTS { graph <http://b> { ?s ?p ?o } }
}

no

because there are no such triples. However, we're trying to ask whether A is a subgraph of B, which is indicated by there being no such triples, so we need to invert the truth value here. So we use:

ask { 
  FILTER NOT EXISTS {
    graph <http://a> { ?s ?p ?o }
    FILTER NOT EXISTS { graph <http://b> { ?s ?p ?o } }
  }
}

yes

Similarly, if we ask whether B is a subgraph of A, we get no:

ask { 
  FILTER NOT EXISTS {
    graph <http://b> { ?s ?p ?o }
    FILTER NOT EXISTS { graph <http://a> { ?s ?p ?o } }
  }
}

no
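
Putting the two directions together gives a check for whether A and B contain exactly the same triples. This is a sketch rather than something run against the endpoint above. It compares triples term by term, so two graphs that differ only in their blank node identifiers would still be reported as different.

```sparql
ask {
  # No triple of A is missing from B ...
  FILTER NOT EXISTS {
    graph <http://a> { ?s ?p ?o }
    FILTER NOT EXISTS { graph <http://b> { ?s ?p ?o } }
  }
  # ... and no triple of B is missing from A.
  FILTER NOT EXISTS {
    graph <http://b> { ?s ?p ?o }
    FILTER NOT EXISTS { graph <http://a> { ?s ?p ?o } }
  }
}
```

For the example data the expected answer is no, since B has a triple that A lacks.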