Many datasets have a history of changes. Making historical data available as Linked Data can be a challenge. The general case I am considering is one where a dataset has data about things whose properties can change over time. An example could be the history of Windsor Castle: it has had many configurations in the past, but it can still be considered the same thing. One way to handle that could be to have temporal annotations for properties. But then one gets into the awkward territory of having metadata per RDF triple. I think a simpler solution would be to think in terms of versions of things: when one or more properties of a resource change, a new version comes into existence.

Below is a simple example of someone who changes his name at a certain date:

@prefix : <http://www.example.com/mydataset/> .
@base <http://www.example.com/mydataset/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

:p1 a foaf:Person ;
  foaf:name "Bob" ;
  dcterms:valid "start=2015-06-20;" ;
  dcterms:replaces <p1/version1> .

<p1/version1> a foaf:Person ;
  foaf:name "Alfred" ;
  dcterms:valid "start=1975-08-01; end=2015-06-19;" ;
  dcterms:isVersionOf :p1 ;
  dcterms:isReplacedBy :p1 .

In this example, the main URI (:p1) always points at the most recent version. That is useful, because historical data may not always be needed. The current data do have a link to the previous version. The attributes dcterms:replaces and dcterms:isReplacedBy can form a chain of older versions.

I like this setup because it is straightforward and does not rely on something like SPARQL to work. However, a problem is the specification of temporal validity. The only appropriate term I could find is dcterms:valid. But its range is a literal. That works with the DCMI Period Encoding Scheme, but I think it would be much more useful to be able to use common data types for time like xsd:dateTime or xsd:gYear. That would help querying (by time range or by point in time) and ordering the data a lot. For example, temporal querying in SPARQL is dependent on datatype xsd:dateTime.

So my question is: Can someone suggest a simple versioning scheme for Linked Data that can use common data types for time? Or maybe just an alternative for dcterms:valid?

UPDATE: A suggestion was to look at PROV, which provides semantics for provenance, for alternatives. PROV does include the concept of validity, and an attempt has been made to map dct:valid to PROV. My reputation is too low to post additional hyperlinks, so I quote:

dct:valid: "Date (often a range) of validity of a resource." This property could correspond to PROV's generation and invalidation of the resource or one of its specializations. However, dct:valid can be used to set expiry dates (e.g., resource valid until 2015), which is not provenance (it is not about past events). Thus this property is left out of the mapping.

For historical data, which this question is about, the fact that dct:valid can set future dates does not matter. So PROV's generation and invalidation could still be applicable. The relevant PROV terms seem to be prov:generatedAtTime and prov:invalidatedAtTime. They could be used to express the temporal validity of a version. However, the range of those properties is xsd:dateTime, which means each time needs to be known down to the level of seconds. Especially for historical data from before the digital age, that is not always known. Sometimes all that is known is a year or a date. So it seems PROV is too restrictive in another way.
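
As a minimal sketch of what the PROV variant might look like, the name-change example could be rewritten as below. This is only an illustration under assumptions: using prov:wasRevisionOf in place of dcterms:replaces is one possible mapping, and the midnight UTC times are invented padding, because the data only knows the dates.

```turtle
@prefix : <http://www.example.com/mydataset/> .
@base <http://www.example.com/mydataset/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Current version: no prov:invalidatedAtTime, because it is still valid.
:p1 a foaf:Person, prov:Entity ;
  foaf:name "Bob" ;
  # Only the date is known; T00:00:00Z is padding needed to form an xsd:dateTime.
  prov:generatedAtTime "2015-06-20T00:00:00Z"^^xsd:dateTime ;
  prov:wasRevisionOf <p1/version1> .

# Previous version: valid from its generation until its invalidation.
<p1/version1> a foaf:Person, prov:Entity ;
  foaf:name "Alfred" ;
  prov:generatedAtTime "1975-08-01T00:00:00Z"^^xsd:dateTime ;
  # Assumed to end at the instant the next version starts.
  prov:invalidatedAtTime "2015-06-20T00:00:00Z"^^xsd:dateTime .
```

The padded times show the restriction described above: the literals claim a precision of seconds that the source data does not have.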

This is not a good Q for this site, because there is no simple answer. RDF has no inherent concept of versioning. Anything described in RDF "is." All solutions for temporality are kludges to some degree. You might look at PROV which covers much in this area. - TallTed
@TallTed is correct; PROV does enable attribution and intervals. - Jay Gray
@TallTed: do you think the question does not belong here or is it just the title that is misleading? I did not mean versioning in RDF, but versioning in RDF based data, or Linked Data that is based on RDF. About PROV: it is a good lead, but it seems it does not provide a full solution. I will update the original question to include PROV. - F.J. Knibbe
I think you have an interesting problem and PROV is a very interesting approach, but I don't think it solves your problem. As for whether this is a good question for SO, it's hard to say, but I'd vote to keep it here. - Sentry
Hm. As a participant in the PROV Working Group, I know the intent was to permit vague notations like the year a work of art was created, where we might know no other timing details. A quick look suggests that might not have been communicated properly, or even be true, in the spec as completed. That said, RDF ontologies are not enforced like SQL schema definitions. The ontology saying the range is xsd:dateTime does not prevent you from treating this range as including xsd:date and/or xsd:gYear... - TallTed

1 Answer


A vocabulary that supports this kind of change is, for example, ChangeSet:

http://vocab.org/changeset/

If you model it with this vocabulary, you have your data on the one hand and metadata about the changes on the other.
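
A rough sketch of how the name change from the question might be described with ChangeSet is shown below. The change set URI :p1-change1 and the change reason text are hypothetical, and typing cs:createdDate as xsd:date is an assumption. Note that cs:createdDate records when the change set was created, which is not necessarily the date from which the new name is valid.

```turtle
@prefix : <http://www.example.com/mydataset/> .
@prefix cs: <http://purl.org/vocab/changeset/schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# The data itself holds only the current state.
:p1 a foaf:Person ;
  foaf:name "Bob" .

# Metadata about the change (hypothetical URI), kept apart from the data.
:p1-change1 a cs:ChangeSet ;
  cs:subjectOfChange :p1 ;
  cs:createdDate "2015-06-20"^^xsd:date ;
  cs:changeReason "Change of name" ;
  # The triple that was removed, as a reified statement.
  cs:removal [
    a rdf:Statement ;
    rdf:subject :p1 ;
    rdf:predicate foaf:name ;
    rdf:object "Alfred"
  ] ;
  # The triple that was added in its place.
  cs:addition [
    a rdf:Statement ;
    rdf:subject :p1 ;
    rdf:predicate foaf:name ;
    rdf:object "Bob"
  ] .
```

Earlier states are then reconstructed by undoing change sets rather than by dereferencing a version URI, which is a different trade-off from the version chain in the question.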