Many datasets have a history of changes. Making historical data available as Linked Data can be a challenge. The general case I am considering is one where a dataset has data about things whose properties can change over time. An example could be the history of Windsor Castle: it has had many configurations in the past, but it can still be considered the same thing. One way to handle that could be to add temporal annotations to properties. But then one gets into the awkward territory of having metadata per RDF triple. I think a simpler solution would be to think in terms of versions of things: when one or more properties of a resource change, a new version comes into existence.
Below is a simple example of someone who changes his name at a certain date:
@prefix : <http://www.example.com/mydataset/> .
@base <http://www.example.com/mydataset/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

:p1 a foaf:Person ;
    foaf:name "Bob" ;
    dcterms:valid "start=2015-06-20;" ;
    dcterms:replaces <p1/version1> .

<p1/version1> a foaf:Person ;
    foaf:name "Alfred" ;
    dcterms:valid "start=1975-08-01; end=2015-06-19;" ;
    dcterms:isVersionOf :p1 ;
    dcterms:isReplacedBy :p1 .
In this example, the main URI (:p1) always points at the most recent version. That is useful, because historical data may not always be needed. The current data do have a link to the previous version. The properties dcterms:replaces and dcterms:isReplacedBy can form a chain of older versions.
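To illustrate how such a chain could grow, here is a sketch of the same data after one more name change. The third name, the 2020 dates, and the URI <p1/version2> are all made up for illustration; only the pattern is taken from the example above.

```turtle
@prefix : <http://www.example.com/mydataset/> .
@base <http://www.example.com/mydataset/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Hypothetical newest state: the main URI again holds the current data.
:p1 a foaf:Person ;
    foaf:name "Robert" ;
    dcterms:valid "start=2020-01-01;" ;
    dcterms:replaces <p1/version2> .

# The former current state, archived under its own (assumed) version URI.
<p1/version2> a foaf:Person ;
    foaf:name "Bob" ;
    dcterms:valid "start=2015-06-20; end=2019-12-31;" ;
    dcterms:isVersionOf :p1 ;
    dcterms:replaces <p1/version1> ;
    dcterms:isReplacedBy :p1 .

# The oldest version now points forward to version2 instead of :p1.
<p1/version1> a foaf:Person ;
    foaf:name "Alfred" ;
    dcterms:valid "start=1975-08-01; end=2015-06-19;" ;
    dcterms:isVersionOf :p1 ;
    dcterms:isReplacedBy <p1/version2> .
```

Note that in this sketch archiving a version means updating the dcterms:isReplacedBy link of the previously archived version as well, so that the chain stays walkable in both directions.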
I like this setup because it is straightforward and does not rely on something like SPARQL to work. However, a problem is the specification of temporal validity. The only appropriate term I could find is dcterms:valid. But its range is a literal. That works with the DCMI Period Encoding Scheme, but I think it would be much more useful to be able to use common data types for time like xsd:dateTime or xsd:gYear. That would help querying (by time range or by point in time) and ordering the data a lot. For example, temporal querying in SPARQL is dependent on datatype xsd:dateTime.
So my question is: Can someone suggest a simple versioning scheme for Linked Data that can use common data types for time? Or maybe just an alternative for dcterms:valid?
UPDATE: A suggestion was to look at PROV, which provides semantics for provenance, for alternatives. PROV does include the concept of validity, and an attempt has been made to map dct:valid to PROV. My reputation is too low to post additional hyperlinks, so I quote:
dct:valid: "Date (often a range) of validity of a resource." This property could correspond to PROV's generation and invalidation of the resource or one of its specializations. However, dct:valid can be used to set expiry dates (e.g., resource valid until 2015), which is not provenance (it is not about past events). Thus this property is left out of the mapping.
For historical data, which this question is about, the fact that dct:valid can set future dates does not matter. So PROV's generation and invalidation could still be applicable. The relevant PROV terms seem to be prov:generatedAtTime and prov:invalidatedAtTime. They could be used to express the temporal validity of a version. However, the range of those properties is xsd:dateTime, which means each time needs to be known up to the level of seconds. Especially for historical data from before the digital age, that is not always known. Sometimes all that is known is a year or a date. So it seems PROV is too restrictive in another way.
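As a sketch of what that could look like, here is the name-change example with the PROV terms in place of the validity literals. The time-of-day parts and the UTC zone are invented padding, added only because xsd:dateTime requires them; the source data only has dates. Using these properties would also imply that both resources are of type prov:Entity, which is the domain of both properties.

```turtle
@prefix : <http://www.example.com/mydataset/> .
@base <http://www.example.com/mydataset/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Current version: generated at the name change, not (yet) invalidated.
# The "T00:00:00Z" part is an assumption; only the date is known.
:p1 a foaf:Person ;
    foaf:name "Bob" ;
    prov:generatedAtTime "2015-06-20T00:00:00Z"^^xsd:dateTime ;
    dcterms:replaces <p1/version1> .

# Previous version: validity expressed as generation plus invalidation.
# Again, the times of day are made up to satisfy xsd:dateTime.
<p1/version1> a foaf:Person ;
    foaf:name "Alfred" ;
    prov:generatedAtTime "1975-08-01T00:00:00Z"^^xsd:dateTime ;
    prov:invalidatedAtTime "2015-06-19T23:59:59Z"^^xsd:dateTime ;
    dcterms:isVersionOf :p1 ;
    dcterms:isReplacedBy :p1 .
```

The typed literals can be compared and ordered directly, but the padded times suggest a precision that the data does not have, which is exactly the drawback described above.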
xsd:dateTime does not prevent you from treating this range as including xsd:date and/or xsd:gYear ... – TallTed