I am trying to make a semantic web application about running races in my area (10k, half marathons, marathons). More specifically, I want to collect and publish (in RDF) data about races, participants and results, and merge similar data. For the last few weeks I have been testing Jena (including TDB), scraping static web sites about running races, and reading about vocabularies and ontologies.

I think the most reputable ontology for my application, and also the one with the lowest barrier to entry (not too over-engineered), is the BBC sports ontology: http://www.bbc.co.uk/ontologies/sport/2011-02-17.shtml

I have a few questions about using BBC sports and making the application:

Is it okay for me to use BBC sports ontology even if I just use a small subset of it?

I wanted to look at the schema for the ontology to understand it better, but I can't seem to find it anywhere. Is BBC keeping it secret or have I just been looking for it in the wrong places?

Is there any way for me to know for sure whether I am using the ontology correctly? My native language is not English, so I am afraid I might misunderstand some of the concepts in the ontology.

When I add new triples to my (TDB) triple store, what is the convention for creating a new URI for a resource? More specifically, should the URI end with a name or a uid? Will this affect merging of similar data from different data sets?

Can you recommend any semantic web tools for making a resource URI dereferenceable? I'm not putting the application on the web anytime soon, but it would still be nice if I could access the URIs locally, for instance

http://localhost/running/12345.

1 Answer

You've got a few different questions here, and some are easier to answer than others.

Finding the BBC ontologies

Many of the BBC websites use content negotiation for their documents, and you can get the RDF documents that you're looking for by setting the appropriate HTTP headers or, even more simply, by requesting the resource with an appropriate extension. For instance, the human readable version of the ontology is the page linked in the question: http://www.bbc.co.uk/ontologies/sport/2011-02-17.shtml

To get the machine readable version, use

That they do this isn't immediately obvious. One place where it is stated is the Feeds and Data section of their Nature site, which says:

How do I get the RDF?

You've a couple of options. We content negotiate on our standard URLs - if you're client's request header specifies RDF then that's what you'll get. Alternatively is you add .rdf to the end of our URLs then we will return RDF rather than HTML.
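The two options in that quote can be sketched in a few lines of Python. This is only an illustration using the standard library: the function names are made up for this example, `application/rdf+xml` is the standard media type for RDF/XML, and whether a particular BBC URL still honours either option is something you would have to try.

```python
from urllib.request import Request, urlopen

RDF_XML = "application/rdf+xml"  # standard media type for RDF/XML


def build_rdf_request(url: str) -> Request:
    """Option 1: keep the URL, but ask for RDF through the Accept header."""
    return Request(url, headers={"Accept": RDF_XML})


def with_rdf_extension(url: str) -> str:
    """Option 2: add .rdf to the end of the URL, as the quoted text describes."""
    return url + ".rdf"


def fetch_rdf(url: str) -> bytes:
    """Send the content-negotiated request (needs network access, so it is
    defined here but not called)."""
    with urlopen(build_rdf_request(url)) as response:
        return response.read()
```

Calling `fetch_rdf(some_url)` should give you the RDF bytes if the server negotiates as described; if it ignores the header you will simply get HTML back, and the `.rdf` variant is worth trying instead.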

You can use as few or as many of the classes, properties, and individuals defined in the BBC's ontologies as you want. That's part of the beauty of the Semantic Web. As to whether you're using them correctly: most of them have somewhat descriptive labels, but the labels are in English. I don't think that there's an automated way to check your usage; you'll probably just have to check with someone more comfortable with English, I'm afraid. Of course, you could also add labels and comments in your own language and make those available for others. That's another nice thing about the Semantic Web and Linked Data: you can say anything about anything whenever you want.

Conventions for creating identifiers

There's no single standard out there for creating IRIs. It's nice if they're human readable, but if you're generating lots of things programmatically, that can be hard to accomplish. If you can't make them human readable, at least be sure to give them appropriate rdfs:labels when you can. A question about IRI conventions might be more on topic at http://answers.semanticweb.com, and you'll probably get better answers if you ask there.
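As a rough sketch of the two styles (a readable name versus an opaque id), here is one way to mint IRIs programmatically. Everything in it is an assumption for illustration: the base `http://localhost/running/` is borrowed from the question's example, the race name is hypothetical, and the helper names are invented. It is not a convention anyone has standardised.

```python
import re
import unicodedata
import uuid

BASE = "http://localhost/running/"  # assumed namespace, from the question's example
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def slugify(label: str) -> str:
    """Turn a label into a lowercase ASCII slug (non-ASCII letters that do not
    decompose are simply dropped in this sketch)."""
    ascii_text = unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_iri(label=None, base=BASE):
    """Use a readable slug when a usable label exists, else a random opaque id."""
    local = slugify(label) if label else ""
    return base + (local or uuid.uuid4().hex)


def label_triple(iri: str, label: str) -> str:
    """Build an N-Triples line that attaches an rdfs:label to the IRI."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'<{iri}> <{RDFS_LABEL}> "{escaped}" .'
```

For example, `mint_iri("Oslo Half Marathon 2013")` gives `http://localhost/running/oslo-half-marathon-2013`, while `mint_iri()` falls back to a 32-character hex id. Either way, `label_triple` lets you keep the human readable name in the data rather than relying on the IRI to carry it.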

Making your IRIs locally dereferenceable

As phrased, you're asking for a tool, and that kind of question is off topic for Stack Overflow:

Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.

Again, you might have better luck on http://answers.semanticweb.com. The short answer, though, is that you'd be looking for a lightweight web server. You might even get by with a web server that forwards each request for an IRI to a SPARQL DESCRIBE query asking for information about that IRI. That way, when you request:

http://localhost/running/12345

You'd get back the results of

describe <http://localhost/running/12345>
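
To make the forwarding idea concrete, here is a minimal sketch using only Python's standard library. It assumes your triple store is already exposed through a SPARQL endpoint that accepts the standard `query` parameter over HTTP GET; the endpoint URL below is a placeholder, and `PUBLIC_BASE`, `DescribeHandler`, and `serve` are names invented for this example. Treat it as a starting point rather than a finished server.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlencode
from urllib.request import Request, urlopen

PUBLIC_BASE = "http://localhost"  # assumed authority of the IRIs in your data
SPARQL_ENDPOINT = "http://localhost:3030/running/query"  # placeholder endpoint URL


def describe_query(iri: str) -> str:
    """Build the DESCRIBE query, rejecting characters not allowed inside <...>."""
    if any(c in '<>"{}|^`\\' or ord(c) <= 0x20 for c in iri):
        raise ValueError("IRI contains characters that are not allowed in SPARQL")
    return f"DESCRIBE <{iri}>"


def endpoint_url(iri: str) -> str:
    """URL that sends the DESCRIBE query to the endpoint as a GET request."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": describe_query(iri)})


class DescribeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Map the requested path back onto the IRI used in the data.
        iri = PUBLIC_BASE + self.path
        try:
            request = Request(endpoint_url(iri), headers={"Accept": "text/turtle"})
            with urlopen(request) as upstream:
                body = upstream.read()
        except (ValueError, OSError):
            self.send_error(502, "SPARQL endpoint request failed")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/turtle")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the forwarding server until interrupted."""
    HTTPServer(("localhost", port), DescribeHandler).serve_forever()
```

Calling `serve()` starts the server on port 8080, so you would request http://localhost:8080/running/12345 and get back the description of http://localhost/running/12345. To answer on the IRI itself you would need to listen on port 80, which usually requires elevated privileges.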