
I have an ontology, which I read in with Jena to help me scrape some RDFa triples from a website. I don't currently store these triples in a Jena model, but that is fairly straightforward to do; it's next on my to-do list.

The area I am struggling with, though, is getting Jena to output correct RDF for the ontology I have. The ontology uses OWL and RDFS definitions, but when I pass some example triples into the model, they don't come out correctly. It's almost as if Jena doesn't know anything about the ontology. The output is still valid RDF; it just isn't in the form I was hoping for.

Am I correct in thinking that Jena should be able to produce well-written RDF (not just valid RDF) about the triples I have collected, based on the ontology, or is this beyond what it is capable of?

Many thanks for any input.

Update 1

Examples:

This is what we currently have:

<rdf:Description rdf:about='http://theinternet.com/%3fq=Club/325'>
    <j.0:hasName>Manchester United</j.0:hasName>
    <j.0:hasPlayer>
        <rdf:Description rdf:about='http://theinternet.com/%3fq=player/291/'>
        </rdf:Description>
    </j.0:hasPlayer>
    <j.0:hasEmblem>http://theinternet.com/images/manutd.jpg</j.0:hasEmblem>
    <j.0:hasWebsite>http://www.manutd.com/</j.0:hasWebsite>
</rdf:Description>

</rdf:RDF>

This is what we ideally want:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
         xmlns:ontology="http://theinternet.com/ontology.rdf#">

<rdf:Description rdf:about='http://theinternet.com/%3fq=Club/325'>
    <rdf:type rdf:resource='ontology:Club' />
    <ontology:hasName>Manchester United</ontology:hasName>
    <ontology:hasPlayer>
        <rdf:Description rdf:about='http://theinternet.com/%3fq=player/291/'>
            <rdf:type rdf:resource='ontology:Player' />
        </rdf:Description>
    </ontology:hasPlayer>
    <ontology:hasEmblem>http://theinternet.com/images/manutd.jpg</ontology:hasEmblem>
    <ontology:hasWebsite>http://www.manutd.com/</ontology:hasWebsite>
</rdf:Description>

</rdf:RDF>

To me it just looks like Jena is missing things to do with the ontology, such as the resource types. I have a feeling I'm using Jena wrongly.


2 Answers

Answer 1 (5 votes)

If you want well-written RDF (RDF/XML, I assume), use the RDF/XML-ABBREV writer. Its default settings are usually fine; however, you will find tuning instructions in Jena's RDF/XML output documentation.

Without an example of the problem output it's difficult to know what your problem is. Are you seeing things like <j.0:SomeClass>? That's a prefix issue. If the prefixes are defined in the original RDFa document, then they have been lost somewhere along the way, but that ought to be easy to fix. Otherwise, you can set them manually on the model using the methods in PrefixMapping (which Model extends).
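To see where names like j.0 come from, here is a toy, stdlib-only sketch of how an RDF/XML writer might abbreviate property URIs into QNames. It is not Jena's code: the class QNameSketch and its methods are hypothetical, and splitting the namespace at the last '#' or '/' is a simplification. It only illustrates the idea that a namespace with no registered prefix gets a generated one, which is exactly what registering a prefix on the model avoids.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy, not Jena's implementation: abbreviates URIs to QNames
// using a prefix map, inventing j.0, j.1, ... for namespaces it doesn't know.
class QNameSketch {
    private final Map<String, String> nsToPrefix = new LinkedHashMap<>();
    private int generated = 0;

    // Similar in spirit to PrefixMapping.setNsPrefix(prefix, uri).
    void setNsPrefix(String prefix, String namespace) {
        nsToPrefix.put(namespace, prefix);
    }

    // Simplified split: the namespace is everything up to and including the last '#' or '/'.
    String qname(String uri) {
        int cut = Math.max(uri.lastIndexOf('#'), uri.lastIndexOf('/')) + 1;
        String ns = uri.substring(0, cut);
        String local = uri.substring(cut);
        String prefix = nsToPrefix.get(ns);
        if (prefix == null) {
            // No registered prefix: make one up and remember it for later URIs.
            prefix = "j." + generated++;
            nsToPrefix.put(ns, prefix);
        }
        return prefix + ":" + local;
    }

    public static void main(String[] args) {
        String hasName = "http://theinternet.com/ontology.rdf#hasName";

        QNameSketch unconfigured = new QNameSketch();
        System.out.println(unconfigured.qname(hasName)); // j.0:hasName

        QNameSketch configured = new QNameSketch();
        configured.setNsPrefix("ontology", "http://theinternet.com/ontology.rdf#");
        System.out.println(configured.qname(hasName)); // ontology:hasName
    }
}
```

The real writer has more rules (for example, it must produce a valid XML name for the local part), but the prefix lookup is the part that matters for the output in the question.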

Updated answer

Thanks for the example. Prefixes are the main issue here.

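import org.apache.jena.vocabulary.*; // Jena 3+; older Jena 2.x releases use com.hp.hpl.jena.vocabulary

// "model" is the Model holding the scraped triples
model.setNsPrefix("ontology", "http://theinternet.com/ontology.rdf#");
model.setNsPrefix("dc",   DC_11.getURI());
model.setNsPrefix("owl",  OWL.getURI());
model.setNsPrefix("rdfs", RDFS.getURI());
model.setNsPrefix("xsd",  XSD.getURI());

(DC_11, OWL, RDFS and XSD are defined in the Jena vocabulary package.)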

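Note that rdf:resource (like rdf:about) takes a full URI, not a prefixed name, so

<rdf:type rdf:resource='ontology:Club' />

does not work: the value is read as an absolute URI whose scheme is "ontology". If you want abbreviated output anyway, the writer's showDoctypeDeclaration option declares an XML entity for each prefix and uses it in attribute values (e.g. rdf:resource='&ontology;Club').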
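A quick way to convince yourself of the rdf:resource point using only the JDK: java.net.URI happily parses ontology:Club as an absolute URI with the scheme "ontology", so an RDF parser sees a strange full URI rather than a prefixed name. The entity form, on the other hand, is expanded by the XML parser before RDF ever sees it. The class and method names below are hypothetical, and the DOCTYPE is hand-written, only roughly in the style showDoctypeDeclaration produces, not copied from Jena output.

```java
import java.io.StringReader;
import java.net.URI;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper, stdlib only: shows what an XML parser hands on to an RDF parser.
class ResourceValueSketch {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Parses the XML and returns rdf:resource from the first rdf:type element.
    // Internal entity references such as &ontology; are expanded during parsing.
    static String typeResource(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element type = (Element) doc.getElementsByTagNameNS(RDF_NS, "type").item(0);
            return type.getAttributeNS(RDF_NS, "resource");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // 'ontology:Club' is not treated as a QName: it is an absolute URI with scheme "ontology".
        System.out.println(URI.create("ontology:Club").getScheme()); // ontology

        // Hand-written DOCTYPE declaring an entity for the ontology namespace.
        String xml = "<!DOCTYPE rdf:RDF [<!ENTITY ontology 'http://theinternet.com/ontology.rdf#'>]>"
                + "<rdf:RDF xmlns:rdf='" + RDF_NS + "'>"
                + "<rdf:Description rdf:about='http://theinternet.com/%3fq=Club/325'>"
                + "<rdf:type rdf:resource='&ontology;Club'/>"
                + "</rdf:Description></rdf:RDF>";
        System.out.println(typeResource(xml)); // http://theinternet.com/ontology.rdf#Club
    }
}
```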

Incidentally, which RDFa parser did you use? The prefix definitions ought to pass through.

Answer 2 (1 vote)

You are missing the rdf:type statements because you haven't loaded any ontology containing the required rdfs:domain or rdfs:range statements, and I don't think you've used a reasoner to make those inferences.
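For anyone wondering what that inference actually does, here is a toy, stdlib-only sketch of the two RDFS rules involved: if a property has rdfs:domain D, every subject using it is typed D; if it has rdfs:range R, every object is typed R. It is not Jena's reasoner (which handles far more, subclasses included), DomainRangeSketch is a hypothetical name, the prefixed strings stand in for full URIs, and the schema statements for ontology:hasPlayer are an assumption, since the actual ontology isn't shown. It needs Java 16+ for the record.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy, not Jena's reasoner: applies only the RDFS domain and range rules.
class DomainRangeSketch {
    // Prefixed strings stand in for full URIs to keep the example short.
    record Triple(String s, String p, String o) {}

    static final String RDF_TYPE = "rdf:type";
    static final String RDFS_DOMAIN = "rdfs:domain";
    static final String RDFS_RANGE = "rdfs:range";

    // Returns the data triples plus the rdf:type triples implied by the schema.
    // A single pass is enough as long as the schema says nothing about rdf:type itself.
    static Set<Triple> infer(List<Triple> schema, List<Triple> data) {
        Set<Triple> result = new LinkedHashSet<>(data);
        for (Triple d : data) {
            for (Triple s : schema) {
                if (!s.s().equals(d.p())) {
                    continue; // this schema statement is about a different property
                }
                if (s.p().equals(RDFS_DOMAIN)) {
                    result.add(new Triple(d.s(), RDF_TYPE, s.o())); // subject belongs to the domain class
                } else if (s.p().equals(RDFS_RANGE)) {
                    result.add(new Triple(d.o(), RDF_TYPE, s.o())); // object belongs to the range class
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed schema: the question's ontology file is not shown.
        List<Triple> schema = List.of(
            new Triple("ontology:hasPlayer", RDFS_DOMAIN, "ontology:Club"),
            new Triple("ontology:hasPlayer", RDFS_RANGE, "ontology:Player"));
        List<Triple> data = List.of(
            new Triple("http://theinternet.com/%3fq=Club/325", "ontology:hasPlayer",
                       "http://theinternet.com/%3fq=player/291/"));
        // Prints the original triple followed by the two inferred rdf:type triples.
        infer(schema, data).forEach(System.out::println);
    }
}
```

In practice you wouldn't write this yourself; the point is that the missing rdf:type lines in the question can only appear once the domain and range statements are in the same model as the data and a reasoner is applied.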

You can load the domain and range statements along with the rest of the data, or you can use Jena's facility for automatically loading an ontology when it sees an owl:imports statement. I'd suggest the former to keep things simple.

Jena's RDFS reasoner (for example via ModelFactory.createRDFSModel), documented at http://jena.sourceforge.net/inference/, will do the reasoning you want.

BTW, I've found Sesame to be a lot easier to use and more robust than Jena for large-scale work, although for scraping a few triples either would be fine.

Also, Turtle (a subset of N3) is much easier to read and edit than RDF/XML, and it's well worth learning. I've been working with RDF constantly for the last 3 years and now convert all RDF/XML to Turtle before dealing with any raw data (although I do have a nice tool that writes everything in a useful order and automatically inserts back-reference comments, etc.).

Good luck!