I have been exploring the RDF triple store and semantic search capabilities of MarkLogic 7, and querying them using SPARQL. I was able to perform some basic operations, such as:
xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics" at "/MarkLogic/semantics.xqy";
sem:rdf-insert(sem:triple(sem:iri("http://example.org/ns/people#m"),
sem:iri("http://example.com/ns/person#firstName"), "Sam"),(),(),"my collection")
which creates a triple, and then query it using the following SPARQL:
PREFIX ab: <http://example.org/ns/people#>
PREFIX ac: <http://example.com/ns/person#>
SELECT ?Name
WHERE
{ ab:m ac:firstName ?Name . }
which retrieves Sam as the result.

Edited: In my use case, I have a delimited file (structured data) with 1 billion records that I ingested into MarkLogic using MLCP. Each record is stored as a document like this:
<root>
<ID>1000-000-000--000</ID>
<ACCOUNT_NUM>9999</ACCOUNT_NUM>
<NAME>Vronik</NAME>
<ADD1>D7-701</ADD1>
<ADD2>B-Valentine</ADD2>
<ADD3>Street 4</ADD3>
<ADD4>Fifth Avenue</ADD4>
<CITY>New York</CITY>
<STATE>NY</STATE>
<HOMPHONE>0002600000</HOMPHONE>
<BASEPHONE>12345</BASEPHONE>
<CELLPHONE>54321</CELLPHONE>
<EMAIL_ADDR>[email protected]</EMAIL_ADDR>
<CURRENT_BALANCE>10000</CURRENT_BALANCE>
<OWNERSHIP>JOINT</OWNERSHIP>
</root>
Now I want to use the RDF/semantic features on the dataset above.
However, I don't understand whether I need to convert the above document to RDF as shown below (shown for <NAME>), assuming this is the right way:
<sem:triple>
<sem:subject>unique/uri/Person</sem:subject>
<sem:predicate>unique/uri/Name</sem:predicate>
<sem:object datatype="http://www.w3.org/2001/XMLSchema#string" xml:lang="en">Vronik</sem:object>
</sem:triple>
and then ingest these documents into MarkLogic and search them using SPARQL, or whether I should just ingest my documents as they are, separately ingest triples obtained from external sources, somehow link those triples to my documents (how?), and then query using SPARQL. Or is there some other way that I ought to do this?
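To make the first option concrete, here is a minimal sketch of a per-document conversion, assuming one triple per field with the record's ID as the subject. The base IRIs and the function name local:record-to-triples are hypothetical and chosen only for illustration; only sem:iri and sem:triple, which are already used above, come from MarkLogic. It is not meant as a bulk-load solution for a billion records.

```xquery
xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics" at "/MarkLogic/semantics.xqy";

(: Hypothetical base IRIs, assumed for illustration only :)
declare variable $SUBJECT-BASE := "http://example.org/ns/account#";
declare variable $PREDICATE-BASE := "http://example.org/ns/field#";

(: Build one triple per child element of <root>, using <ID> as the subject :)
declare function local:record-to-triples($root as element(root)) as sem:triple*
{
  let $subject := sem:iri(fn:concat($SUBJECT-BASE, fn:string($root/ID)))
  for $field in $root/*[fn:not(self::ID)]
  return
    sem:triple(
      $subject,
      sem:iri(fn:concat($PREDICATE-BASE, fn:local-name($field))),
      fn:string($field)  (: object stored as a plain string literal :)
    )
};

(: A trimmed-down copy of the sample record above :)
let $doc :=
  <root>
    <ID>1000-000-000--000</ID>
    <NAME>Vronik</NAME>
    <CITY>New York</CITY>
  </root>
return local:record-to-triples($doc)
```

The returned triples could then be passed to sem:rdf-insert as in the first example. Whether this per-field mapping is the right model for the data is exactly the open question.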
<http://Shrey.com/xml-doc-1000-000-000--000> :id "1000-000-000--000" ; :accountNum "9999"^^xsd:int ; :name "Vronik" ; :add1 "D7-701" ; ... ; :ownership :JOINT . – Joshua Taylor
sem:triple schema, which is how MarkLogic stores triples. It can read RDF-XML, NTriple, N3, etc. via docs.marklogic.com/sem:rdf-parse, but it isn't clear that Shrey needs that. – mblakele
sem:triple is my understanding; is this the right way my original doc should be converted and then ingested? I would like to perform a bulk load/transform, as I have around a billion records. – Shrey Shivam