Given a search engine like Lucene and a set of XML documents that need to be fully preserved, what are the advantages and disadvantages of using the search engine as a key-value store that returns an XML document given the unique primary key each document contains?
3 Answers
If you use something like Compass and its XML-to-Lucene mapping engine, it's a great solution for storing and querying XML documents without going all the way to an XML database.
One downside is that the XML documents can only be retrieved via the Lucene API (the underlying data store is pretty impenetrable), but I can live with that.
Read Search Engine versus DBMS. IMO, your application falls in the DBMS realm, and will probably be best served by a key-value database, such as CouchDB. This is because you take no advantage of textual operations such as tokenization, stemming, etc.
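To make the distinction concrete, here is a rough sketch (plain Python, not Lucene) contrasting an exact key lookup with a lookup through a tokenized inverted index. The helper names `tokenize` and `build_inverted_index` and the sample documents are made up for illustration, and the tag stripping is deliberately crude; a real search engine does far more (stemming, scoring, and so on).

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (no stemming)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_inverted_index(docs):
    """Map each token to the set of document keys whose text contains it."""
    index = defaultdict(set)
    for key, xml in docs.items():
        # Crudely strip tags so that only the text content is indexed.
        text = re.sub(r"<[^>]+>", " ", xml)
        for token in tokenize(text):
            index[token].add(key)
    return index

# Hypothetical sample data: primary key -> full XML document.
docs = {
    "doc-1": "<note><body>Lucene indexes text</body></note>",
    "doc-2": "<note><body>CouchDB stores documents</body></note>",
}
index = build_inverted_index(docs)

# Key-value access: all the question asks for. No index is involved.
by_key = docs["doc-1"]

# Term access: what tokenization buys you, and what goes unused here.
by_term = sorted(index["lucene"])
```

If only the first kind of access is ever needed, the indexing machinery is overhead with nothing to show for it.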
If all you are going to do is test for key equality and retrieve a blob, Lucene has no visible advantage over, say, bdb. And you have no transactions until you layer something else on top. And concurrency has certain complexities to it. And the API is, well, a bit baroque for the simple thing you are doing.
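For comparison, the key-equality-plus-blob case needs very little code with a plain key-value store. This is a minimal sketch using Python's standard-library `dbm` module as a stand-in for a bdb-style store (it is not Berkeley DB itself); `put_document`, `get_document`, and the sample XML are hypothetical names and data chosen for illustration.

```python
import dbm
import os
import tempfile

def put_document(db_path, key, xml):
    """Store the XML text unchanged under its primary key."""
    with dbm.open(db_path, "c") as db:
        db[key.encode("utf-8")] = xml.encode("utf-8")

def get_document(db_path, key):
    """Return the stored XML for key, or None if the key is absent."""
    raw_key = key.encode("utf-8")
    with dbm.open(db_path, "c") as db:
        if raw_key not in db:
            return None
        return db[raw_key].decode("utf-8")

# Assumed setup: a throwaway database file in a temporary directory.
store = os.path.join(tempfile.mkdtemp(), "xmlstore")
xml = '<order id="42"><item>widget</item></order>'
put_document(store, "42", xml)
fetched = get_document(store, "42")
```

The document comes back exactly as it went in, which covers the "fully preserved" requirement. Like Lucene, this sketch has no transactions of its own.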
I've implemented something like what you describe, but actual full text search on the data was a critical requirement that justified the rest.