0 votes

So the basic situation is that I have to store some text information in a SQL Server database. The problem is that the text must be encrypted AND searchable. I've done some research on the topic, and it seems this is not yet feasible (as far as I understand, the relevant type of encryption, homomorphic encryption, is not mature enough to be used in real-life applications).

I've come up with an idea, but I'm not sure if it is feasible at all. Can someone advise?

I know that Lucene.NET can be used for full-text searching. What I'd like to do is this: index the plain text with Lucene.NET, keeping the index but not the plain text; store the encrypted value in SQL Server; then search the Lucene index, get the ID of the matching record from it, read that specific row from SQL Server, and decrypt the data.
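A minimal sketch of that flow, written in Java (Lucene.NET is a port of Java Lucene) using only the JDK, so it makes no real Lucene or SQL Server calls. `TermIndex` is a hypothetical in-memory stand-in for the Lucene index that keeps term → record-ID postings but never the text, and `EncryptedTable` is a hypothetical stand-in for the SQL Server table, holding AES-GCM ciphertext keyed by ID. Key management and persistence are assumed to be handled elsewhere.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.*;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

class EncryptedSearchDemo {
    // Search the index for record IDs, then fetch and decrypt only those rows.
    static List<String> find(TermIndex index, EncryptedTable table, String term) throws Exception {
        List<String> hits = new ArrayList<>();
        for (long id : index.search(term)) {
            hits.add(table.readAndDecrypt(id));
        }
        return hits;
    }

    static List<String> demo(String term) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        EncryptedTable table = new EncryptedTable(kg.generateKey());
        TermIndex index = new TermIndex();
        String[] records = {"Fred met Wilma at the quarry", "Barney met Betty", "Wilma called Betty"};
        for (int i = 0; i < records.length; i++) {
            table.insert(i + 1, records[i]); // the "SQL Server" side gets ciphertext only
            index.add(i + 1, records[i]);    // the "Lucene" side gets terms only
        }
        return find(index, table, term);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo("wilma"));
    }
}

// Hypothetical stand-in for the Lucene index: term -> IDs of records
// containing it. The plain text itself is never kept here.
class TermIndex {
    private final Map<String, Set<Long>> postings = new HashMap<>();

    void add(long id, String plainText) {
        // Crude ASCII tokenizer standing in for a Lucene analyzer.
        for (String term : plainText.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
    }

    Set<Long> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}

// Hypothetical stand-in for the SQL Server table: ID -> IV || AES-GCM ciphertext.
class EncryptedTable {
    private static final int IV_LEN = 12, TAG_BITS = 128;
    private final Map<Long, byte[]> rows = new HashMap<>();
    private final SecureRandom random = new SecureRandom();
    private final SecretKey key;

    EncryptedTable(SecretKey key) { this.key = key; }

    void insert(long id, String plainText) throws Exception {
        byte[] iv = new byte[IV_LEN]; // fresh random IV per row; never reuse one with the same key
        random.nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = c.doFinal(plainText.getBytes(StandardCharsets.UTF_8));
        byte[] row = new byte[IV_LEN + ct.length];
        System.arraycopy(iv, 0, row, 0, IV_LEN);
        System.arraycopy(ct, 0, row, IV_LEN, ct.length);
        rows.put(id, row);
    }

    String readAndDecrypt(long id) throws Exception {
        byte[] row = rows.get(id);
        if (row == null) throw new NoSuchElementException("no row " + id);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, row, 0, IV_LEN));
        byte[] pt = c.doFinal(row, IV_LEN, row.length - IV_LEN);
        return new String(pt, StandardCharsets.UTF_8);
    }
}
```

Even in this sketch, the index alone reveals which words occur in which record, so it needs protecting as carefully as the ciphertext.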

Is this possible? Can I index something with Lucene and then not store the indexed data?


2 Answers

2 votes

I'm not sure I understand the nuances of your requirement, but it sounds like the terms can be plain text while the full string needs to be encrypted?

If so, a pattern I've used many times is to index the field(s) with Field.Store.NO and then put the content in a stored binary field. I've used this to create a document store where the docs are typed, structured objects: define which properties to index, then serialize the object as compressed JSON into a binary field.

In your case, the binary field would hold your string, encrypted by whatever means you require.
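The binary-field payload could be built along these lines. This is a sketch under assumptions: the object has already been serialized to a JSON string (the JDK has no JSON serializer), and AES-GCM with a 12-byte IV stored in front of the ciphertext stands in for "whatever means you require". `BinaryPayload` is a hypothetical helper; the byte array it returns is what would go into the stored binary field.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.zip.*;
import javax.crypto.*;
import javax.crypto.spec.GCMParameterSpec;

// JSON string -> GZIP -> AES-GCM, stored as IV || ciphertext+tag.
class BinaryPayload {
    private static final int IV_LEN = 12, TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    static byte[] seal(String json, SecretKey key) throws Exception {
        // Compress first: ciphertext looks random and will not compress afterwards.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(json.getBytes(StandardCharsets.UTF_8));
        }
        byte[] iv = new byte[IV_LEN];
        RANDOM.nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = c.doFinal(buf.toByteArray());
        byte[] out = new byte[IV_LEN + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_LEN);
        System.arraycopy(ct, 0, out, IV_LEN, ct.length);
        return out;
    }

    static String open(byte[] payload, SecretKey key) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, payload, 0, IV_LEN));
        // Throws AEADBadTagException if the stored bytes were altered.
        byte[] compressed = c.doFinal(payload, IV_LEN, payload.length - IV_LEN);
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        SecretKey key = kg.generateKey();
        String json = "{\"id\":42,\"body\":\"some secret text\"}";
        byte[] field = seal(json, key); // bytes for the stored binary field
        System.out.println(field.length + " bytes; round trip ok: " + open(field, key).equals(json));
    }
}
```

For very short strings, GZIP's header overhead can make the payload larger than the input; the point here is the ordering, compress and then encrypt.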

There is still a risk that the text can (mostly) be reconstituted if term positions are indexed, whether in term vectors or in the positional postings that phrase and slop queries rely on (i.e. "fred wilma"~5 matches "fred" within roughly 5 positions of "wilma"). "Mostly" because stop words are typically removed by the analyzer. If you don't need phrase or slop queries, you can index without positions (and without term vectors).
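To make the reconstruction risk concrete, here is a small JDK-only sketch that builds a toy positional index (term → document → positions) and then rebuilds a document from the index alone. The stop list and whitespace tokenizer are simplified, hypothetical stand-ins for a real analyzer.

```java
import java.util.*;

// Shows the leak: an index that records term positions is enough to
// rebuild each document's word order, minus the stop words it dropped.
class PositionLeak {
    // Hypothetical stop list standing in for an analyzer's stop filter.
    static final Set<String> STOP_WORDS = Set.of("the", "a", "of", "and", "is");

    // term -> (docId -> positions). No document text is stored anywhere.
    static Map<String, Map<Integer, List<Integer>>> index(Map<Integer, String> docs) {
        Map<String, Map<Integer, List<Integer>>> idx = new TreeMap<>();
        docs.forEach((docId, text) -> {
            String[] tokens = text.toLowerCase(Locale.ROOT).split("\\s+");
            for (int pos = 0; pos < tokens.length; pos++) {
                // Stop words are skipped, but positions keep counting (leaving a gap).
                if (STOP_WORDS.contains(tokens[pos])) continue;
                idx.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                   .computeIfAbsent(docId, d -> new ArrayList<>())
                   .add(pos);
            }
        });
        return idx;
    }

    // What someone holding only the index can do: collect every
    // (position, term) pair for one document and sort by position.
    static String reconstruct(Map<String, Map<Integer, List<Integer>>> idx, int docId) {
        TreeMap<Integer, String> byPos = new TreeMap<>();
        idx.forEach((term, postings) -> {
            for (int pos : postings.getOrDefault(docId, List.of())) byPos.put(pos, term);
        });
        return String.join(" ", byPos.values());
    }

    public static void main(String[] args) {
        var idx = index(Map.of(1, "the quick fox is chasing a rabbit"));
        System.out.println(reconstruct(idx, 1)); // quick fox chasing rabbit
    }
}
```

Without positions, the same walk yields only an unordered bag of terms per document, which is far less revealing, though still not nothing.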

With a little care (probably a custom analyzer and query parser), you could encrypt the terms as well.
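One way to "encrypt" the terms while keeping them matchable, an assumption rather than something the answer specifies, is a keyed hash (HMAC-SHA256) applied to every term, sometimes called a blind index. In a real setup this would presumably live in the custom analyzer's final token filter and in the query parser; here `BlindTerms` is a hypothetical helper that shows only the transformation and an exact-match lookup.

```java
import java.nio.charset.StandardCharsets;
import java.util.*;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical "blind index": each term is replaced by HMAC-SHA256(key, term).
// The same key is applied at index time and query time, so exact-term
// lookups still work, but the index holds no readable words.
class BlindTerms {
    private final Mac mac; // Mac is not thread-safe; use one instance per thread

    BlindTerms(byte[] key) throws Exception {
        mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
    }

    // Apply to the analyzer's output (index time) and to each parsed query term.
    String token(String term) {
        byte[] digest = mac.doFinal(term.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
        return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
    }

    // token -> record IDs; the plaintext terms are never stored.
    Map<String, Set<Integer>> index(Map<Integer, String> docs) {
        Map<String, Set<Integer>> idx = new HashMap<>();
        docs.forEach((id, text) -> {
            for (String t : text.split("\\W+")) {
                if (!t.isEmpty()) idx.computeIfAbsent(token(t), k -> new TreeSet<>()).add(id);
            }
        });
        return idx;
    }

    Set<Integer> search(Map<String, Set<Integer>> idx, String queryTerm) {
        return idx.getOrDefault(token(queryTerm), Collections.emptySet());
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[32];
        new java.security.SecureRandom().nextBytes(key);
        BlindTerms bt = new BlindTerms(key);
        Map<String, Set<Integer>> idx = bt.index(Map.of(1, "Fred met Wilma", 2, "Barney met Betty"));
        System.out.println(bt.search(idx, "met")); // record IDs 1 and 2
        System.out.println(idx.keySet());          // opaque tokens only
    }
}
```

The tokens are deterministic by design, so anyone holding the index can still see term frequencies and which records share terms. Only exact matches survive, so wildcard, fuzzy, and range queries will not work on a blinded field.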

0 votes

It's not going to be easy. What you're describing is rarely implemented in commercial databases, although there are some theoretical results in the field. I'd suggest going to Google Scholar and looking for papers on the subject.

Here are a few references to get you started: