I have a requirement for a very specific Lucene implementation that stores multiple "Properties" fields, each containing a serialized JSON string.
Example:
Document:
ID: "99"
Text: "Lorepsum Ipsum"
Properties: "{
"lastModified": "1/2/2015",
"user": "johndoe",
"modifiedChars": 2,
"before": "text a",
"after": "text b"
}"
Properties: "{
"lastModified": "1/2/2013",
"user": "johncotton",
"modifiedChars": 6,
"before": "text aa",
"after": "text bbb"
}"
Properties: "{
"lastModified": "1/3/2015",
"user": "johnmajor",
"modifiedChars": 3,
"before": "text aa",
"after": "text b"
}"
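One possible first step, sketched here as an assumption rather than anything Lucene provides, is to flatten each Properties object into key:value terms that a custom analyzer could later emit as tokens. The class name FlattenProperties and the regex-based parsing are hypothetical; the regex only handles flat objects with string or integer values (no nesting, arrays, or escaped quotes), so a real JSON parser would be the safer choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlattenProperties {
    // Matches "key": "string value" or "key": 123 inside a flat JSON object.
    // Assumption: no nested objects, no arrays, no escaped quotes.
    private static final Pattern PAIR =
            Pattern.compile("\"([^\"]+)\"\\s*:\\s*(?:\"([^\"]*)\"|(-?\\d+))");

    /** Turns one flat JSON object into "key:value" terms, in source order. */
    public static List<String> flatten(String json) {
        List<String> terms = new ArrayList<>();
        Matcher m = PAIR.matcher(json);
        while (m.find()) {
            // Group 2 holds a string value, group 3 an integer value.
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            terms.add(m.group(1) + ":" + value);
        }
        return terms;
    }

    public static void main(String[] args) {
        String props = "{\"lastModified\": \"1/2/2015\", \"user\": \"johndoe\", "
                + "\"modifiedChars\": 2, \"before\": \"text a\", \"after\": \"text b\"}";
        System.out.println(flatten(props));
        // [lastModified:1/2/2015, user:johndoe, modifiedChars:2, before:text a, after:text b]
    }
}
```

Keeping a value such as "text a" as one term (rather than splitting it into words) is a deliberate choice here, so that a clause like before:"text A" can map to a single token lookup, assuming values are also lowercased consistently at index and query time.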
I'm aware that Elasticsearch and Solr have implementations to look up values inside JSON objects, but I'm using Lucene's core API (3.0.5).
My goal is to use Lucene's API, plus some custom code of my own, to search within the JSON strings. For example:
Building a kind of BooleanQuery where at least one "Properties" field MUST match every clause of the query (e.g. the query +user:tom +modifiedChars:3 +before:"text A").
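To make the required semantics concrete, here is a plain-Java sketch (no Lucene involved; the class and method names are hypothetical) that contrasts matching within a single Properties object against matching over one field into which all objects have been flattened. The lastModified entries are omitted for brevity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PerObjectMatch {
    /** Desired semantics: every query term occurs in the SAME Properties object. */
    public static boolean matchesOneObject(List<Set<String>> objects, Set<String> query) {
        for (Set<String> obj : objects) {
            if (obj.containsAll(query)) {
                return true;
            }
        }
        return false;
    }

    /** Naive semantics: all objects merged into one bag of terms. */
    public static boolean matchesFlattened(List<Set<String>> objects, Set<String> query) {
        Set<String> all = new HashSet<>();
        for (Set<String> obj : objects) {
            all.addAll(obj);
        }
        return all.containsAll(query);
    }

    private static Set<String> terms(String... t) {
        return new HashSet<>(Arrays.asList(t));
    }

    public static void main(String[] args) {
        List<Set<String>> doc99 = Arrays.asList(
                terms("user:johndoe", "modifiedChars:2", "before:text a", "after:text b"),
                terms("user:johncotton", "modifiedChars:6", "before:text aa", "after:text bbb"),
                terms("user:johnmajor", "modifiedChars:3", "before:text aa", "after:text b"));

        Set<String> q = terms("user:johndoe", "modifiedChars:6");
        System.out.println(matchesFlattened(doc99, q)); // true (false positive)
        System.out.println(matchesOneObject(doc99, q)); // false
    }
}
```

With the query +user:johndoe +modifiedChars:6, the flattened version reports a hit even though no single Properties object satisfies both clauses. That is the false positive a plain BooleanQuery of MUST TermQuery clauses over one multi-valued field would produce.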
I have some ideas, but I have no clue where to begin. What I'm asking for is some high-level ideas on how to achieve such an implementation. Maybe a custom analyzer used together with a query parser? Consider it an open-ended question; all suggestions are welcome.
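One approach that stays within Lucene 3.x, offered as a suggestion rather than a tested solution: add every Properties object as a separate value of the same field, have the custom analyzer emit one key:value token per property, override Analyzer.getPositionIncrementGap(String) to return a large value so that consecutive values end up far apart, and query with an unordered SpanNearQuery of SpanTermQuery clauses whose slop is large enough to span one object but smaller than the gap. The plain-Java sketch below does not use Lucene; it only simulates that position arithmetic (the gap of 100 and all names are assumptions) to show why the trick confines a match to a single object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PositionGapSketch {
    // Assumed gap; it must exceed the number of tokens in any single object.
    static final int GAP = 100;

    /** Assigns token positions roughly the way a multi-valued field with a gap would. */
    public static Map<String, List<Integer>> index(List<List<String>> objects) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int pos = 0;
        for (int i = 0; i < objects.size(); i++) {
            if (i > 0) {
                pos += GAP; // jump ahead before each additional Properties value
            }
            for (String term : objects.get(i)) {
                postings.computeIfAbsent(term, k -> new ArrayList<>()).add(pos++);
            }
        }
        return postings;
    }

    /** True if occurrences of all query terms fit in a window narrower than GAP. */
    public static boolean nearMatch(Map<String, List<Integer>> postings, Set<String> query) {
        List<String> terms = new ArrayList<>(query);
        List<int[]> occ = new ArrayList<>(); // {position, index into terms}
        for (int t = 0; t < terms.size(); t++) {
            List<Integer> positions = postings.get(terms.get(t));
            if (positions == null) {
                return false; // term absent from the document
            }
            for (int p : positions) {
                occ.add(new int[] {p, t});
            }
        }
        occ.sort((a, b) -> Integer.compare(a[0], b[0]));
        int[] count = new int[terms.size()];
        int covered = 0;
        int left = 0;
        // Sliding window over sorted occurrences: test each window that covers every term.
        for (int right = 0; right < occ.size(); right++) {
            if (count[occ.get(right)[1]]++ == 0) {
                covered++;
            }
            while (covered == terms.size()) {
                if (occ.get(right)[0] - occ.get(left)[0] < GAP) {
                    return true;
                }
                if (--count[occ.get(left)[1]] == 0) {
                    covered--;
                }
                left++;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<List<String>> doc99 = Arrays.asList(
                Arrays.asList("user:johndoe", "modifiedChars:2", "before:text a", "after:text b"),
                Arrays.asList("user:johncotton", "modifiedChars:6", "before:text aa", "after:text bbb"),
                Arrays.asList("user:johnmajor", "modifiedChars:3", "before:text aa", "after:text b"));
        Map<String, List<Integer>> postings = index(doc99);
        Set<String> sameObject = new HashSet<>(Arrays.asList("user:johnmajor", "after:text b"));
        Set<String> acrossObjects = new HashSet<>(Arrays.asList("user:johndoe", "modifiedChars:6"));
        System.out.println(nearMatch(postings, sameObject));    // true
        System.out.println(nearMatch(postings, acrossObjects)); // false
    }
}
```

Because the gap exceeds the number of tokens in any one object, a window narrower than the gap can only cover tokens from one object. The analyzer would have to keep multi-word values such as "text a" as single tokens for this to work, and range conditions (e.g. modifiedChars greater than 3) are not covered by this sketch.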