I have a requirement for a very specific Lucene implementation that stores multiple "Properties" fields, each holding a serialized JSON string.

Example:

Document:
ID: "99"
Text: "Lorepsum Ipsum"
Properties: "{
    "lastModified": "1/2/2015",
    "user": "johndoe",
    "modifiedChars": 2,
    "before": "text a",
    "after": "text b"
}"
Properties: "{
    "lastModified": "1/2/2013",
    "user": "johncotton",
    "modifiedChars": 6,
    "before": "text aa",
    "after": "text bbb"
}"
Properties: "{
    "lastModified": "1/3/2015",
    "user": "johnmajor",
    "modifiedChars": 3,
    "before": "text aa",
    "after": "text b"
}"

I'm aware that Elasticsearch and Solr can look up values inside JSON objects, but I'm using Lucene's core API (3.0.5).

My goal is to use Lucene's API, with some added implementation, to search within the JSON strings. For example:

Building a kind of BooleanQuery where at least one "Properties" field MUST match all the values in the query (e.g. the query +user:tom +modifiedChars:3 +before:"text A", etc.).

I have some ideas, but I have no clue where to begin. What I'm asking for is some high-level ideas on how to achieve such an implementation. Perhaps a custom analyzer to use with a query parser? Consider it an open-ended question. All suggestions are welcome.

1 Answer

If you will always search for the complete set of values...

Create a "property" field for each set. The value would just be the concatenated set of values, i.e. "1/2/2015:johndoe:2:text a:text b".
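A minimal sketch of how such a concatenated value could be built, in plain Java with no Lucene dependency. The class name CompositeKey, the field order, and the separator check are assumptions made for illustration, not part of the original suggestion. The same order and separator would have to be used at index time and at query time, and the resulting string would presumably be indexed as a single untokenized term (Field.Index.NOT_ANALYZED in Lucene 3.0) and looked up with a TermQuery.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper: joins one property set into a single exact-match key.
class CompositeKey {
    // Assumed fixed order; index-time and query-time code must agree on it.
    static final List<String> FIELD_ORDER = Arrays.asList(
            "lastModified", "user", "modifiedChars", "before", "after");

    static String build(Map<String, String> props, char separator) {
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < FIELD_ORDER.size(); i++) {
            String name = FIELD_ORDER.get(i);
            String value = props.get(name);
            if (value == null) {
                // This approach only works when the complete set is known.
                throw new IllegalArgumentException("missing property: " + name);
            }
            if (value.indexOf(separator) >= 0) {
                // A separator inside a value would make two sets collide.
                throw new IllegalArgumentException("separator in value of: " + name);
            }
            if (i > 0) {
                key.append(separator);
            }
            key.append(value);
        }
        return key.toString();
    }
}
```

For the first set in the question, CompositeKey.build(set, ':') yields "1/2/2015:johndoe:2:text a:text b". Because free text such as "before" may itself contain a colon, a separator that cannot occur in the data might be a safer choice.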

Alternatively, create a separate doc for each set. This would allow you to search for different combinations of values without conflating the different sets.

Yes, that might mean duplicating the Text field. If it's not big, I wouldn't worry too much (especially if you're not using a "stored" field).
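To illustrate why one doc per set avoids conflating sets, here is a rough in-memory sketch in plain Java. It only simulates the idea: the class PerSetDocs and its methods are hypothetical, and the maps stand in for Lucene documents. In Lucene 3.0 each record would presumably become a Document with one Field per property plus the ID, and the search would be a BooleanQuery whose clauses all use BooleanClause.Occur.MUST.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation: each map plays the role of one indexed document.
class PerSetDocs {
    // Produces one flat record per property set, tagged with the parent ID.
    static List<Map<String, String>> flatten(String id, List<Map<String, String>> sets) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (Map<String, String> set : sets) {
            Map<String, String> doc = new HashMap<>(set);
            doc.put("ID", id);
            docs.add(doc);
        }
        return docs;
    }

    // Returns the distinct IDs having at least one record that matches
    // every required field; matches are never combined across records.
    static Set<String> search(List<Map<String, String>> docs, Map<String, String> required) {
        Set<String> ids = new LinkedHashSet<>();
        for (Map<String, String> doc : docs) {
            if (doc.entrySet().containsAll(required.entrySet())) {
                ids.add(doc.get("ID"));
            }
        }
        return ids;
    }
}
```

With the three sets from the question, requiring user=johnmajor and modifiedChars=3 finds ID 99, while requiring user=johndoe and modifiedChars=6 finds nothing, because those two values belong to different sets. Collecting IDs into a set also removes the duplicates that arise when several records of the same parent match.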

Do you need to combine text and property in your queries? ("text:ipsum AND property:xxx")

If not, then put the text in yet another doc.

If the idea is to search in order to get the "ID" field, then some combination of the above ought to work.