What is the most efficient way to perform indexing for full-text search?
I use MongoDB, but I don't think that matters much for this question.
I'm considering two ways to store Draft.js output so that it can be indexed:
- Convert it to Markdown. This looks simple, and the full-text indexer is smart enough to filter out the markup characters during indexing. However, if Markdown were good enough, Draft.js would probably output Markdown instead of the block structure, so I assume there are advantages to storing the blocks as they are.
- Store the blocks after JSON.stringify, and also store the "text" property of every block as plain text in a separate document property (or table column, for SQL). The plain text would exist only for indexing, and everything else would be handled by the stringified/parsed JSON. Honestly, this sounds unnecessarily complicated.
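To make the second option concrete, this is roughly what I have in mind. It is only a minimal sketch: toStoredRecord and the field names content and searchText are made up for illustration, and rawContent is assumed to be the object returned by convertToRaw(editorState.getCurrentContent()).

```javascript
// Hypothetical helper (not part of Draft.js): builds the record to persist.
// `rawContent` is assumed to have the raw Draft.js shape: { entityMap, blocks: [...] }.
function toStoredRecord(rawContent) {
  // Collect the "text" of every block, skip empty blocks, one block per line.
  const searchText = rawContent.blocks
    .map((block) => block.text)
    .filter((text) => text.trim() !== "")
    .join("\n");

  return {
    content: JSON.stringify(rawContent), // full structure, used to restore the editor
    searchText,                          // plain text, used only for indexing
  };
}

// Trimmed-down sample data; real blocks carry more properties.
const sampleRaw = {
  entityMap: {},
  blocks: [
    { key: "4vno8", text: "First line of text", type: "unstyled" },
    { key: "dr3c5", text: "A header", type: "header-one" },
    { key: "c5ndf", text: "text and one BOLD word", type: "unstyled" },
  ],
};

const record = toStoredRecord(sampleRaw);
console.log(record.searchText);
```

With MongoDB, I assume the plain-text field would then get a text index, e.g. db.posts.createIndex({ searchText: "text" }), and be queried with { $text: { $search: "header" } }, where posts is a made-up collection name.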
If you have already run into this situation, do you have any concrete advice on how to store and index this data?
Here are some examples, to be specific:
Example of text:
<p>First line of text</p>
<h1>A header</h1>
<p>text and one <strong>BOLD</strong> word</p>
Draft.js output:
{
  "entityMap": {},
  "blocks": [
    {
      "key": "4vno8",
      "text": "First line of text",
      "type": "unstyled",
      "depth": 0,
      "inlineStyleRanges": [],
      "entityRanges": [],
      "data": {}
    },
    {
      "key": "dr3c5",
      "text": "A header",
      "type": "header-one",
      "depth": 0,
      "inlineStyleRanges": [],
      "entityRanges": [],
      "data": {}
    },
    {
      "key": "c5ndf",
      "text": "text and one BOLD word",
      "type": "unstyled",
      "depth": 0,
      "inlineStyleRanges": [
        {
          "offset": 13,
          "length": 4,
          "style": "BOLD"
        }
      ],
      "entityRanges": [],
      "data": {}
    }
  ]
}
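
For comparison, the first option applied to the output above could look something like this. It is a deliberately tiny, hand-written sketch: rawToMarkdown and blockToMarkdown are hypothetical names, and only "header-one" blocks and non-overlapping BOLD ranges are handled, so it is not a general converter.

```javascript
// Hypothetical mapping from Draft.js block types to Markdown prefixes.
// Only "header-one" is covered; other block types are emitted as plain paragraphs.
const BLOCK_PREFIX = { "header-one": "# " };

function blockToMarkdown(block) {
  let text = block.text;
  // Wrap BOLD ranges in "**", working right to left so earlier offsets stay valid.
  // Assumes the ranges do not overlap.
  const boldRanges = (block.inlineStyleRanges || [])
    .filter((range) => range.style === "BOLD")
    .sort((a, b) => b.offset - a.offset);
  for (const { offset, length } of boldRanges) {
    const end = offset + length;
    text = text.slice(0, offset) + "**" + text.slice(offset, end) + "**" + text.slice(end);
  }
  return (BLOCK_PREFIX[block.type] || "") + text;
}

function rawToMarkdown(rawContent) {
  // Blank line between blocks, as Markdown paragraphs expect.
  return rawContent.blocks.map(blockToMarkdown).join("\n\n");
}

// Same three blocks as in the output above, reduced to the properties used here.
const exampleRaw = {
  entityMap: {},
  blocks: [
    { text: "First line of text", type: "unstyled", inlineStyleRanges: [] },
    { text: "A header", type: "header-one", inlineStyleRanges: [] },
    {
      text: "text and one BOLD word",
      type: "unstyled",
      inlineStyleRanges: [{ offset: 13, length: 4, style: "BOLD" }],
    },
  ],
};

const markdown = rawToMarkdown(exampleRaw);
console.log(markdown);
```

Anything the converter does not map (entities, block data, other styles) is lost in this form, which is the part that makes me hesitate about storing only Markdown.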