I am using Elasticsearch to build a program that finds every place in a text where the Bible is quoted and identifies which verse is being quoted.
I have indexed all the verses of the Bible in Elasticsearch; each verse is a document. When I search by typing part of a verse, I get the right result, even if I make mistakes.
How can I scan a text to find every occurrence where a verse (even a partial one) is cited, and attribute the source of the verse to each occurrence? The matching should tolerate errors (with the fuzziness parameter or with synonyms, I think).
Here is a sample of my index:
{"index":{"_index":"test","_type":"","_id":1}}
{"fields":{"year":3560,"book":"1","chapter":1,"section":1,"text":"others words consectetur adipiscing and others words"},"id":"test1","type":"add"}
{"index":{"_index":"test","_type":"","_id":2}}
{"fields":{"year":3560,"book":"2","chapter":3,"section":2,"text":"others words a sagittis nisl quam and others words"},"id":"test2","type":"add"}
{"index":{"_index":"test","_type":"","_id":3}}
{"fields":{"year":3560,"book":"3","chapter":1,"section":5,"text":"others words Aliquam ultrices auctor pharetra and others words"},"id":"test3","type":"add"}
{"index":{"_index":"test","_type":"","_id":4}}
{"fields":{"year":3560,"book":"4","chapter":2,"section":4,"text":"others words Proin ut vestibulum and others words"},"id":"test4","type":"add"}
{"index":{"_index":"test","_type":"","_id":5}}
{"fields":{"year":3560,"book":"5","chapter":1,"section":5,"text":"others words Aenean pretium tincidunt aliquet and others words"},"id":"test5","type":"add"}
{"index":{"_index":"test","_type":"","_id":6}}
{"fields":{"year":3560,"book":"6","chapter":2,"section":1,"text":"others words In vitae sagittis and others words"},"id":"test6","type":"add"}
{"index":{"_index":"test","_type":"","_id":7}}
{"fields":{"year":3560,"book":"7","chapter":7,"section":7,"text":"others words ligula laoreet pharetra and others words"},"id":"test7","type":"add"}
{"index":{"_index":"test","_type":"","_id":8}}
{"fields":{"year":3560,"book":"8","chapter":1,"section":4,"text":"others words luctus eros a pretium and others words"},"id":"test8","type":"add"}
{"index":{"_index":"test","_type":"","_id":9}}
{"fields":{"year":3560,"book":"9","chapter":1,"section":7,"text":"others words ullamcorper eu id quam and others words"},"id":"test9","type":"add"}
{"index":{"_index":"test","_type":"","_id":10}}
{"fields":{"year":3560,"book":"10","chapter":5,"section":4,"text":"others words Nullam ac enim ac lacus hendrerit and others words"},"id":"test10","type":"add"}
I need to find every passage in the following paragraph that matches a verse in the index, so that I can recover its source (book, chapter, section):
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla rhoncus, nulla vitae porta euismod, purus nisl faucibus nunc, a sagittis nisl quam id arcu. Sed sit amet arcu sed dui auctor bibendum. Proin ut vestibulum sem, id rutrum felis. Phasellus sagittis justo sit amet justo consequat, id scelerisque eros cursus. Quisque dapibus finibus euismod. Proin dui urna, auctor ut gravida quis, fringilla quis velit. Donec sed pulvinar leo. Sed pulvinar pharetra arcu nec egestas. Mauris non dapibus diam. Pellentesque quis pellentesque libero. Aliquam ultrices auctor pharetra. Cras ullamcorper, odio sit amet aliquam convallis, magna nibh gravida nunc, sit amet volutpat elit purus eget lectus. Pellentesque eu est a risus euismod consequat. Duis id erat porttitor, sodales justo non, aliquet ex. Etiam tincidunt neque ut nisi commodo auctor. Sed congue urna ac tellus scelerisque hendrerit. Mauris lobortis sed dui ut varius. Proin ac luctus felis. In vitae sagittis erat, nec luctus sapien. Aenean pretium tincidunt aliquet. Morbi at enim vel ligula laoreet pharetra. Sed dignissim luctus eros a pretium. Vestibulum molestie molestie nisi, vitae scelerisque nibh bibendum nec. Donec laoreet sapien sed vehicula dictum. Nullam ac enim ac lacus hendrerit tempor et vitae neque. Quisque at leo pretium, efficitur augue vitae, congue eros. Maecenas volutpat ante nec scelerisque vestibulum. Donec tristique orci erat, nec imperdiet nulla commodo ut. Nam non odio vel quam cursus ullamcorper eu id quam. Duis volutpat, nisl eu interdum mattis, augue ipsum mollis leo, eget efficitur orci augue eget leo. Integer feugiat facilisis dolor ut vehicula. Maecenas quis feugiat massa. Curabitur feugiat odio eget ligula tincidunt sodales. Donec feugiat dapibus lectus, non maximus dui rhoncus vitae. Phasellus eget massa faucibus, tristique nibh sed, aliquet metus.
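To make the idea concrete, here is a rough sketch of the naive approach I could try: split the paragraph into sentences and build one fuzzy `match` query per sentence against the `fields.text` field of my documents. The sentence-splitting regex and the helper names `split_sentences` and `build_queries` are my own assumptions, nothing is actually sent to Elasticsearch here, and I have not checked how well this ranks or scales.

```python
import re


def split_sentences(paragraph):
    """Naively split on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def build_queries(paragraph, field="fields.text"):
    """Return one Elasticsearch query body per sentence (nothing is sent)."""
    queries = []
    for sentence in split_sentences(paragraph):
        queries.append({
            "query": {
                "match": {
                    field: {
                        "query": sentence,
                        # Tolerate small spelling differences in each term.
                        "fuzziness": "AUTO",
                    }
                }
            }
        })
    return queries


sample = (
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
    "Proin ut vestibulum sem, id rutrum felis."
)
bodies = build_queries(sample)
```

Each body would then be sent as a separate search (or batched), and I would presumably need some score threshold to decide whether the top hit really is a quotation. That is the part I do not know how to do properly.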
I am not sure I have been clear enough, so do not hesitate to ask if you need more details.
I think this problem is handled by the Aho-Corasick algorithm, but I don't know how to integrate it into Elasticsearch.
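For reference, this is roughly what I mean by Aho-Corasick: a minimal pure-Python sketch, run outside Elasticsearch, that finds exact occurrences of several phrases in a single pass over the text. The function names `build_automaton` and `find_all` are only illustrative, and the two sample phrases are taken from the index sample above. It does exact matching only, so it does not give me the fault tolerance I am after.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links and output lists."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns that end at each state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)
    # Breadth-first pass: depth-1 states keep fail = 0, deeper ones are derived.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure state (proper suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_offset, pattern) for every exact occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits


text = "Nulla rhoncus, a sagittis nisl quam id arcu. Proin ut vestibulum sem."
phrases = ["a sagittis nisl quam", "Proin ut vestibulum"]
hits = find_all(text, phrases)
```

Each hit gives me the offset in the text and the matched phrase, which I could map back to a book, chapter and section. What I am missing is a way to get the same kind of multi-pattern scan with Elasticsearch's fuzzy matching.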
Thank you!