
I have been trying to solve this issue for days.

I want to run a START query against a full-text index, ordered by relevance, so that I can paginate the results.

Happily, I finally found this thread on full-text indexing in Neo4j (using Python as the driver):

[https://groups.google.com/forum/#!topic/neo4j/9G8fcjVuuLw]

I had imported my database with the batch super-importer, and @Michaelhunger kindly replied that there was a bug: all scores would have been imported with the same value.

So now I am recreating the index and checking the scores via the REST API (&order=score):

http://localhost:7474/db/data/index/node/myInde?query=name:myKeyWord&order=score

and I noticed that the entries still have the same score.

(You have to make an AJAX request to see this, because the web console does not show all of the data.)
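To check the scores from a script instead of the browser, here is a minimal sketch using only the Python standard library. It assumes the legacy REST endpoint shown above and that each hit in the JSON response carries a `score` field next to its `data` properties, as the console output in this question suggests; `index_query_url`, `fetch_hits` and `group_by_score` are hypothetical helper names, not part of any Neo4j driver.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

BASE = "http://localhost:7474/db/data"


def index_query_url(index_name, lucene_query, base=BASE):
    """Build the legacy node-index query URL, asking for results sorted by score."""
    params = urllib.parse.urlencode({"query": lucene_query, "order": "score"})
    return f"{base}/index/node/{urllib.parse.quote(index_name)}?{params}"


def fetch_hits(url):
    """GET the URL and return the decoded JSON list of hits (needs a running server)."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def group_by_score(hits):
    """Map each score to the 'name' properties of all hits sharing that score."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit.get("score")].append(hit["data"].get("name"))
    return dict(groups)
```

Against a running server, `group_by_score(fetch_hits(index_query_url("myIndex", "name:DNA")))` returning a single key would confirm that every hit got the same score.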

Here is my code to recreate a full-text Lucene index on each node's 'name' property (using neo4j-rest-client here, though I will also try py2neo, as in the Google discussion):

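from neo4jrestclient.client import GraphDatabase

gdb = GraphDatabase("http://localhost:7474/db/data/")

# Create a Lucene-backed full-text node index
myIndex = gdb.nodes.indexes.create("myIndex", type="fulltext", provider="lucene")

# For each node to index (`node` is an existing node), add its 'name' property under the key "name"
myIndex.add("name", node.get("name"), node)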

Results:

http://localhost:7474/db/data/index/node/myInde?query=name:DNA&order=score

data Object {id: 17062920, name: "DNA damage theory of aging"}
score 11.097855567932129
...
data Object {id: 17022698, name: "DNA (film)"}
score 11.097855567932129

The documentation ([http://neo4j.com/docs/stable/indexing-lucene-extras.html#indexing-lucene-sort]) says that Lucene handles the sorting itself very well, so I understood that it would build a relevance ranking on its own at import time; it does not.

What am I doing wrong or missing?


2 Answers


I believe the issue you are seeing comes from a combination of the text you are indexing, the query term(s), and, as Michael Hunger pointed out, the current Lucene configuration in Neo4j, which has OMITNORMS=true. With this setting, a Lucene query like the ones in your examples, where the indexed texts differ in length but the query term appears once in each, often produces the same relevancy score for every hit. The reason is that the length of the indexed field (field-length normalization) is NOT taken into account when OMITNORMS is true.
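A simplified sketch of the classic Lucene TF-IDF formula makes the effect concrete. It is an illustration, not Neo4j's code: it assumes a lowercase whitespace tokenizer and leaves out queryNorm, coord, boosts and the lossy one-byte encoding of norms, so the absolute numbers will not match the scores Neo4j returns. Only the comparison between `omit_norms=True` and `omit_norms=False` matters here; `tokenize`, `idf` and `score` are hypothetical names.

```python
import math


def tokenize(text):
    # Assumption: lowercase + whitespace split; Neo4j's actual analyzer may differ.
    return text.lower().split()


def idf(doc_freq, num_docs):
    # Classic Lucene TF-IDF inverse document frequency.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def score(term, doc, corpus, omit_norms):
    """Simplified single-term classic Lucene score (no queryNorm, coord or boosts)."""
    tokens = tokenize(doc)
    freq = tokens.count(term)
    if freq == 0:
        return 0.0
    doc_freq = sum(1 for d in corpus if term in tokenize(d))
    tf = math.sqrt(freq)
    # Field-length norm is 1/sqrt(number of terms); it is forced to 1.0 when norms are omitted.
    norm = 1.0 if omit_norms else 1.0 / math.sqrt(len(tokens))
    return tf * idf(doc_freq, len(corpus)) ** 2 * norm


corpus = ["DNA damage theory of aging", "DNA (film)"]
for omit in (True, False):
    print(f"omit_norms={omit}:", [round(score("dna", d, corpus, omit), 4) for d in corpus])
```

With norms omitted, both titles get identical scores because each contains "dna" exactly once; with norms enabled, the two-token "DNA (film)" scores higher than the five-token "DNA damage theory of aging".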

Looking at your examples, it is not clear what results you expect. For example, are you expecting documents with shorter text to appear first?

In my own experience with Lucene and Neo4j, I have seen many cases where the returned relevancy scores differ from one query to another.


The goal of my question was to obtain a list of results ordered by how relevant each node's name is to the queried keywords.

@mfkilgore pointed out this workaround:

START n=node:topic('name:(keyword1* AND keyword2*)')
MATCH (n)
WITH n
ORDER BY length(split(n.name, " ")) ASC
LIMIT 20
RETURN n

This workaround splits each node's name on spaces, counts the resulting words, and orders the hits by that word count in ascending order, so that shorter names come first.
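If the hits have already been fetched (for example with a plain index query), the same ordering can be reproduced on the client side. This is a minimal sketch, assuming the names are plain strings; `order_by_word_count` is a hypothetical helper, not part of any driver.

```python
def order_by_word_count(names, limit=20):
    """Mimic ORDER BY length(split(name, " ")) ASC LIMIT 20 on the client side."""
    # str.split(" ") splits on every single space, like Cypher's split(name, " ").
    return sorted(names, key=lambda name: len(name.split(" ")))[:limit]


hits = ["DNA damage theory of aging", "DNA (film)"]
print(order_by_word_count(hits))
```

Python's `sorted` is stable, so names with equal word counts keep their input order; Cypher makes no such guarantee for ties, so add a secondary sort key if a deterministic order matters for pagination.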