
My question is motivated by the following problem. I have a set of web documents from which I extract keywords. I want to store this data in Neo4j for further analysis (more or less graph mining, including the subgraph isomorphism problem): each web document is a node; a hyperlink from one web document to another is a corresponding directed relationship; keywords are properties of the nodes. In this setting, the same keyword property may be attributed to several nodes (I hope this is doable).

I need help with the following questions (which I find quite difficult to answer knowing only very basic things about Neo4j):

1) Is it possible to select all nodes attributed with a specific property "keyword1"?

2) How can I select common (overlapping) keyword properties for 2 nodes "doc1" and "doc2"? i.e., common keywords for 2 web documents

3) Is it better to create some kind of string key for keyword properties (rather than using the default auto-incremented integer)?

Any hints, recommendations, or links will be highly appreciated. I am using the Python binding for Neo4j on Windows.


1 Answer


Global lookups are handled with indexes. You should probably build an index, backed by Lucene, that holds the keywords; you can then run combined queries against it to find the matching nodes.

http://docs.neo4j.org/chunked/snapshot/tutorials-java-embedded-index.html

http://docs.neo4j.org/chunked/snapshot/rest-api-indexes.html
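
To make the index idea concrete, here is a minimal sketch in plain Python. It deliberately does not use the Neo4j or Lucene APIs; it only models what a keyword index does conceptually. The sample documents, the keyword values, and the helper names (build_keyword_index, docs_with_keyword, common_keywords) are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: document name -> keywords extracted from it.
doc_keywords = {
    "doc1": {"graph", "neo4j", "index"},
    "doc2": {"graph", "lucene", "index"},
    "doc3": {"python"},
}


def build_keyword_index(docs):
    """Build an inverted index: keyword -> set of documents carrying it."""
    index = defaultdict(set)
    for doc, keywords in docs.items():
        for keyword in keywords:
            index[keyword].add(doc)
    return index


def docs_with_keyword(index, keyword):
    """Question 1: all documents attributed with the given keyword."""
    # .get() avoids creating an empty entry for unknown keywords.
    return set(index.get(keyword, ()))


def common_keywords(docs, first, second):
    """Question 2: keywords shared by two documents (set intersection)."""
    return docs[first] & docs[second]


keyword_index = build_keyword_index(doc_keywords)
print(sorted(docs_with_keyword(keyword_index, "graph")))
print(sorted(common_keywords(doc_keywords, "doc1", "doc2")))
```

In an actual Neo4j setup the index would live in the database rather than in a Python dict, and the lookup would go through the index API described in the links above. Using the keyword string itself as the lookup key, as this sketch does, is one way to approach question 3.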