I migrated a MySQL database to Neo4j and tested a simple request. I was very surprised to see that an equivalent request was 10 to 100 times longer in neo4j than in MySql. I'm working on Neo4j 2.0.1.
In the original MySql schema I have the three following tables :
- countries : containing a 'code', a 'continent_id' and a 'selected' boolean,
- cities : containing a 'country_code', a 'name' and a 'status' boolean,
- theaters : containing a 'city_id' and a 'public' boolean,
with an indexes on each property. I want to display the theater count by city for a given continent with several conditions. The request is :
SELECT count(*) as nb, c.name
FROM `cities` c LEFT JOIN theaters t ON c.id = t.city_id
WHERE c.country_code IN
(SELECT code FROM countries WHERE selected is true AND continent_id = 4)
AND c.status=1 AND t.public = 1
GROUP BY c.name ORDER BY nb DESC
The database schema in Neo4j is the following :
(:Continent)-[:Include]->(:Country{selected:bool})-[:Include]->(:City{name:string, status:bool})-[:Include]->(:Theater{public:bool})
There's also an index defined on each property. The Cypher request is :
MATCH (:Continent{code: 4})-[:Include]->(:Country{selected:true})-[:Include]->(city:City{status:true})-[:Include]->(:Theater{public: true})
RETURN city.name, count(*) AS nb ORDER BY nb DESC
There are about 70.000 cities and 140.000 theaters in each database.
On continent with id 4 the MySql request took about 0.02s whereas Neo4j took 0.4s. Moreover, if I introduce in the Cypher request a variable relation length between Country and City (...(:Country{selected:true})-[:Include*..3]->(city:City{status:true})...
) as I want to be able to add intermediate levels like Regions, there the request take more than 2 seconds.
I'm aware that in this particular case there is no benefit to use Neo4j instead of MySql, but I would expect to see approximatively equivalent performance between the two technologies, and I want to take advantage of Neo4j geographical hierarchy capabilities.
Am I missing something or is this a limitation of Neo4j?
Thank you for your answers.
Edit : First you'll find database dump files here. The Neo4j server configuration is the one out of box. I work in a Ruby environment, and I use the neography gem. Also I run the Neo4J server separatly since I'm not on JRuby, so it sends the cypher request through the Rest API.
The database contains 244 countries, 69000 cities and 138,000 theaters. For continent_id 4, there are 46,982 cities (37,210 have the status boolean set to true) and 74,420 theaters.
The request returned 2256 rows. At the third run, it took 338 ms. Here is the request output with the profiling infos :
profile MATCH (:Continent{code: 4})-[:Include]->(country:Country{selected:true})-[:Include*..1]->(city:City{status:true})-[:Include]->(theater:Theater{public: true}) RETURN city.name, count(*) AS nb ORDER BY nb DESC;
==> ColumnFilter(symKeys=["city.name", " INTERNAL_AGGREGATE85ca19f3-9421-4c18-a449-1097e3deede2"], returnItemNames=["city.name", "nb"], _rows=2256, _db_hits=0)
==> Sort(descr=["SortItem(Cached( INTERNAL_AGGREGATE85ca19f3-9421-4c18-a449-1097e3deede2 of type Integer),false)"], _rows=2256, _db_hits=0)
==> EagerAggregation(keys=["Cached(city.name of type Any)"], aggregates=["( INTERNAL_AGGREGATE85ca19f3-9421-4c18-a449-1097e3deede2,CountStar())"], _rows=2256, _db_hits=0)
==> Extract(symKeys=["city", " UNNAMED27", " UNNAMED7", "country", " UNNAMED113", "theater", " UNNAMED72"], exprKeys=["city.name"], _rows=2257, _db_hits=2257)
==> Filter(pred="(hasLabel(theater:Theater(3)) AND Property(theater,public(5)) == true)", _rows=2257, _db_hits=2257)
==> SimplePatternMatcher(g="(city)-[' UNNAMED113']-(theater)", _rows=2257, _db_hits=4514)
==> Filter(pred="(((hasLabel(city:City(2)) AND hasLabel(city:City(2))) AND Property(city,status(4)) == true) AND Property(city,status(4)) == true)", _rows=2257, _db_hits=74420)
==> TraversalMatcher(start={"label": "Continent", "query": "Literal(4)", "identifiers": [" UNNAMED7"], "property": "code", "producer": "SchemaIndex"}, trail="( UNNAMED7)-[ UNNAMED27:Include WHERE (((hasLabel(NodeIdentifier():Country(1)) AND hasLabel(NodeIdentifier():Country(1))) AND Property(NodeIdentifier(),selected(3)) == true) AND Property(NodeIdentifier(),selected(3)) == true) AND true]->(country)-[:Include*1..1]->(city)", _rows=37210, _db_hits=37432)