1
votes

I have a database of tweet nodes. Each tweet has a userId field, and tweets are connected by relationships when one tweet is a reply to another. I am trying to write a query that groups all of the tweets by user while preserving the tweet relationships, so that I can see the relationships between users.

So far I have

match (n:Tweet) return distinct(n.userId), n

but this does not work because the relationships are not preserved. Does anybody know how to do this?

2
What is your data model? Show the nodes (and node labels), the relationship types, and how the nodes are connected by those relationship types. - cybersam

2 Answers

1
votes

I'm not quite sure what you're asking... You're talking about relationships but you haven't matched on any.

Speaking generally, when you use an aggregate function (such as sum, count, collect, etc.) in a RETURN (or a WITH) clause, Neo4j will automatically group by the other columns in the clause. Here's an example of something that you might do:

MATCH (source_tweet:Tweet {userId: 1234})<-[:RETWEET_OF]-(retweet:Tweet)
RETURN source_tweet.id, source_tweet.text, count(retweet)

This will give you one row for each tweet by user 1234 that has been retweeted at least once, along with the number of retweets of that tweet.
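The same implicit grouping can be applied to the question's own data. This is a minimal sketch, assuming only the Tweet label and userId property mentioned in the question; the aliases userId, tweetCount and tweets are made up for illustration:

```cypher
// One row per distinct userId: userId is the grouping key because
// the other two columns are aggregates.
MATCH (t:Tweet)
RETURN t.userId AS userId,
       count(t) AS tweetCount,
       collect(t) AS tweets
```

On its own this only groups the tweets. It says nothing about replies, because no relationship has been matched.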

1
votes

Adding another answer in response to your comment.

So if you want to find out if users have had a conversation you might do:

MATCH (source_tweet:Tweet)<-[:REPLY_TO]-(reply_tweet:Tweet)
MATCH
  (source_user:User {userId: source_tweet.userId}),
  (reply_user:User {userId: reply_tweet.userId})
// MERGE creates the relationship only if it does not already exist
MERGE (reply_user)-[:REPLIED_TO]->(source_user)

Then you could do:

MATCH (user1:User)-[:REPLIED_TO]-(user2:User)
WHERE ID(user1) < ID(user2)
RETURN user1.userId, user2.userId

... to get all of the combinations.
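If you only need to see who replied to whom, you may not need the REPLIED_TO relationships at all. A rough sketch, assuming the replies are stored as REPLY_TO relationships between tweets (as in the query above) and that each tweet carries the userId property from the question; the aliases are invented for this example:

```cypher
// Aggregate reply relationships up to the user level.
// count(*) groups by the two userId columns.
MATCH (source_tweet:Tweet)<-[:REPLY_TO]-(reply_tweet:Tweet)
WHERE source_tweet.userId <> reply_tweet.userId   // skip self-replies
RETURN reply_tweet.userId AS replier,
       source_tweet.userId AS repliedTo,
       count(*) AS replies
ORDER BY replies DESC
```

This returns one row per directed pair of users, so A replying to B and B replying to A show up as separate rows.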

Though like I said, you should really have relationships between the tweets and users indicating which user made the tweet. If you had that the query might be like:

MATCH (source_user:User)-[:WROTE]->(source_tweet:Tweet)<-[:REPLY_TO]-(reply_tweet:Tweet)<-[:WROTE]-(reply_user:User)
// MERGE avoids duplicate relationships when a user replies to the same user more than once
MERGE (reply_user)-[:REPLIED_TO]->(source_user)

And that should also be much faster, because you're traversing relationships rather than looking up each user by userId in an index.
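If the User nodes and WROTE relationships don't exist yet, they could probably be derived from the userId property that is already on each tweet. This is only a sketch under that assumption; the User label and WROTE type are the ones used in the query above, not something confirmed by the question's data model:

```cypher
// Create one User node per distinct userId and link it to its tweets.
// MERGE makes this safe to re-run: existing nodes and relationships are reused.
MATCH (t:Tweet)
MERGE (u:User {userId: t.userId})
MERGE (u)-[:WROTE]->(t)
```

On a large dataset it is worth having an index or uniqueness constraint on User.userId before running this, since the first MERGE does a lookup for every tweet.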