
We are using Solr for its full-text search capability; let's say we are indexing the text of various news articles.

Searching through all of the articles is straightforward; however, users can 'like' articles they find interesting.

I am attempting to implement a feature where each user can search through their 'like history.'

I have come up with several possible methods of doing this, but I do not know how to practically implement any of them (if they are even possible to implement), and I have no idea which would be best in terms of performance and efficiency.

1) The first method I have come up with is to use a separate MySQL database in which each row holds the ID of a user and the ID of an article that user liked.

A query to the MySQL table can return the article IDs liked by any user, but how would one go about narrowing Solr's search results to only the articles whose IDs were retrieved from the MySQL database?
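Something along these lines is what I have in mind for the table and the per-user lookup (a rough sketch only; the connector, table, and column names are just placeholders):

```python
# Sketch of the 'likes' table and the per-user lookup
# (mysql-connector-python assumed; table/column names are placeholders).
import mysql.connector

conn = mysql.connector.connect(user="app", password="secret", database="news")
cur = conn.cursor()

# One row per (user, liked article); the composite key prevents duplicate likes.
cur.execute("""
    CREATE TABLE IF NOT EXISTS likes (
        user_id    INT NOT NULL,
        article_id INT NOT NULL,
        PRIMARY KEY (user_id, article_id)
    )
""")

# All article IDs a given user has liked.
cur.execute("SELECT article_id FROM likes WHERE user_id = %s", (42,))
liked_ids = [row[0] for row in cur.fetchall()]
```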

2) The only other way I could figure out would be to create a duplicate document in another Solr core with an added user_id field each time a user likes an article; however, if 100,000 or so users each like 100-1,000 articles, this would consume an unnecessary amount of storage space.

Another problem with this second method is that if the text of the original article is changed, updating each related document for each user who liked the article becomes another cumbersome issue that must be dealt with.
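For reference, this is roughly how I picture indexing those duplicate documents (a sketch only; the core name, field names, and URL are made up):

```python
# Sketch of method 2: index a copy of the article per liking user,
# with an added user_id field (core name and fields are made up).
import requests

liked_copy = {
    "id": "article-123:user-42",   # unique per (article, user) pair
    "user_id": 42,
    "title": "Some headline",
    "text": "Full article text...",
}

# Solr's JSON update handler accepts a list of documents to add.
requests.post(
    "http://localhost:8983/solr/liked_articles/update?commit=true",
    json=[liked_copy],
)
```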

3) The same idea as the 2nd method, except that instead of creating duplicate documents, the document containing the 'like' information would link to the index entry of the 'liked' article.

The 2nd method is the only one of the 3 that I know can be done and know how to implement, but it seems wasteful storage-wise and performance-wise anytime an article needs to be updated, which happens quite frequently.

By my logic, the third and first methods seem to be the superior approaches, in that order, if they are possible to implement, but I could definitely be wrong. If they are possible to implement and really are the best methods, can you explain how to implement them? If not, do you think that using a second Solr core as described in method 2 would be worth the extra storage space required and the mass re-indexing needed when an article's text changes?

Are there any better alternatives for doing something of this nature? I am not limited to using Solr; I just thought it would be better to use than a relational database since it is intended for full-text indexing.

Thanks ahead of time for any light you can shed on my issue.

Update: Solr's ExternalFileField, found in the answers to aitchnyu's question, seems promising. If there is a field for indexing external files, it would make sense that there is also a way to link the index of one document to another.

A similar question of mine: stackoverflow.com/questions/8411860/… – Jesvin Jose

1 Answer


I would go with the first option. Run your SQL query first, then run your Solr query with the filter query (fq) parameter set to the list of IDs retrieved from the database. Filter queries restrict the returned search results to a subset of documents - in your case, only those documents that appear in the specific user's like history.
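Here is a minimal sketch of how the two steps could fit together, assuming the Python requests library, a core named "articles", and an "id" field matching your article IDs (all of those are placeholders for whatever you actually use):

```python
# Sketch: fetch the user's liked article IDs from MySQL, then restrict the
# Solr search to those IDs via the fq (filter query) parameter.
import requests

liked_ids = [12, 34, 56]  # e.g. from: SELECT article_id FROM likes WHERE user_id = 42

params = {
    "q": "text:economy",  # the user's search terms
    "fq": "id:(" + " OR ".join(str(i) for i in liked_ids) + ")",  # only liked articles
    "wt": "json",
}
resp = requests.get("http://localhost:8983/solr/articles/select", params=params)
results = resp.json()["response"]["docs"]
```

For users with very long like histories the fq clause can get large, so you may need to batch the IDs or cache the filter, but the basic approach stays the same.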