1
votes

There is my entity table and tags bound to it through another table (many-to-many relation).

I have an entity selected and my goal is to find a set of entities which have as much similar tags with it as possible. This set must be ordered by 'similarity' – amount of tags each entity shares with chosen entity. Tags are similar if they have the same ids.

I'm wondering if there's elegant and fast way to do that with single query.

The only solution I see now is to fetch all the tag-entity relations and compute their similarity in my application, and then make another database query to select what i've computed, but it doesn't look very graceful.

Database structure:

entity id ...

tag id name

entity_tag entity_id tag_id

Update: final solution for MySQL.

So I have tables for paintings, tags and painting_tag relation. This query fetches similar paintings and their 'similarity index' for previously selected painting.

SELECT site_painting.*, Count(tr.tag_id) as similarity
From site_painting_tag_relation as tr
Inner Join site_painting_tag_relation as tr2 ON ( tr2.tag_id = tr.tag_id and tr2.painting_id = :id )
Left join site_painting on site_painting.id=tr.painting_id
Where tr.painting_id <> :id
Group By  tr.painting_id
Having Count(*) > 0
Order By  Count(*) DESC, tr.painting_id limit 1
1
Every tblTAG row will have a computable Similarity, from 100% to 0%, in relation to any tblEntity row. The Similarity% could be stored with the tblMany2Many and a secondary index created on that %. Then let a loop walk down the %-s starting at 100%, to 99% to 95% to 90% to 80% (in some stepping fashion) until it retrieves a desired quantity of rows.donPablo
@donPablo Sorry, but this is not what i want; may be i haven't properly described the question. Tags themselves don't need to be compared. They're considered similar if they have the same ids. So I want to select records which are bound to the same tag ids as my main record.Anton Sergeyev
I think you need to explain more about what should be returned in terms of % of similarity? So if entity A and had tags X and Y and entity B had tag Y and Z, would you expect to see B returned from A since it has at least one matching tag?Chris Moutray
@ChrisMoutray exactly. Also, if we have entity C with both tags X and Y (and maybe Z also), it should be selected with higher priority than record B.Anton Sergeyev
Ok so the select returns all entities with 1 or more matches ordered most to least matches..Chris Moutray

1 Answers

2
votes

OK, The info on db structure helps a lot --

entity id ...

entity_tag entity_id tag_id

tag id name

Lets look at some example values--

entity
  id=100
  id=...
  id=199

entity_tag
  100, 3
  100, 5
  101, 1
  102, 7
  ...
  199, 3
  199, 7

tag
  id=1
  id=...
  id=10

So if we denormalize the entity_tag we have

  100   3,5
  101  1
  102       7
  199   3   7

And a similarity index of 199 to the rest, on a scale of 0 to 10

  100   1 in common
  101   0 in common
  102   1 in common
  199 self, no comparison

And if I have it right, we want to display 100 and 102 being the highest, nu??

Here is a stab at the SQL-- It might be something like ---

SELECT  TOP 10
FROM
(SELECT
   allET.EID,
   Count(*) as Similarity


 From entity_tag as allET

 Left Join 
      (Select * From  entity_tag  Where EID = myEID ) as myET
 On allET.TID = myET.TID

 Where allET.EID <> myEID

 Group By  allET.EID

 Having Count(*) > 0

 Order By  Count(*) DESC, allET.EID
)