7
votes

I've got an association that needs a few joins / custom queries. When trying to figure out how to implement this the repeated response is finder_sql. However in Rails 4.2 (and above):

ArgumentError: Unknown key: :finder_sql

My query to do the join looks like this:

'SELECT DISTINCT "tags".*' \
' FROM "tags"' \
' JOIN "articles_tags" ON "articles_tags"."tag_id" = "tags"."id"' \
' JOIN "articles" ON "article_tags"."article_id" = "articles"."id"' \
' WHERE articles"."user_id" = #{id}'

I understand that this can be achieved via:

has_many :tags, through: :articles

However if the cardinality of the join is large (i.e. a user has thousands of articles - but the system only has a few tags) it requires loading all the articles / tags:

SELECT * FROM articles WHERE user_id IN (1,2,...)
SELECT * FROM article_tags WHERE article_id IN (1,2,3...) -- a lot
SELECT * FROM tags WHERE id IN (1,2,3) -- a few

And of course also curious about the general case.

Note: also tried using the proc syntax but can't seem to figure that out:

has_many :tags, -> (user) {
  select('DISTINCT "tags".*')
    .joins('JOIN "articles_tags" ON "articles_tags"."tag_id" = "tags"."id"')
    .joins('JOIN "articles" ON "article_tags"."article_id" = "articles"."id"')
    .where('"articles"."user_id" = ?', user.id)
}, class_name: "Tag"

ActiveRecord::StatementInvalid: PG::UndefinedColumn: ERROR: column tags.user_id does not exist

SELECT DISTINCT "tags".* FROM "tags" JOIN "articles_tags" ON "articles_tags"."tag_id" = "tags"."id" JOIN "articles" ON "article_tags"."article_id" = "articles"."id" WHERE "tags"."user_id" = $1 AND ("articles"."user_id" = 1)

That is it looks like it is trying to inject the user_id onto tags automatically (and that column only exists on articles). Note: I'm preloading for multiple users so can't use user.tags without other fixes (the SQL pasted is what I'm seeing using exactly that!). Thoughts?

6
where are you writing this has_many :tags, -> (user)... ? In user.rb ? - Sajan
@sajan correct - in user.rb. - Stussa

6 Answers

3
votes

While this doesn't fix your problem directly - if you only need a subset of your data you can potentially preload it via a subselect:

users = User.select('"users".*"').select('COALESCE((SELECT ARRAY_AGG(DISTINCT "tags"."name") ... WHERE "articles"."user_id" = "users"."id"), '{}') AS tag_names')
users.each do |user|
  puts user[:tag_names].join(' ')
end

The above is DB specific for Postgres (due to ARRAY_AGG) but an equivalent solution probably exists for other databases.

An alternative option might be to setup a view as a fake join table (again requires database support):

CREATE OR REPLACE VIEW tags_users AS (
  SELECT 
    "users"."id" AS "user_id", 
    "tags"."id" AS "tag_id"
  FROM "users"
    JOIN "articles" ON "users"."id" = "articles"."user_id"
    JOIN "articles_tags" ON "articles"."id" = "articles_tags"."article_id"
    JOIN "tags" ON "articles_tags"."tag_id" = "tags"."id"
  GROUP BY "user_id", "tag_id"
)

Then you can use has_and_belongs_to_many :tags (haven't tested - may want to set to readonly and can remove some of the joins and use if you have proper foreign key constraints setup).

0
votes

So my guess is you are getting the error when you try to access @user.tags since you have that association inside the user.rb.

So I think what happens is when we try to access the @user.tags, we are trying to fetch the tags of the user and to that rails will search Tags whose user_id matches with currently supplied user's id. Since rails takes association name as modelname_id format by default, even if you don't have user_id it will try to search in that column and it will search (or add WHERE "tags"."user_id") no matter you want it to or not since ultimate goal is to find tags that are belongs to current user.

Of course my answer may not explain it 100%. Feel free to comment your thought or If you find anything wrong, let me know.

0
votes

Short Answer

Ok, if I understand this correctly I think I have the solution, that just uses the core ActiveRecord utilities and does not use finder_sql.

Could potentially use:

user.tags.all.distinct

Or alternatively, in the user model change the has_many tags to

has_many :tags, -> {distinct}, through: :articles

You could create a helper method in user to retrieve this:

def distinct_tags
  self.tags.all.distinct
end

The Proof

From your question I believe you have the following scenario:

  1. A user can have many articles.
  2. An article belongs to a single user.
  3. Tags can belong to many articles.
  4. Articles can have many tags.
  5. You want to retrieve all the distinct tags a user has associated with the articles they have created.

With that in mind I created the following migrations:

class CreateUsers < ActiveRecord::Migration
  def change
    create_table :users do |t|
      t.string :name, limit: 255

      t.timestamps null: false
    end
  end
end

class CreateArticles < ActiveRecord::Migration
  def change
    create_table :articles do |t|
      t.string :name, limit: 255
      t.references :user, index: true, null: false

      t.timestamps null: false
    end

    add_foreign_key :articles, :users
  end
end

class CreateTags < ActiveRecord::Migration
  def change
    create_table :tags do |t|
      t.string :name, limit: 255

      t.timestamps null: false
    end
  end
end

class CreateArticlesTagsJoinTable < ActiveRecord::Migration
  def change
    create_table :articles_tags do |t|
      t.references  :article, index: true, null:false
      t.references  :tag, index: true, null: false
    end

    add_index :articles_tags, [:tag_id, :article_id], unique: true
    add_foreign_key :articles_tags, :articles
    add_foreign_key :articles_tags, :tags
  end
end

And the models:

class User < ActiveRecord::Base
  has_many :articles
  has_many :tags, through: :articles

  def distinct_tags
    self.tags.all.distinct
  end
end

class Article < ActiveRecord::Base
  belongs_to :user
  has_and_belongs_to_many :tags
end

class Tag < ActiveRecord::Base
  has_and_belongs_to_many :articles
end

Next seed the database with a lot of data:

10.times do |tagcount|
  Tag.create(name: "tag #{tagcount+1}")
end

5.times do |usercount|
  user = User.create(name: "user #{usercount+1}")

  1000.times do |articlecount|
    article = Article.new(user: user)
    5.times do |tagcount|
      article.tags << Tag.find(tagcount+usercount+1)
    end
    article.save
  end
end

Finally in rails console:

user = User.find(3)
user.distinct_tags

results in following output:

  Tag Load (0.4ms)  SELECT DISTINCT `tags`.* FROM `tags` INNER JOIN `articles_tags` ON `tags`.`id` = `articles_tags`.`tag_id` INNER JOIN `articles` ON `articles_tags`.`article_id` = `articles`.`id` WHERE `articles`.`user_id` = 3
 => #<ActiveRecord::AssociationRelation [#<Tag id: 3, name: "tag 3", created_at: "2016-10-18 22:00:52", updated_at: "2016-10-18 22:00:52">, #<Tag id: 4, name: "tag 4", created_at: "2016-10-18 22:00:52", updated_at: "2016-10-18 22:00:52">, #<Tag id: 5, name: "tag 5", created_at: "2016-10-18 22:00:52", updated_at: "2016-10-18 22:00:52">, #<Tag id: 6, name: "tag 6", created_at: "2016-10-18 22:00:52", updated_at: "2016-10-18 22:00:52">, #<Tag id: 7, name: "tag 7", created_at: "2016-10-18 22:00:52", updated_at: "2016-10-18 22:00:52">]> 
0
votes

May be it is helpful to use eager_load to force ActiveRecord execute joins. It works as includes(:tags).references(:tags)

Here is a code snippet:

users.eager_load(:tags).map { |user| user.tag.inspect }
# equal to
users.includes(:tags).references(:tags).map { |user| user.tag.inspect }

Where users - is an ActiveRecord relation.

This code will hit a database at least twice:

  1. Select only users ids (hopefully, not too many)
  2. Select users with joins tags through article_tags avoiding

SELECT * FROM article_tags WHERE article_id IN (1,2,3...) -- a lot

0
votes

You are on the right path with has_many :tags, through: :articles (or even better has_many :tags, -> {distinct}, through: :articles as Kevin suggests). But you should read a bit about includes vs preload vs eager_load. You are doing this:

User.preload(:tags).each {|u| ... }

But you should do this:

User.eager_load(:tags).each {|u| ... }

or this:

User.includes(:tags).references(:tags).each {|u| ... }

When I do that I get this query:

SELECT  "users"."id" AS t0_r0,
        "tags"."id" AS t1_r0,
        "tags"."name" AS t1_r1
FROM    "users"
LEFT OUTER JOIN "articles"
ON      "articles"."user_id" = "users"."id"
LEFT OUTER JOIN "articles_tags"
ON      "articles_tags"."article_id" = "articles"."id"
LEFT OUTER JOIN "tags"
ON      "tags"."id" = "articles_tags"."tag_id"

But that is still going to send a lot of redundant stuff from the database to your app. This will be faster:

User.eager_load(:tags).distinct.each {|u| ... }

Giving:

SELECT  DISTINCT "users"."id" AS t0_r0,
        "tags"."id" AS t1_r0,
        "tags"."name" AS t1_r1
FROM    "users"
LEFT OUTER JOIN "articles"
ON      "articles"."user_id" = "users"."id"
LEFT OUTER JOIN "articles_tags"
ON       "articles_tags"."article_id" = "articles"."id"
LEFT OUTER JOIN "tags"
ON "tags"."id" = "articles_tags"."tag_id"

Doing just User.first.tags.map &:name gets me joins too:

SELECT  DISTINCT "tags".*
FROM    "tags"
INNER JOIN "articles_tags"
ON      "tags"."id" = "articles_tags"."tag_id"
INNER JOIN "articles"
ON      "articles_tags"."article_id" = "articles"."id"
WHERE   "articles"."user_id" = ?

For more details, please see this github repo with an rspec test to see what SQL Rails is using.

0
votes

There are three possible solutions:

1) Continue to use has_many associations

Fake user_id column by adding it to the selected columns.

  class User < ActiveRecord::Base
    has_many :tags, -> (user) {
      select(%Q{DISTINCT "tags".*, #{user_id} AS user_id })
        .joins('JOIN "articles_tags" ON "articles_tags"."tag_id" = "tags"."id"')
        .joins('JOIN "articles" ON "article_tags"."article_id" = "articles"."id"')
        .where('"articles"."user_id" = ?', user.id)
    }, class_name: "Tag"
  end

2) Add an instance method on the User class

If you are using tags for queries only and you haven't used it in joins you can use this approach:

class User
  def tags
   select(%Q{DISTINCT "tags".*})
        .joins('JOIN "articles_tags" ON "articles_tags"."tag_id" = "tags"."id"')
        .joins('JOIN "articles" ON "article_tags"."article_id" = "articles"."id"')
        .where('"articles"."user_id" = ?', id)
  end
end

Now user.tags behaves like an association for all practical purposes.

3) OTOH, using EXISTS might be performant than using distinct

  class User < ActiveRecord::Base
    def tags
      exists_sql = %Q{
        SELECT  1
        FROM    articles,
                articles_tags
        WHERE   "articles"."user_id" = #{id} AND
                "articles_tags"."article_id" = "article"."id" AND
                "articles_tags"."tag_id" = "tags.id"
      }
      Tag.where(%Q{ EXISTS ( #{exists_sql} ) })
    end
  end