1
votes

I've recently started porting an existing PHP messenger app to elixir (using elixir 1.3, phoenix 1.2, ecto 2.0.1 and mariaex 0.7.7). This app serves millions of users so performance is important. I'm very new to Elixir so please forgive the dumbness of my question

I have the following database schema:

Database schema

Each thread has multiple thread_participants and messages. A thread_participant has info about the user in the context of the linked thread (when the user has last seen this thread for example). A thread is composed of multiple messages authored by a user.

What I would like the json my API to return in the end to be :

"data": {
  "result": [1, 2],
  "threads": {
    1: {
      "id": 1,
      "unread_count": 2,
      "starred": false,
      "muted": false,
      "last_seen": "2015-10-20T19:01:46",
      "participants": [1, 2]
    },
    22: {
      "id": 22,
      "unread_count": 0,
      "starred": true,
      "muted": false,
      "last_seen": "2016-06-20T12:00:00",
      "participants": [1, 3]
    }
  },
  users: {
    1: {
      id: 1,
      name: 'John'
    },
    2: {
      id: 2,
      name: 'Dan'
    },
    3: {
      id: 3,
      name: 'Eric'
    }        
  }

Here are my schemas for Thread and ThreadParticipant:

schema "thread" do
  field :created, Ecto.DateTime, usec: true, autogenerate: true
  belongs_to :creator, UserAbstract
  has_many :messages, ThreadMessage
  has_many :participants, ThreadParticipant
  has_many :users, through: [:participants, :user]
  field :last_seen, Ecto.DateTime, virtual: true, default: :null
  field :muted, :boolean, virtual: true, default: false
  field :starred, :boolean, virtual: true, default: false
  field :unread_count, :integer, virtual: true, default: 0
end

@primary_key false
schema "thread_participant" do
  belongs_to :thread, Messenger.Thread, primary_key: true
  belongs_to :user, Messenger.UserAbstract, primary_key: true
  field :last_seen, Ecto.DateTime, usec: true, autogenerate: true
  field :starred, :boolean, default: false
  field :muted, :boolean, default: false
end

and I use query composition to contextualize the list of threads for the user:

def for_user(query, user_id) do
  from t in query,
  join: p in assoc(t, :participants),
  join: message in assoc(t, :messages),
  left_join: messageNew in ThreadMessage, on: messageNew.id == message.id and messageNew.created > p.last_seen,
  where: p.user_id == ^user_id,
  order_by: [desc: max(message.created)],
  group_by: t.id,
  select: %{ t | last_seen: p.last_seen, muted: p.muted,starred: p.starred, unread_count: count(messageNew.id)}
end

so when I do

 Thread |> Thread.for_user(user_id) |> Repo.all

I'm a able to get almost all the correct aggregated informations but I miss the participant IDs because of the group_by thread.id.

In pure SQL I would do something like the code below and then rebuild my models in code:

SELECT s.id, s.last_seen, s.muted, s.starred, s.last_message_date, s.unread_count, p.user_id
  FROM (
    SELECT t0.`id` , t2.`last_seen` , t2.`muted` , t2.`starred` , max(t1.`created`) as last_message_date, count(t3.id) as unread_count
    FROM `thread` AS t0
    INNER JOIN `thread_message` AS t1 ON t0.`id` = t1.`thread_id`
    INNER JOIN `thread_participant` AS t2 ON ( t0.`id` = t2.`thread_id` ) AND ( t2.`user_id` = 9854 )
    LEFT JOIN `thread_message` AS t3 ON t3.`id` = t1.`id` AND t3.`created` > t2.`last_seen`
    GROUP BY t0.`id`
  ) as s
  INNER JOIN `thread_participant` AS p ON p.`thread_id` = s.`id`
  ORDER BY s.`last_message_date` DESC

All my attempt to translate this to Ecto (even using subqueries or fragments) have failed (no Max() in subquery, field aliases in the subquery are not kept, ... )

So in addition to the first query (for_user()), I'm load the participants in a second query:

thread_ids = Enum.map(threads, fn (x) -> x.id end)

def get_participating_user(thread_ids) do
  from tp in ThreadParticipant,
  join: user in assoc(tp, :user),
  where: tp.thread_id in ^thread_ids,
  preload: :user
end

participants = Thread.get_participating_user(thread_ids) |> Repo.all

But now I'm stuck with how can I merge the two result sets (put the ThreadParticipants from the second query where they belong in each Thread from the first query under the participants key), and then how can I output it, normalized, in my view (only the participant ID's are kept under thread.participants and all distinct users are outputted under users)

Having been stuck on this for hours, I'd really appreciate any knowledge you could share

1

1 Answers

1
votes

I eventually got everything to work. After many hours reinventing the wheel (i.e loading the thread_participants in a second query and then going through the list of threads to add their participants), I noticed that whatever you put in your first query, ecto will fetch the preloaded associations in a separate query.

So to fix problem 1 ( how can I merge the two result sets), the solution is: Don't do it :-) just mark the desired associations as preloaded. As long as you've got the thread ids loaded in your main query ecto will be happy to do the hard work for you:

  def for_user(query, user_id) do
    from t in query,
    join: p in assoc(t, :participants),
    join: message in assoc(t, :messages),
    join: u in assoc(p, :user),
    left_join: messageNew in ThreadMessage, on: messageNew.id == message.id and messageNew.created > p.last_seen,
    where: p.user_id == ^user_id,
    order_by: [desc: max(message.created)],
    group_by: t.id,
    preload: [:participants,:users],
    select: %{ t | last_seen: p.last_seen, muted: p.muted,starred: p.starred, unread_count: count(messageNew.id)}
  end

In debug mode you can see that ecto do the following queries :

SELECT t0.`id`, t0.`created`, t0.`creator_id`, t1.`last_seen`, t1.`muted`, t1.`starred`, count(t4.`id`) FROM `thread` AS t0 INNER JOIN `thread_participant` AS t1 ON t1.`thread_id` = t0.`id` INNER JOIN `thread_message` AS t2 ON t2.`thread_id` = t0.`id` INNER JOIN `user` AS u3 ON u3.`id` = t1.`user_id` LEFT OUTER JOIN `thread_message` AS t4 ON (t4.`id` = t2.`id`) AND (t4.`created` > t1.`last_seen`) WHERE (t1.`user_id` = ?) GROUP BY t0.`id` ORDER BY max(t2.`created`) DESC LIMIT 5 [20]

SELECT t0.`thread_id`, t0.`user_id`, t0.`last_seen`, t0.`starred`, t0.`muted`, t0.`thread_id` FROM `thread_participant` AS t0 WHERE (t0.`thread_id` IN (?,?,?,?,?)) ORDER BY t0.`thread_id` [45, 47, 66, 77, 88]

SELECT u0.`id`, u0.`display_name`, u0.`id` FROM `user` AS u0 WHERE (u0.`id` IN (?,?,?,?,?,?)) [10, 11, 12, 13, 14, 15] 

To fix problem 2 (how can I output it, normalized, in my view (only the participant ID's are kept under thread.participants and all distinct users are outputted under users)) well it's pretty simple once you start understanding elixir's maps, list and enums :

The controller pass the threads list to the view which has the following code:

def render("index.json", %{thread: threads}) do
%{ data:
  %{
    threads: render_many(threads, Messenger.ThreadView, "user_thread.json"),
    users: render_many(threads |> Stream.flat_map(&(&1.users)) |> Stream.uniq, Messenger.UserAbstractView, "user_abstract.json")
  }
}

def render("user_thread.json", %{thread: thread}) do
  %{id: thread.id,
    last_seen: thread.last_seen,
    muted: thread.muted,
    starred: thread.starred,
    unread_count: thread.unread_count,
    participants: Enum.map(thread.participants, fn(tp) -> tp.user_id end)
    }
end

The tricky parts:

#Here we extract a list of uniq users from our list of threads  
#and use our user view to render them
users: render_many(threads |> Stream.flat_map(&(&1.users)) |> Stream.uniq, Messenger.UserAbstractView, "user_abstract.json")

#Here we populate the participants key with a list of the participants ids
participants: Enum.map(thread.participants, fn(tp) -> tp.user_id end)

And there you go! -> normalized structure.

Hope it will save you some time, if, as me, your dipping your toe in this wonderful language that Elixir is.