8 votes

Let's say I have a list of words, where a keyword, in this case "stop", demarcates full sentences:

["Hello", "from", "Paris", "stop", "Weather", "is", "sunny", "stop", "Missing", "you", "stop"]

which I want to turn into:

[["Hello", "from", "Paris"], ["Weather", "is", "sunny"], ["Missing", "you"]]

I know I can do this with strings using String.split/2, but ideally I'd like to learn how to tackle the above problem with fundamental functional constructs, such as recursion on [head | tail]. However, I can't figure out where to start, specifically how to accumulate the intermediate lists.


4 Answers

7 votes

You can use chunk_by/2:

["Hello", "from", "Paris", "stop", "Weather", "is", "sunny", "stop", "Missing", "you", "stop"]
|> Enum.chunk_by(fn x -> x != "stop" end)
|> Enum.reject(fn x -> x == ["stop"] end)
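To see why the reject step is needed, here is what each stage of the pipeline produces (using the list from the question):

```elixir
sentences = ["Hello", "from", "Paris", "stop", "Weather", "is", "sunny", "stop", "Missing", "you", "stop"]

# chunk_by/2 starts a new chunk whenever the function's return value changes,
# so each "stop" marker ends up in a chunk of its own:
chunks = Enum.chunk_by(sentences, fn x -> x != "stop" end)
# [["Hello", "from", "Paris"], ["stop"], ["Weather", "is", "sunny"], ["stop"], ["Missing", "you"], ["stop"]]

result = Enum.reject(chunks, fn x -> x == ["stop"] end)
# [["Hello", "from", "Paris"], ["Weather", "is", "sunny"], ["Missing", "you"]]
```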

Performance

Out of curiosity, I benchmarked the implementations given in the answers to this question. Each implementation was called 100,000 times, and I ran the whole benchmark 3 times. Here are the results, in case anyone is interested:

0.292903s | 0.316024s | 0.292106s | chunk_by

0.168113s | 0.152456s | 0.151854s | Main.main (@Dogbert's answer)

0.167387s | 0.148059s | 0.143763s | chunk_on (@Martin Svalin's answer)

0.177080s | 0.180632s | 0.185636s | splitter (@stephen_m's answer)
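The exact benchmark code isn't shown above, so here is just a minimal sketch of how such a timing loop could look, using Erlang's :timer.tc/1:

```elixir
list = ["Hello", "from", "Paris", "stop", "Weather", "is", "sunny", "stop", "Missing", "you", "stop"]

chunk_by_impl = fn ->
  list
  |> Enum.chunk_by(fn x -> x != "stop" end)
  |> Enum.reject(fn x -> x == ["stop"] end)
end

# :timer.tc/1 runs the function and returns {elapsed_microseconds, result}
{micros, _result} =
  :timer.tc(fn ->
    Enum.each(1..100_000, fn _ -> chunk_by_impl.() end)
  end)

IO.puts("chunk_by: #{micros / 1_000_000}s")
```

For serious measurements a benchmarking library would give warmup runs and statistics, but this is enough to get ballpark numbers like the ones above.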

3 votes

Here's a simple tail-recursive implementation using pattern matching:

defmodule Main do
  def split_on(list, on) do
    list
    |> Enum.reverse
    |> do_split_on(on, [[]])
    |> Enum.reject(fn list -> list == [] end)
  end

  def do_split_on([], _, acc), do: acc
  def do_split_on([h | t], h, acc), do: do_split_on(t, h, [[] | acc])
  def do_split_on([h | t], on, [h2 | t2]), do: do_split_on(t, on, [[h | h2] | t2])

  def main do
    ["Hello", "from", "Paris", "stop", "Weather", "is", "sunny", "stop", "Missing", "you", "stop"]
    |> split_on("stop")
    |> IO.inspect
  end
end

Main.main

Output:

[["Hello", "from", "Paris"], ["Weather", "is", "sunny"], ["Missing", "you"]]

3 votes

This is almost what Enum.chunk_by/2 does.

def chunk_by(enumerable, fun)

Splits enumerable on every element for which fun returns a new value.

But chunk_by won't throw away any elements, so we can combine it with Enum.filter/2.

list = [1, 2, 3, :stop, 4, 5, 6, :stop, 7, 8, :stop] # analogous to your list

list
|> Enum.chunk_by(&(&1 == :stop))
   # at this point, you have [[1,2,3], [:stop], [4,5,6], [:stop], [7,8], [:stop]]
|> Enum.reject(&(&1 == [:stop]))
   # here you are: [[1,2,3], [4,5,6], [7,8]]

A second approach would be to use Enum.reduce/3. Since we build up the accumulator at the front, pushing the first elements we find towards the back, it makes sense to reverse the list before we reduce it. Otherwise we'll end up with a reversed list of reversed lists.

We'll potentially get empty lists, like the one produced by the final :stop in our example list. So again, we filter the list at the end.

list
|> Enum.reverse
|> Enum.reduce([[]], fn         # note: the accumulator is a nested empty list
  :stop, acc -> [[] | acc]      # element is the stop word, start a new list
  el, [h | t] -> [[el | h] | t] # remember, h is a list, t is list of lists
end)
|> Enum.reject(&Enum.empty?/1)

Finally, let's walk the list ourselves, and build an accumulator. If this reminds you of the reduce version, it's no coincidence.

defmodule Stopword do
  def chunk_on(list, stop \\ :stop) do
    list
    |> Enum.reverse
    |> chunk_on(stop, [[]])
  end

  defp chunk_on([], _, acc) do
    Enum.reject(acc, &Enum.empty?/1)
  end
  defp chunk_on([stop | t], stop, acc) do
    chunk_on(t, stop, [[] | acc])
  end
  defp chunk_on([el | t], stop, [head_list | tail_lists]) do
    chunk_on(t, stop, [[el | head_list] | tail_lists])
  end
end

We use the common pattern of a public function that doesn't require users to worry about the accumulator, passing the inputs on to a private arity+1 function that carries it. Since we're building up a list of lists, it's useful to start the accumulator off with an empty list inside it. This way, we don't have to special-case an empty accumulator.
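This public/private accumulator pattern is worth internalizing on its own. Here's a deliberately tiny, unrelated illustration of it (a hand-rolled list reverse, so the shape of the pattern isn't obscured by the stop-word logic):

```elixir
defmodule AccPattern do
  # Public entry point: the caller never sees the accumulator.
  def reverse(list), do: do_reverse(list, [])

  # Private arity+1 clauses carry the accumulator through the recursion.
  defp do_reverse([], acc), do: acc
  defp do_reverse([h | t], acc), do: do_reverse(t, [h | acc])
end

AccPattern.reverse([1, 2, 3])
# [3, 2, 1]
```

The Stopword module above has exactly this shape; the only extra wrinkle is that its accumulator is a list of lists.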

We reverse the list before we walk it, as we did for reduce, just as we reject empty lists after we're done. The same reasons apply.

We use pattern matching to identify the stop word. The stop word marks the beginning of a new list, so we add a new, empty list and throw away the stop word.

A regular word is simply put at the front of the first list, in our list of lists. The syntax is a bit unwieldy with all those bars and brackets.

2 votes

Personally, I like AbM's answer best and prefer it to this one for its readability.

That said, out of interest I decided to see whether it could be done without a final Enum.reject pass.

def splitter(list) do
  res =
    List.foldl(list, [], fn word, acc ->
      case {word, acc} do
        {"stop", []} ->
          # a leading stop word: nothing accumulated yet
          []
        {word, []} ->
          [[word]]
        {"stop", [[] | _]} ->
          # consecutive stop words: nothing to close off
          acc
        {"stop", [h | t]} ->
          # close off the current sentence and open an empty one
          [[] | [Enum.reverse(h) | t]]
        {word, [[] | t]} ->
          [[word] | t]
        {word, [h | t]} ->
          [[word | h] | t]
      end
    end)

  # drop a trailing empty sentence, or reverse a final sentence
  # that wasn't terminated by a stop word
  res =
    case res do
      [[] | t] -> t
      [h | t] -> [Enum.reverse(h) | t]
      [] -> []
    end

  Enum.reverse(res)
end

splitter(["Hello", "from", "Paris", "stop", "Weather", "is", "sunny", "stop", "Missing", "you", "stop"])

# [["Hello", "from", "Paris"], ["Weather", "is", "sunny"], ["Missing", "you"]]

The code is a bit of a headache to read, and I probably wouldn't use it for that reason, but judging by the benchmark above it does appear to run faster than the chunk_by approach.