2
votes

I am starting my journey with Elixir and am looking for some advice on how best to approach a particular problem.

I have a data set that needs to be searched as quickly as possible. The data consists of two numbers that form an enclosed band and some meta data associated with each band.

For example:

From,To,Data    
10000,10999,MetaData1
11000,11999,MetaData2
12000,12499,MetaData3
12500,12999,MetaData4

This data set could have upwards of 100,000 entries.

I have a struct defined that models the data, along with a parser that creates an Elixir list in-memory representation.

defmodule Band do
    defstruct from: 0, to: 0, metadata: 0
end

The parser returns a list of the Band struct. I defined a find method that uses a list comprehension

defp find_metadata(bands, number) do
        match? = fn(x) -> x.from <= number and x.to >= number end

        [match | _ ] = for band <- bands, match?.(band), do: band

        { :find, band }
end

Based on my newbie knowledge, using the list comprehension will require a full traversal of the list. To avoid scanning the full list, I have used search trees in other languages.

Is there an algorithm/mechanism/approach available in Elixir that would a more efficient approach for this type of search problem?

Thank you.

1
Enum.find/2 is likely what you want (it will traverse a list until you find the first that matches a given predicate/function).José Valim
thanks @JoséValim, I was concerned about the time for a sequential search. On a 20K search space, running 5,000 searches with the match towards the end of the space, list comprehension takes ~1400ms while Enum.Find comes in at ~1800ms. I've used mix test --trace to get the timings - not sure if that's accurate enough?BrianB
To some extent (as @obrok suggests below) this isn't an Elixir issue so much as it's a data structure issue. Pick the right data structure and your performance should be excellent.Onorio Catenacci
@OnorioCatenacci agreed - wanted to ensure that there wasn't a language feature that would help with this type of data structure.BrianB

1 Answers

4
votes

If the bands are mutually exclusive you could structure them into a tree sorted by from. Searching through that tree should take log(n) time. Something like the following should work:

defmodule Tree do
  defstruct left: nil, right: nil, key: nil, value: nil

  def empty do
    nil
  end

  def insert(tree, value = {key, _}) do
    cond do
      tree == nil    -> %Tree{left: empty, right: empty, key: key, value: value}
      key < tree.key -> %{tree | left: insert(tree.left, value)}
      true           -> %{tree | right: insert(tree.right, value)}
    end
  end

  def find_interval(tree, value) do
    cond do
      tree == nil                -> nil
      value < tree.key           -> find_interval(tree.left, value)
      between(tree.value, value) -> tree.value
      true                       -> find_interval(tree.right, value)
    end
  end

  def between({left, right}, value) do
    value >= left and value <= right
  end
end

Note that you can also use Ranges to store the "bands" as you call them. Also note that the tree isn't balanced. A simple scheme to (probably) achieve a balanced tree is to shuffle the intervals before inserting them. Otherwise you'd need to have a more complex implementation that balances the tree. You can look at erlang's gb_trees for inspiration.