0
votes

I have 2 lists of maps:

list1 = 
[
 %{amount: 1, id: 112006},
 %{amount: 1, id: 78798},
 %{amount: 6, id: 92572},
 %{amount: 1, id: 89750},
 %{amount: 1, id: 81418},
 %{amount: 3, id: 92062},
 %{amount: 1, id: 82373},
 %{amount: 1, id: 92856}...
]

and

list2 =
[
 %{count: 5, id: [112006, 92062, 92856, 67812, 70736], name: "Object 1"},
 %{count: 655, id: [92572, 22432, 32368, 34180, 34181, 34182, ...],    name: "Object 2"},
 %{count: 238, id: [26052, 30430, 37067, 37068, 41228, 42686, ...], name: "Object 3"}
 ...
]

list1 is with 30000+ maps in it and list2 with about a 100 maps, the id's are the same in both list, I want to concat the two list into one:

[
 %{count: 5, all_count: 5 name: "Object 1"},
 %{count: 655, all_count: 3, name: "Object 2"},
 ....
]

With the new all_count-key that is the sum of all amount from list1 with the same id's that is in the id-array in list2.

I did:

Enum.map(list2, fn(map) ->
    all_count =
      list1
      |> Enum.filter(&Enum.member?(map.id, &1.id))
      |> Enum.map(&(&1.amount))
      |> Enum.sum
     Map.put(map, :all_count, all_count)
   end)

witch works but is very slow and I need something faster, tried with Flow:

Enum.map(list2, fn(map) ->
    all_count =
      list1
      |> Flow.from_enumerable()
      |> Flow.filter(&Enum.member?(map.id, &1.id))
      |> Flow.map(&(&1.amount))
      |> Enum.sum
     Map.put(map, :all_count, all_count)
   end)

got it a bit quicker but not much, any tips how to get it faster? Tia.

2

2 Answers

0
votes

The main key to your problem is that Filter is an O(n) operation and thus on each iteration, you are looping over all the 30k elements of the list.

This makes your entire operation O(n^2) complexity and is thus not scaleable.

It is possible to reduce your problem to a O(n) complexity by converting the first list into a hash_table as follows:

list1 = [
 %{amount: 1, id: 112006},
 %{amount: 1, id: 78798},
 %{amount: 6, id: 92572},
 %{amount: 1, id: 89750},
 %{amount: 1, id: 81418},
 %{amount: 3, id: 92062},
 %{amount: 1, id: 82373},
 %{amount: 1, id: 92856}
]

hash_table = Enum.reduce(list1, %{}, fn e, a -> Map.merge(a, %{e.id => e}) end)

A better alternative to Map.merge, as suggested in the comments would be:

hash_table = Enum.reduce(list1, %{}, &Map.put(&2, &1.id, &1))

So you would be left with the following:

%{
  78798 => %{amount: 1, id: 78798},
  81418 => %{amount: 1, id: 81418},
  82373 => %{amount: 1, id: 82373},
  89750 => %{amount: 1, id: 89750},
  92062 => %{amount: 3, id: 92062},
  92572 => %{amount: 6, id: 92572},
  92856 => %{amount: 1, id: 92856},
  112006 => %{amount: 1, id: 112006}
}

Now instead of looping over each element, you can simply lookup the element you need with O(1) access with O(log n) access, e.g. list1[82373] will give you %{amount: 1, id: 82373}, from which you can get your amount. If you do not foresee the need for further keys in any of these data points besides amount, you could facilitate things further by having the id point to the amount value directly.

Once you have your proof of concept, you can modify your program to fully adopt a hash_map data structure so you can avoid having to constantly convert list1 to the hash_map structure. Perhaps you can also consider putting it all in an ETS table down the road, which could give you O(1) lookup access, as stated in the docs:

These provide the ability to store very large quantities of data in an Erlang runtime system, and to have constant access time to the data.

0
votes

What you could try, instead of filtering for the IDs in each map function in list1 is to turn that into a map, where the keys are the id, and the values are the amount parts:

map1 = list1 |> List.foldl(%{}, fn(m, acc) -> Map.put(acc, m.id, m.amount) end)

# Result
%{
  78798 => 1,
  81418 => 1,
  82373 => 1,
  89750 => 1,
  92062 => 3,
  92572 => 6,
  92856 => 1,
  112006 => 1
  ...
}

Then you could slightly adjust your code. Or alternatively you can try it with List.foldl/3 to build up your result list:

List.foldl(list2, [], fn(map, acc) ->
  map_count = Map.take(map, [:count, :name])
  count = map.id |> List.foldl(0, &(map1[&1] + &2))
  acc ++ [Map.put(map_count, :all_count, count)]
end)