Memory efficient algorithm for `take n (sort xs)` ("sorted prefix") problem

Question

I want to take n biggest elements from lazy list.

I heard that mergesort implemented in Data.List.sort is lazy and it doesn't produce more elements than necessary. This might be true in terms of comparisons, but certainly isn't the case when it comes to memory usage. The following program illustrates the issue:

{-# LANGUAGE ScopedTypeVariables #-}

module Main where

import qualified Data.Heap as Heap
import qualified Data.List as List

import System.Random.MWC
import qualified Data.Vector.Unboxed as Vec

import System.Environment

limitSortL n xs = take n (List.sort xs)
limitSortH n xs = List.unfoldr Heap.uncons (List.foldl' (\ acc x -> Heap.take n (Heap.insert x acc) ) Heap.empty xs) 

main = do
  st <- create
  rxs :: [Int] <- Vec.toList `fmap` uniformVector st (10^7)

  args <- getArgs
  case args of
    ["LIST"] -> print (limitSortL 20 rxs)
    ["HEAP"] -> print (limitSortH 20 rxs)

  return ()

Runtime:

Data.List:

./lazyTest LIST +RTS -s 
[-9223371438221280004,-9223369283422017686,-9223368296903201811,-9223365203042113783,-9223364809100004863,-9223363058932210878,-9223362160334234021,-9223359019266180408,-9223358851531436915,-9223345045262962114,-9223343191568060219,-9223342956514809662,-9223341125508040302,-9223340661319591967,-9223337771462470186,-9223336010230770808,-9223331570472117335,-9223329558935830150,-9223329536207787831,-9223328937489459283]
   2,059,921,192 bytes allocated in the heap
   2,248,105,704 bytes copied during GC
     552,350,688 bytes maximum residency (5 sample(s))
       3,390,456 bytes maximum slop
            1168 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  3772 collections,     0 parallel,  1.44s,  1.48s elapsed
  Generation 1:     5 collections,     0 parallel,  0.90s,  1.13s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.82s  (  0.84s elapsed)
  GC    time    2.34s  (  2.61s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    3.16s  (  3.45s elapsed)

  %GC time      74.1%  (75.7% elapsed)

  Alloc rate    2,522,515,156 bytes per MUT second

  Productivity  25.9% of total user, 23.7% of total elapsed

Data.Heap:

./lazyTest HEAP +RTS -s 
[-9223371438221280004,-9223369283422017686,-9223368296903201811,-9223365203042113783,-9223364809100004863,-9223363058932210878,-9223362160334234021,-9223359019266180408,-9223358851531436915,-9223345045262962114,-9223343191568060219,-9223342956514809662,-9223341125508040302,-9223340661319591967,-9223337771462470186,-9223336010230770808,-9223331570472117335,-9223329558935830150,-9223329536207787831,-9223328937489459283]
 177,559,536,928 bytes allocated in the heap
     237,093,320 bytes copied during GC
      80,031,376 bytes maximum residency (2 sample(s))
         745,368 bytes maximum slop
              78 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0: 338539 collections,     0 parallel,  1.24s,  1.31s elapsed
  Generation 1:     2 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   35.24s  ( 35.46s elapsed)
  GC    time    1.24s  (  1.31s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   36.48s  ( 36.77s elapsed)

  %GC time       3.4%  (3.6% elapsed)

  Alloc rate    5,038,907,812 bytes per MUT second

  Productivity  96.6% of total user, 95.8% of total elapsed

Clearly limitSortL is much faster, but it's also very memory hungry. On larger lists it hit's RAM size.

Is there a faster algorithm to solve this problem which isn't that memory hungry?

Edit: Clarification: I use Data.Heap from heaps package, I didn't try the heap package.

The n that you introduced in the first line - i.e. the output list size. — Robin Green
This would be more compelling if you were working with real data. Generating an extremely large vector makes me suspicious; GHC plays tricks on you when it can decipher things about the data you create. — Dan Burton
@Dan I think this is the right approach, since all it can tell is it will be some sort of random list of specific length. I've certainly seen this approach in other benchmarks. — Tener

Tener Tener · Accepted Answer · 2011-05-10T22:04:14

So, I've actually managed to solve the problem. The idea is to throw away fancy data structures and work by hand ;-) Essentially we split input list into chunks, sort them, and foldl the [[Int]] list, selecting n smallest elements at each step. The trickies part is merging accumulator with sorted chunk in proper way. We have to use seq or otherwise the lazyness will bite you and the result still need lot's of memory to compute. Additionally I mix merge with take n, just to optimize things more. Here is the whole program, along with previous attempts:

{-# LANGUAGE ScopedTypeVariables, PackageImports #-}     
module Main where

import qualified Data.List as List
import qualified Data.List.Split as Split
import qualified "heaps" Data.Heap as Heap -- qualified import from "heaps" package

import System.Random.MWC
import qualified Data.Vector.Unboxed as Vec

import System.Environment

limitSortL n xs = take n (List.sort xs)
limitSortH n xs = List.unfoldr Heap.uncons (List.foldl' (\ acc x -> Heap.take n (Heap.insert x acc) ) Heap.empty xs)
takeSortMerge n inp = List.foldl' 
                        (\acc lst -> (merge n acc (List.sort lst))) 
                        [] (Split.splitEvery n inp)
    where
     merge 0 _ _ = []
     merge _ [] xs = xs
     merge _ ys [] = ys
     merge f (x:xs) (y:ys) | x < y = let tail = merge (f-1) xs (y:ys) in tail `seq` (x:tail) 
                           | otherwise = let tail = merge (f-1) (x:xs) ys in tail `seq` (y:tail)


main = do
  st <- create

  let n1 = 10^7
      n2 = 20

  rxs :: [Int] <- Vec.toList `fmap` uniformVector st (n1)

  args <- getArgs

  case args of
    ["LIST"] ->  print (limitSortL n2 rxs)
    ["HEAP"] ->  print (limitSortH n2 rxs)
    ["MERGE"] -> print (takeSortMerge n2 rxs)
    _ -> putStrLn "Nothing..."

  return ()

Runtime performance, memory consumption, GC time:

LIST       3.96s   1168 MB    75 %
HEAP       35.29s    78 MB    3.6 %
MERGE      1.00s     78 MB    3.0 %
just rxs   0.21s     78 MB    0.0 %  -- just evaluating the random vector

Memory efficient algorithm for `take n (sort xs)` ("sorted prefix") problem

6 Answers