> Why so bad and what should I do except using C++?
Make sure you're using `-O2`. In my tests it reduced the maximum memory residency from 1095M to 544M, and the runtime from 3.48s to 2.65s.
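To reproduce such measurements yourself, compile with optimizations and ask GHC's runtime system for its GC summary (`-rtsopts` and `+RTS -s` are standard GHC/RTS flags; the `Main.hs` file name is just a placeholder):

```shell
# Compile with optimizations and allow RTS options at run time
ghc -O2 -rtsopts Main.hs

# Run with the RTS statistics summary; the "maximum residency"
# line in the output is the peak heap figure quoted above
./Main +RTS -s
```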
Try a more efficient library, as also mentioned in the comments; `hmatrix`, for example.
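As a rough sketch (untested here), the same computation could look like this with `hmatrix`'s `Numeric.LinearAlgebra` API, where `build` and `atIndex` are 0-indexed and `<>` is matrix multiplication, unlike the 1-indexed `matrix` API; the names `m'` and `p'` are mine:

```haskell
import Prelude hiding ((<>))   -- hmatrix exports its own (<>)
import Numeric.LinearAlgebra

-- Same matrix as in the question, stored unboxed by hmatrix.
-- build is 0-indexed, hence the +1 to match matrix's 1-indexed version.
m' :: Int -> Matrix Double
m' n = build (s, s) (\i j -> (i + 1) * (j + 1))
  where s = 2 ^ n

-- matrix's (!) is 1-indexed; atIndex is 0-indexed, hence the -1.
p' :: Int -> Double
p' n = (m' n <> m' n) `atIndex` (n - 1, n - 1)
```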
Now, as for the "why". Here's how the `Matrix` type is implemented in the `matrix` library:
    data Matrix a = M {
        nrows     :: {-# UNPACK #-} !Int -- ^ Number of rows.
      , ncols     :: {-# UNPACK #-} !Int -- ^ Number of columns.
      , rowOffset :: {-# UNPACK #-} !Int
      , colOffset :: {-# UNPACK #-} !Int
      , vcols     :: {-# UNPACK #-} !Int -- ^ Number of columns of the matrix without offset.
      , mvect     :: V.Vector a          -- ^ Content of the matrix as a plain vector.
      }
(Assuming we're on a 64-bit system.)
We have 5 unpacked integers, which makes 5 x 8 bytes = 40 bytes. For the actual matrix data we have a `V.Vector a`. This is a boxed vector, not an unboxed one, which means every element of the vector will be a pointer to a heap-allocated object. In our case, we have these functions:
    m :: Int -> Matrix Double
    m n = matrix (2 ^ n) (2 ^ n) (\(i, j) -> fromIntegral i * fromIntegral j)

    p :: Int -> Double
    p n = let r = m n in multStd2 r r ! (n, n)

(Type signatures added for clarity.)
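To pin down what `p` actually computes, here is an equivalent, self-contained version using plain lists and a naive row-by-column multiplication (the names `mList` and `pList` are mine, and it is only usable for tiny `n`):

```haskell
import Data.List (transpose)

-- Same entries as m above: element (i, j) is i * j, 1-indexed.
mList :: Int -> [[Double]]
mList n = [ [ fromIntegral i * fromIntegral j | j <- [1 .. s] ] | i <- [1 .. s] ]
  where s = 2 ^ n :: Int

-- Textbook row-by-column multiplication.
multList :: [[Double]] -> [[Double]] -> [[Double]]
multList a b = [ [ sum (zipWith (*) row col) | col <- transpose b ] | row <- a ]

-- matrix's (!) is 1-indexed, hence the (n - 1) for the 0-indexed (!!).
pList :: Int -> Double
pList n = multList r r !! (n - 1) !! (n - 1)
  where r = mList n

main :: IO ()
main = print (pList 2)  -- prints 120.0
```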
So we have a `Matrix Double` of 2^12 * 2^12 = 16777216 doubles. Now, to estimate how much memory it'll take to store that many doubles on the heap, let's look at GHC's heap object representation. A heap object is represented like this:
    typedef struct StgClosure_ {
        StgHeader           header;
        struct StgClosure_ *payload[FLEXIBLE_ARRAY];
    } *StgClosurePtr;

    typedef struct {
        const StgInfoTable *info;
    } StgHeader;

    typedef struct StgInfoTable_ {
        StgClosureInfo layout;     /* closure layout info (one word) */
        StgHalfWord    type;       /* closure type */
        StgHalfWord    srt_bitmap;
        StgCode        code[FLEXIBLE_ARRAY];
    } *StgInfoTablePtr;

    typedef union {
        struct {                   /* Heap closure payload layout: */
            StgHalfWord ptrs;      /* number of pointers */
            StgHalfWord nptrs;     /* number of non-pointers */
        } payload;
        StgWord bitmap;            /* word-sized bit pattern describing */
                                   /* a stack frame: see below */
        OFFSET_FIELD(large_bitmap_offset); /* offset from info table to large bitmap structure */
        StgWord selector_offset;   /* used in THUNK_SELECTORs */
    } StgClosureInfo;
We have quite a lot of bookkeeping here. To make things simpler, let's assume the matrix library is efficient enough to avoid thunking here (i.e. the matrix function is strict in its return value). In this case we can say that each payload will only hold the actual double. So here's the math:
For 16777216 doubles we need:

- 16777216 `struct StgClosure_ *`s for the `Double`s = 16777216 * 8 bytes.
- The payloads, which only hold the doubles themselves = 16777216 * 8 bytes.
- 16777216 `StgHeader`s for the `Double`s, each a single `StgInfoTable*` = 16777216 * 8 bytes.
- An `StgInfoTable` per object, consisting of:
  - 1 `StgClosureInfo` = at least 3 words = 24 bytes.
  - 2 half words = 8 bytes.
  - An array of bytes. In our case this will be 0 bytes, I think, so just a pointer = 8 bytes.
When we add these up it makes 64 bytes per double, i.e. 16777216 * 64 bytes = 1073M.
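The arithmetic can be checked mechanically; the sketch below just encodes the per-object sizes from the estimate above (assuming one unshared info table per object, as the estimate does):

```haskell
-- Per-object costs from the estimate above, in bytes (64-bit words).
elems, closurePtr, payload, header, infoTable, perElem, total :: Int
elems      = 16777216      -- 2^12 * 2^12 doubles
closurePtr = 8             -- struct StgClosure_ * in the boxed Vector
payload    = 8             -- the Double itself
header     = 8             -- StgHeader: one StgInfoTable*
infoTable  = 24 + 8 + 8    -- StgClosureInfo + 2 half words + code pointer

perElem = closurePtr + payload + header + infoTable
total   = elems * perElem

main :: IO ()
main = print (perElem, total)  -- prints (64,1073741824), i.e. about 1073M
```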
Now, the math was pretty hand-wavy, but hopefully it's clear that for every heap object there's a lot of bookkeeping going on. To be more precise we'd need to take things like shared info tables into account (e.g. as far as I know, identical closures share the same info table), I think.
Comments:

- `multStd2` and `Matrix`? No clue. (Zeta)
- The `hmatrix` package can optionally link with the C library OpenBLAS. Compiling `hmatrix` with that optional flag enabled may be beneficial (or detrimental, but still worth trying if performance is a concern). (chi)