1 vote

I have an algorithm that requires the construction of an NxN matrix inside a function that will return the product of this matrix with an Nx1 vector that's also built on the fly. (N is usually 8 or 9, but must be generalized for values greater than that).

I'm using the Eigen library for performing algebraic operations that are even more complex (least squares and several other constrained problems), so switching it isn't an option.

I've benchmarked the functions, and there's a huge bottleneck due to the intensive memory allocations. I aim to build a thread-safe application, so, for some cases, I replaced these matrices and vectors with references to elements from a global vector that serves as a provider for objects that cannot be stored on the stack. This avoids calling the constructors/destructors of the Eigen matrices and vectors, but it's not an elegant solution and it can lead to huge problems if considerable care is not taken.

As such, does Eigen offer a workaround (I don't see an option to pass an allocator as a template argument for these objects), or is there a more obvious thing to do?

Does N change between the calls? If yes, then how did you prepare this "global object" when you do not know what N will be? – luk32
N does not change between calls. It depends on the input data model from which the processed object is created. – teodron
Then maybe you can make your "resource manager" a static within your function. Otherwise the global one is hard to avoid IMO. It should not be very hard, I think; you should get away with something simple, i.e. without garbage collection or any clean-ups. – luk32
Due to the multithreaded constraints, it would be better to have an external/global provider for these matrices, on a per-thread basis. – teodron
You can have locks within the resource manager; it shouldn't matter whether it's static or not. Of course, if your app spams allocations it will get serialized on those calls, but I don't think that's a real problem in a normal use case. Also, I don't think Eigen exposes its memory allocations; they do some magic to get the memory alignment needed for SSE. So your best bet is to wrap the Eigen objects themselves in some kind of resource manager, like you did. – luk32

4 Answers

4 votes

You can manage your own memory in a way that fits your needs and use Eigen::Map instead of Eigen::Matrix to perform calculations with it. Just make sure the data is aligned properly or notify Eigen if it isn't.

See the reference Eigen::Map for details.

Here is a short example:

#include <iostream>
#include <Eigen/Core>


int main() {
    int mydata[3 * 4]; // Manage your own memory as you see fit
    int* data_ptr = mydata;

    Eigen::Map<Eigen::MatrixXi, Eigen::Unaligned> mymatrix(data_ptr, 3, 4);

    // use mymatrix like you would any other matrix
    mymatrix = Eigen::MatrixXi::Zero(3, 4);
    std::cout << mymatrix << '\n';

    // This line will trigger a failed assertion in debug mode
    // To change it see
    // http://eigen.tuxfamily.org/dox-devel/TopicAssertions.html
    mymatrix = Eigen::MatrixXi::Ones(3, 6);


    std::cout << mymatrix << '\n';
}
2 votes

To gather my comments into a full idea, here is how I would try to do it.

Memory allocation in Eigen is pretty advanced stuff IMO, and the library does not expose many places to tap into it. So the best bet is to wrap the Eigen objects themselves in some kind of resource manager, like the OP did.

I would make it a simple bin that holds Matrix<Scalar, Dynamic, Dynamic> objects. This way you template on the Scalar type and have a manager for matrices of any size.

Whenever an object is requested, check whether a free object of the desired size exists; if so, return a reference to it, otherwise allocate a new one. When the object is released, simply mark it free in the resource manager. I don't think anything more complicated is needed, but of course more sophisticated logic could be implemented.

To ensure thread safety, I would put a lock in the manager (initialize it in the constructor if needed). Locking on free and allocate would of course be required.

However, it depends on the work schedule. If the threads work on their own arrays, I would consider making one resource manager instance per thread, so they don't block each other. The thing is, a single global lock or global manager could become a point of contention if you have, say, 12 cores working heavily on allocations/deallocations, effectively serializing your app through that one lock.

1 vote

You can try replacing your default memory allocator with jemalloc or tcmalloc. It's pretty easy to try out thanks to the LD_PRELOAD mechanism.

I think it works for most C++ projects as well.
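For example (the library paths below are an assumption and vary by distribution and version):

```shell
# Run the unmodified binary with jemalloc substituted for the default allocator
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 ./my_app

# tcmalloc works the same way
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 ./my_app
```

No recompilation is needed; the dynamic linker resolves `malloc`/`free` against the preloaded library first.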

0 votes

You could allocate memory for some common matrix sizes before calling that function with operator new or operator new[], store the void* pointers somewhere, and let the function itself retrieve a memory block of the right size. After that, you can use placement new for matrix construction. Details are given in More Effective C++, Item 8.