I have a program in which I want to be able to store certain data (dynamically allocated blocks), on disk for reduced memory usage and persistence.
My first thought was to write my own custom allocator which managed the content of files on the disk, but I want to see what alternatives there are too.
I've looked into custom memory allocators and topics on object serialization but there are subtle differences, both good and bad, when adapting those principles to managing the address space of a file.
In this situation:
Memory is accessed only through IO (read/write) functions rather than directly
No objects(methods/pointers) are stored, only data.
The size of a file is not static, so it should grow when needed rather than being large and static
For my uses, it is acceptable to re-map existing pointers after defragmentation
Because the data is not of a fixed size, most database implementations seem not well suited.
I ask, what is the best approach for this problem? Should I implement a simple memory allocator which treats a file as the heap?
For reference, im using C++ on embedded devices.
Edit: I've implemented my own memory manager which uses buddy memory allocation and block sizes of powers of two. I'm satisfied that it is correct and does not leak, coalesces free blocks, and can do a 'stop the world' defragmentation.
The problem is, as expected, there is quite a bit of internal and external fragmentation. I'm not an expert in this field and although I find it fascinating (I'm still a student), I wonder if there are any other implementations that have done the same or similar thing? Surely I cant be the only one?
Some helpful but so far incompatible topics are:
mmap tbh I havent used mmap but, it addresses the file IO, but not the management of the file address space.
BOOST:serialization I have a (probably unjustified) reluctance to use boost libraries at the moment.
STXXL Interesting but doesnt address variable size memory allocation
Doug Lea Memory Allocator Has very good insights into issues with memory allocators, but I'm not in a position to try and make my own implementation