2
votes

I'm currently writing a C++ program that needs to write a large amount of data to a file (typically around 5Gb) by writing buffers sequentially. In this example, I'm writing buffers of 400Mb into a std::ofstream. My disk is an SSD and almost empty. After 4 buffers, the write performance drops. Does anyone know why, and whether it can be avoided?

Here is my code:

#include <iostream>
#include <iomanip>
#include <fstream>
#include <chrono>

int main()
{
  unsigned int bufferSize = 4e8; //400Mb
  unsigned char* buffer = new unsigned char[bufferSize];
  for (unsigned int i = 0; i < bufferSize; i++)
    buffer[i] = (unsigned char) i % 256; // just "randomly" writing the buffer
  std::ofstream ofs("test.bin");
  std::chrono::steady_clock::time_point start_time;
  std::chrono::steady_clock::time_point stop_time;
  std::chrono::duration<double> duration;

  for (int i = 1; i <= 10; i++)
  {
    start_time = std::chrono::steady_clock::now();
    ofs.write((char*) buffer, bufferSize);
    stop_time = std::chrono::steady_clock::now();
    duration = stop_time - start_time;
    std::cout << "i = " << i << ", time spent copying the buffer = " << std::chrono::duration_cast<std::chrono::nanoseconds>(duration).count() / 1e6  << "ms" << std::endl;
  }

  ofs.close();
  delete[] buffer;
  return 0;
}

And here is what I get when running it:

nirupamix@machine:~/$ ./main.out
i = 1, time spent copying the buffer = 166.267ms
i = 2, time spent copying the buffer = 170.698ms
i = 3, time spent copying the buffer = 177.484ms
i = 4, time spent copying the buffer = 210.693ms
i = 5, time spent copying the buffer = 475.933ms
i = 6, time spent copying the buffer = 793.295ms
i = 7, time spent copying the buffer = 822.195ms
i = 8, time spent copying the buffer = 828.539ms
i = 9, time spent copying the buffer = 850.651ms
i = 10, time spent copying the buffer = 794.542ms

Thanks for your time!

What is the expected speed of your ssd? Usual SATA SSD is limited to 600 MB/s. - user7860670
Install more RAM so that your OS has more cache to play with. - n. 1.8e9-where's-my-share m.
Unsure whether it is related (hence a comment and not an answer), but I can remember that old file systems used indirect addressing blocks after reaching some size. Unsure for newer ones, and especially on SSD hardware... - Serge Ballesta
Unrelated, but you should use std::ios::binary open-mode if you're writing binary data. As others pointed out, this looks like a caching thing probably happening at the OS driver level and you're actually limited by your disk speed and cache size. If you pause for a second between each write, I imagine that you won't see such a slowdown. If you need higher persistent data rates, you'll need to build a RAID volume with several drives. - paddy
Note: MB vs. Mb, case matters, one of the units is eight times larger than the other (bytes vs. bits). ; Anyway, this may depend on the operating system (what is yours?), e.g. Linux will acknowledge the initial writes quickly and once the amount of RAM to be written back to disk grows above a certain proportion, it will slow the writes to approximately the speed of the underlying disk, which you may be seeing here. You can try putting an fsync after the write, that should let you see the actual speed (or slowness) of your disk. - dratenik

1 Answer

2
votes

There are several problems that can occur when writing large amounts of data to an SSD:

  • Thermal throttling - when writes go on long enough, the SSD heats up and slows down
    • your data should not be large enough for that.
  • SSD cache runs full
    • depending on the age and technology of your SSD, it may have some DRAM to cache writes before they go to a slower SLC cache and, finally, if that is full, to the actual QLC storage (see more at anandtech).