0
votes

I've been running a database application that writes data synchronously to disk, so I'm looking for the best disk throughput. GCP's local SSDs are supposed to provide the best performance (both IOPS and MB/s). However, when I benchmarked synchronous database writes, the throughput achieved by a zonal persistent SSD was significantly better than that of the local SSD. Strangely, a single local SSD performs better than a RAID configuration with 4 partitions.

To test the performance I ran a benchmark consisting of a single thread creating transactions in a loop and performing a random 4KB write.
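
For readers who want a comparable measurement without custom code, here is a rough sketch using GNU dd. It is not the benchmark described above: it writes 4 KiB blocks sequentially rather than at random offsets, and it relies on oflag=dsync so that each write has to reach stable storage before the next one starts, which only approximates a commit followed by fsync. The helper name sync_write_test, the scratch file name, and the default count of 1000 are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: sequential 4 KiB synchronous writes, NOT the random-write
# transaction loop from the question.
# sync_write_test DIR [COUNT]  (hypothetical helper name)
sync_write_test() {
  dir=$1
  count=${2:-1000}
  file="$dir/sync-write-test.bin"
  # oflag=dsync opens the output with O_DSYNC, so every 4 KiB write waits
  # until the data is reported as written before the next one is issued.
  # dd prints its statistics on stderr; keep only the summary line.
  dd if=/dev/zero of="$file" bs=4k count="$count" oflag=dsync 2>&1 | tail -n 1
  rm -f "$file"
}

# Example: sync_write_test /mnt/disks/stable-store 1000
```

Dividing the block count by the elapsed seconds in the summary line gives an approximate writes/second figure for a single writer.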

The zonal persistent SSD was 128GB, while the local SSD setup consists of 4 SSDs in RAID 0. An N2D machine with 32 vCPUs was used to eliminate any CPU bottleneck. To ensure it wasn't a problem with the OS or filesystem, I've tried several different versions, including the ones recommended by Google. However, the result is always the same.

The results for my experiments on average are:

SSD                              Latency   Throughput
Zonal P SSD (128 GB)             ~1.5ms    ~700 writes/second
Local SSD (4 SSDs NVME RAID 0)   ~14ms     ~71 writes/second
Local SSD (1 SSD)                ~13ms     ~75 writes/second
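
As a sanity check on these figures: a single thread that waits for each write to complete before issuing the next cannot exceed roughly 1000 divided by the latency in milliseconds. The small helper below (the name writes_per_second is made up for illustration) applies that to the three latencies in the table.

```shell
#!/bin/sh
# writes_per_second LATENCY_MS  (hypothetical helper)
# Upper bound for one thread issuing one synchronous write at a time.
writes_per_second() {
  awk -v ms="$1" 'BEGIN { printf "%.0f\n", 1000 / ms }'
}

writes_per_second 1.5   # prints 667
writes_per_second 14    # prints 71
writes_per_second 13    # prints 77
```

These bounds are close to the measured ~700, ~71 and ~75 writes/second, so the throughput numbers look consistent with per-write latency being the limiting factor.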

I'm at a bit of a loss on how to proceed, as I'm not sure if this result should be expected. If so, it seems like my best option is to use zonal persistent disks. Do you think that these results seem correct, or might there be some problem with my setup?

Suggestions such as adjusting write caching would improve performance; however, the goal here is fast synchronous disk writes. Otherwise, my best option would be zonal persistent SSDs (they offer replicated storage) or just using RAM, which will always be faster than any SSD.

As AdolfoOG suggested, there might be an issue with my RAID configuration. To shed some light on this, these are the commands I use to create my RAID 0 setup with four devices. Note that /dev/nvme0nX refers to each NVMe device I'm using.

sudo mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/nvme0n1 /dev/nvme0n2 /dev/nvme0n3 /dev/nvme0n4 
sudo mkfs.ext4 -F /dev/md0 
sudo mkdir /mnt/disks/ 
sudo mkdir /mnt/disks/stable-store 
sudo mount /dev/md0 /mnt/disks/stable-store 
sudo chmod a+w /mnt/disks/stable-store 
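
One way to confirm that the array was assembled as intended is to inspect it after creation. This is only a sketch: the helper name check_raid is hypothetical, and it assumes mdadm is installed and that /dev/md0 is the array created above.

```shell
#!/bin/sh
# check_raid [DEVICE]  (hypothetical helper; defaults to /dev/md0)
check_raid() {
  dev=${1:-/dev/md0}
  if [ ! -b "$dev" ]; then
    echo "not a block device: $dev"
    return 1
  fi
  # Kernel summary of all md arrays: RAID level, members, chunk size.
  cat /proc/mdstat
  # Detailed view: RAID level, number of devices, per-member state.
  sudo mdadm --detail "$dev"
}

# Example: check_raid /dev/md0
```

The output should list the RAID level as raid0 with all four NVMe devices active; anything else would point at the array setup rather than the disks.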

This should be the same process as what Google advises unless I messed something up of course!

1
How is the storage configured regarding write caching? stackoverflow.com/q/27087912/8016720 - John Hanley
I suppose a write-through pattern is used. (Each transaction is created, writes 4KB, then commits with an fsync.) - Michael Davis
There is something wrong with your local SSD configuration. SSDs have very small latencies, around 1ms. Latencies of 21ms are more in line with spinning hard disks. - John Hanley
Do you know what this might be? I've set everything up according to Google's recommendations: NVMe, an Ubuntu image optimised for NVMe, and a RAID 0 setup. Similar issues are reported here: medium.com/@rimantasragainis/…. The author there found an optimised OS image that vastly improved performance, but it was an experimental image given by Google staff, so I can't rely on that as a solution. I've run the same benchmark on my own local computer and get the results you suggest would be correct, but with GCP local NVMe the performance is much worse. - Michael Davis

1 Answer

2
votes

Answer completely edited after original question edited:

I tried to replicate your situation using a more "stock" approach: I didn't write any code to test the MB/s, and instead just used "dd" and "hdparm". I also used an N2-standard-32 instance with a 100 GB persistent SSD as the boot disk and a RAID 0 of 4 NVMe local SSDs. My results are below:

Write tests:

Persistent SSD:

root@instance-1:~# dd if=/dev/zero of=./test oflag=direct bs=1M count=16k
16384+0 records in
16384+0 records out
17179869184 bytes (17 GB, 16 GiB) copied, 18.2175 s, 943 MB/s

root@instance-1:~# dd if=/dev/zero of=./test oflag=direct bs=1M count=32k
32768+0 records in
32768+0 records out
34359738368 bytes (34 GB, 32 GiB) copied, 42.1738 s, 815 MB/s

root@instance-1:~# dd if=/dev/zero of=./test oflag=direct bs=1M count=64k
65536+0 records in
65536+0 records out
68719476736 bytes (69 GB, 64 GiB) copied, 83.6243 s, 822 MB/s

Local SSD:

root@instance-1:~# dd if=/dev/zero of=/mnt/disks/raid/test oflag=direct bs=1M count=16k
16384+0 records in
16384+0 records out
17179869184 bytes (17 GB, 16 GiB) copied, 10.6567 s, 1.6 GB/s

root@instance-1:~# dd if=/dev/zero of=/mnt/disks/raid/test oflag=direct bs=1M count=32k
32768+0 records in
32768+0 records out
34359738368 bytes (34 GB, 32 GiB) copied, 21.26 s, 1.6 GB/s

root@instance-1:~# dd if=/dev/zero of=/mnt/disks/raid/test oflag=direct bs=1M count=64k
65536+0 records in
65536+0 records out
68719476736 bytes (69 GB, 64 GiB) copied, 42.4611 s, 1.6 GB/s

Read tests:

Persistent SSD:

root@instance-1:~# hdparm -tv /dev/sda

/dev/sda:
 multcount = 0 (off)
 readonly = 0 (off)
 readahead = 256 (on)
 geometry = 13054/255/63, sectors = 209715200, start = 0
 Timing buffered disk reads: 740 MB in 3.00 seconds = 246.60 MB/sec

Local SSD:

root@instance-1:~# hdparm -tv /dev/md0

/dev/md0:
 readonly = 0 (off)
 readahead = 8192 (on)
 geometry = 393083904/2/4, sectors = 3144671232, start = unknown
 Timing buffered disk reads: 6888 MB in 3.00 seconds = 2761.63 MB/sec

So I'm actually seeing better performance with the local SSD RAID, and both reads and writes match the expected figures in the performance table:

Throughput (MB/s):  Read: 2,650;  Write: 1,400

So maybe there is something odd with the way you tested performance. You mentioned that you wrote a little script to do it; if you try a more "stock" approach, you may get the same results I did.
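
One way to put the two sets of measurements side by side is to convert the question's write rates into MB/s. The sketch below assumes each write is exactly 4096 bytes (the question only says "4KB"), and the helper name mb_per_second is illustrative.

```shell
#!/bin/sh
# mb_per_second WRITES_PER_SECOND BLOCK_BYTES  (hypothetical helper)
# Converts a write rate into MB/s (1 MB = 1,000,000 bytes, as dd reports).
mb_per_second() {
  awk -v w="$1" -v b="$2" 'BEGIN { printf "%.2f\n", w * b / 1000000 }'
}

mb_per_second 700 4096   # zonal persistent SSD in the question: prints 2.87
mb_per_second 71 4096    # local SSD RAID 0 in the question: prints 0.29
```

Both figures are far below the 800 to 1,600 MB/s that dd reports here with 1 MiB direct writes, which suggests the two tests may be exercising different things: per-write latency in the question versus sequential bandwidth in this answer.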