6 votes

I am trying to download a large file (>1GB) from one server to another over HTTP. To do this I am making HTTP range requests in parallel, so that different parts of the file download concurrently.

When saving to disk I am taking each response stream, opening the same file as a file stream, seeking to the range I want and then writing.

However, I find that all but one of my response streams time out. It looks like the disk I/O cannot keep up with the network I/O. However, if I do the same thing but have each thread write to a separate file, it works fine.

For reference, here is my code writing to the same file:

int numberOfStreams = 4;
List<Tuple<int, int>> ranges = new List<Tuple<int, int>>();
string fileName = @"C:\MyCoolFile.txt";
Exception exception = null;   // set when any download fails
//List populated here
Parallel.For(0, numberOfStreams, (index, state) =>
{
    try
    {
        HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("Some URL");
        using(Stream responseStream = webRequest.GetResponse().GetResponseStream())
        {
            using (FileStream fileStream = File.Open(fileName, FileMode.OpenOrCreate, FileAccess.Write, FileShare.Write))
            {
                fileStream.Seek(ranges[index].Item1, SeekOrigin.Begin);
                byte[] buffer = new byte[64 * 1024];
                int bytesRead;
                while ((bytesRead = responseStream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    if (state.IsStopped)
                    {
                        return;
                    }
                    fileStream.Write(buffer, 0, bytesRead);
                }
            }
        }
    }
    catch (Exception e)
    {
        exception = e;
        state.Stop();
    }
});
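
(The actual range header is elided in these snippets; with HttpWebRequest it would be set with something like the following, hypothetical line before GetResponse is called:)

// Hypothetical: request only this thread's byte range (inclusive offsets).
webRequest.AddRange(ranges[index].Item1, ranges[index].Item2);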

And here is the code writing to multiple files:

int numberOfStreams = 4;
List<Tuple<int, int>> ranges = new List<Tuple<int, int>>();
string fileName = @"C:\MyCoolFile.txt";
Exception exception = null;   // set when any download fails
//List populated here
Parallel.For(0, numberOfStreams, (index, state) =>
{
    try
    {
        HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("Some URL");
        using(Stream responseStream = webRequest.GetResponse().GetResponseStream())
        {
            using (FileStream fileStream = File.Open(fileName + "." + index + ".tmp", FileMode.OpenOrCreate, FileAccess.Write, FileShare.Write))
            {
                fileStream.Seek(ranges[index].Item1, SeekOrigin.Begin);
                byte[] buffer = new byte[64 * 1024];
                int bytesRead;
                while ((bytesRead = responseStream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    if (state.IsStopped)
                    {
                        return;
                    }
                    fileStream.Write(buffer, 0, bytesRead);
                }
            }
        }
    }
    catch (Exception e)
    {
        exception = e;
        state.Stop();
    }
});

My question is this: are there additional checks or actions that C#/Windows performs when writing to a single file from multiple threads that would make the file I/O slower than writing to multiple files? All disk operations should be bound by disk speed, right? Can anyone explain this behavior?

Thanks in advance!

UPDATE: Here is the error the source server is throwing:

"Unable to write data to the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond." [System.IO.IOException]: "Unable to write data to the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond." InnerException: "A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond" Message: "Unable to write data to the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond." StackTrace: " at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)\r\n at System.Net.Security._SslStream.StartWriting(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)\r\n at System.Net.Security._SslStream.ProcessWrite(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)\r\n at System.Net.Security.SslStream.Write(Byte[] buffer, Int32 offset, Int32 count)\r\n

The only thing I can see that might cause the writing to a single file to become or appear sluggish is that you are not flushing the file after each call to fileStream.Write(buffer, 0, bytesRead); - MethodMan
Don't open the same file in each thread. Open it once and use that single instance (make sure more than one thread doesn't write at the same time - you can use lock for it) - EZI
This should work. Post the exception ToString. How fast is the network and how big is your timeout? (Note that Parallel.For is unsuitable because it uses an uncontrollable degree of parallelism. You can only specify a maximum.) - usr
@usr the server I am getting the file from is what throws the exception. It states 'SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'. I'm taking this to mean that the receiver is not reading bytes from the stream fast enough. - shortspider
@shortspider It generally means you tried to connect to a non-existing machine. (If it existed, you would get either a successful connection or a kind of connection-rejected message in a very short time) - EZI

5 Answers

4 votes

Unless you're writing to a striped RAID, you're unlikely to see any performance benefit from writing to the file from multiple threads concurrently. In fact, it's more likely to be the opposite: the concurrent writes get interleaved and cause random access, incurring disk seek latencies that make them orders of magnitude slower than large sequential writes.

To get a sense of perspective, look at some typical latency numbers: a sequential 1 MB read from disk takes about 20 ms, and writes take approximately the same time. Each disk seek, on the other hand, takes around 10 ms. If your writes are interleaved in 4 KB chunks, a 1 MB write turns into 256 chunks, each potentially preceded by a seek: an extra 256 × 10 ms = 2560 ms of seek time, making it over 100 times slower than the sequential write.

I would suggest only allowing one thread to write to the file at any time, using the parallelism just for the network transfer. You can use a producer–consumer pattern in which downloaded chunks are added to a bounded concurrent collection (such as BlockingCollection<T>) and then picked up and written to disk by a single dedicated thread.
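
A minimal sketch of that pattern (the Chunk type, the queue capacity, and the surrounding method are illustrative assumptions, not code from the question):

using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// A downloaded chunk: where it belongs in the file, plus its bytes.
class Chunk
{
    public long Offset;
    public byte[] Data;
}

class SingleWriterDownload
{
    public static void Run(string fileName)
    {
        // Bounded queue: downloaders block once 64 chunks are queued,
        // throttling the network side to what the disk can absorb.
        var chunks = new BlockingCollection<Chunk>(boundedCapacity: 64);

        // One dedicated writer: no two threads ever interleave writes.
        var writer = Task.Run(() =>
        {
            using (var fs = new FileStream(fileName, FileMode.OpenOrCreate, FileAccess.Write, FileShare.None))
            {
                foreach (var chunk in chunks.GetConsumingEnumerable())
                {
                    fs.Seek(chunk.Offset, SeekOrigin.Begin);
                    fs.Write(chunk.Data, 0, chunk.Data.Length);
                }
            }
        });

        // ... download threads call chunks.Add(new Chunk { Offset = ..., Data = ... }) ...

        chunks.CompleteAdding(); // signal the writer once all downloads are done
        writer.Wait();
    }
}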

2 votes

    fileStream.Seek(ranges[index].Item1, SeekOrigin.Begin);

That Seek() call is the problem: you'll seek to a part of the file that's very far beyond the current end-of-file. Your next fileStream.Write() call then forces the file system to extend the file on disk, filling the unwritten parts of it with zeros.

This can take a while, and your thread will be blocked until the file system is done extending the file. That might well be long enough to trigger a timeout, and you'd see it go wrong right at the start of the transfer.

A workaround is to create and fill the entire file before you start writing real data. This is a very common strategy among downloaders; you might have seen .part files before. Another nice benefit is that you get a decent guarantee that the transfer cannot fail because the disk ran out of space. Beware that filling a file with zeros is only cheap when the machine has enough RAM; 1 GB should not be a problem on modern machines.
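
A minimal sketch of that pre-fill step (the fileSize parameter is an assumption, presumably taken from the Content-Length header):

using System;
using System.IO;

static class Prefill
{
    // Write zeros sequentially once, up front, so the parallel writers
    // never force the file system to extend the file mid-download.
    public static void Fill(string fileName, long fileSize)
    {
        using (var fs = new FileStream(fileName, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            var zeros = new byte[64 * 1024];
            long remaining = fileSize;
            while (remaining > 0)
            {
                int count = (int)Math.Min(zeros.Length, remaining);
                fs.Write(zeros, 0, count);   // sequential writes, no seeks
                remaining -= count;
            }
        }
    }
}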

Repro code:

using System;
using System.IO;
using System.Diagnostics;

class Program {
    static void Main(string[] args) {
        string path = @"c:\temp\test.bin";
        var fs = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.Write);
        fs.Seek(1024L * 1024 * 1024, SeekOrigin.Begin);
        var buf = new byte[4096];
        var sw = Stopwatch.StartNew();
        fs.Write(buf, 0, buf.Length);
        sw.Stop();
        Console.WriteLine("Writing 4096 bytes took {0} milliseconds", sw.ElapsedMilliseconds);
        Console.ReadKey();
        fs.Close();
        File.Delete(path);
    }
}

Output:

Writing 4096 bytes took 1491 milliseconds

That was on a fast SSD; a spindle drive is going to take much longer.

1 vote

Here's my guess from the information given so far:

On Windows, when you write to a position that extends the file size, Windows needs to zero-initialize everything that comes before it. This prevents old disk data from leaking, which would be a security problem.

Probably all but your first thread need to zero-init so much data that the download times out. This is not really streaming anymore, because the first write takes ages.

If you have the SE_MANAGE_VOLUME privilege ("Perform volume maintenance tasks") you can avoid the zero initialization by calling SetFileValidData; otherwise you cannot, for security reasons. Free Download Manager, for example, shows a message that it starts zero-initing at the start of each download.
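
A minimal sketch of that fast path, assuming the privilege is held; beware that it can expose stale disk contents in any region the download never overwrites:

using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class FastAllocate
{
    // Succeeds only when the process holds SE_MANAGE_VOLUME_NAME
    // ("Perform volume maintenance tasks").
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool SetFileValidData(SafeFileHandle hFile, long validDataLength);

    public static void Preallocate(string path, long length)
    {
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            fs.SetLength(length); // reserves the space; zero-fill is still deferred
            if (!SetFileValidData(fs.SafeFileHandle, length))
                // Typically ERROR_PRIVILEGE_NOT_HELD when the privilege is missing.
                throw new IOException("SetFileValidData failed", Marshal.GetLastWin32Error());
        }
    }
}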

1 vote

So after trying all the suggestions, I ended up using a MemoryMappedFile and opening a stream to write to the MemoryMappedFile on each thread:

int numberOfStreams = 4;
List<Tuple<int, int>> ranges = new List<Tuple<int, int>>();
string fileName = @"C:\MyCoolFile.txt";
long? fileSize = null;        // total size, set from the Content-Length header
Exception exception = null;   // set when any download fails
//Ranges list populated here
using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(fileName, FileMode.OpenOrCreate, null, fileSize.Value, MemoryMappedFileAccess.ReadWrite))
{
    Parallel.For(0, numberOfStreams, index =>
    {
        try
        {
            HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("Some URL");
            using(Stream responseStream = webRequest.GetResponse().GetResponseStream())
            {
                using (MemoryMappedViewStream fileStream = mmf.CreateViewStream(ranges[index].Item1, ranges[index].Item2 - ranges[index].Item1 + 1, MemoryMappedFileAccess.Write))
                {
                    responseStream.CopyTo(fileStream);
                }
            }
        }
        catch (Exception e)
        {
            exception = e;
        }
    });
}
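
(This presumably works because creating the mapping extends the file to its full size once, up front, so no worker's first write has to wait for the file system to zero-fill everything before its offset.)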
0 votes

System.Net.Sockets.NetworkStream.Write

The stack trace shows that the error happens when writing to the server. It is a timeout. This can happen because of

  1. network failure/overloading
  2. an unresponsive server.

This is not an issue with writing to a file. Analyze the network and the server. Maybe the server is not ready for concurrent usage.

Prove this theory by disabling writing to the file. The error should remain.
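
A quick way to run that check is to drain each response stream without touching the disk, replacing only the inner loop of the question's code with something like:

// Read and discard the data instead of writing it to a FileStream.
// If the timeout still occurs, the bottleneck is the network or the
// server, not the file I/O.
byte[] buffer = new byte[64 * 1024];
while (responseStream.Read(buffer, 0, buffer.Length) > 0)
{
    // intentionally discard buffer contents
}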