1
votes

Does anyone know the maximum file size that can be uploaded via the Azure HDFS REST API (https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-data-operations-rest-api)?

Some sources say 256 MB and others say 32 MB, so I'm not sure which is correct.

Do similar limits apply to the SDKs?

2 Answers

2
votes

I wrestled with the same problem some months ago. It turned out that the IIS instance in front of ADLS sets maxAllowedContentLength to its default value of 30000000 bytes (about 28.6 MB). This means that any request larger than that never reaches ADLS, because IIS rejects it with a 404.13 error first. Reference.
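The arithmetic behind that figure can be checked with a small sketch. The constant is the IIS default quoted above; the class and method names are hypothetical, and the 28 MiB and 30 MiB payloads are only illustrative sizes.

```csharp
using System;
using System.Globalization;

public static class RequestLimitCheck
{
    // Default IIS maxAllowedContentLength quoted above, in bytes.
    public const long MaxAllowedContentLength = 30_000_000;

    // True if a request body of this size stays within the default limit.
    public static bool FitsInOneRequest(long contentLengthBytes) =>
        contentLengthBytes <= MaxAllowedContentLength;

    public static void Main()
    {
        const long MiB = 1024 * 1024;

        // 30,000,000 bytes expressed in MiB (roughly 28.61).
        double limitMiB = (double)MaxAllowedContentLength / MiB;
        Console.WriteLine(limitMiB.ToString("F2", CultureInfo.InvariantCulture));

        Console.WriteLine(FitsInOneRequest(28 * MiB)); // 29,360,128 bytes: fits
        Console.WriteLine(FitsInOneRequest(30 * MiB)); // 31,457,280 bytes: too large
    }
}
```

So a body that is nominally "30 MB" in binary units already exceeds the limit, while 28 MB still fits.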

As the links already suggest, the ADLS driver uses a 4 MB buffer. I'm using the .NET SDK myself, and the following code has served me well:

// Needs System.IO and System.Threading.Tasks, plus the Data Lake Store SDK namespace.
public async Task AddFile(byte[] content, string path)
{
    // Append in 4 MB chunks so that each request stays well below the IIS limit.
    const int fourMb = 4 * 1024 * 1024;
    var buffer = new byte[fourMb];
    using (var stream = new MemoryStream(content))
    {
        // Create the file first if it does not exist yet.
        if (!_adlsFileSystemClient.FileSystem.PathExists(_account, path))
        {
            _adlsFileSystemClient.FileSystem.Create(_account, path);
        }

        int bytesRead;
        while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Wrap only the bytes actually read; the last chunk is usually shorter.
            using (var s = new MemoryStream(buffer, 0, bytesRead))
            {
                await _adlsFileSystemClient.FileSystem.AppendAsync(_account, path, s);
            }
            //skipped for brevity
        }
    }
}

1
votes

In my tests, the maximum file size falls somewhere between 28 MB and 30 MB.

Using the Azure Data Lake Storage REST API, I have had no issues creating files as large as 28 MB. However, when I try to create a file that is 30 MB, I receive a 404 Not Found error.
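One way to stay under such a limit when calling the REST API directly is to create the file with an empty first request and then append the content in chunks. The sketch below only plans the requests and sends nothing. It assumes the WebHDFS-style `op=CREATE` (PUT) and `op=APPEND` (POST) operations; the exact query parameters the service requires, the account name `myaccount`, the path, the 4 MB chunk size, and all type and method names are assumptions for illustration and should be checked against the REST documentation.

```csharp
using System;
using System.Collections.Generic;

// One planned HTTP call: which verb, which URL, and which slice of the file it carries.
public sealed class PlannedRequest
{
    public string Method { get; }
    public string Url { get; }
    public long Offset { get; }
    public int Length { get; }

    public PlannedRequest(string method, string url, long offset, int length)
    {
        Method = method;
        Url = url;
        Offset = offset;
        Length = length;
    }
}

public static class ChunkedUploadPlanner
{
    // Returns one empty CREATE followed by as many APPENDs as needed to cover fileSize.
    public static List<PlannedRequest> Plan(string account, string path, long fileSize, int chunkSize)
    {
        if (fileSize < 0) throw new ArgumentOutOfRangeException(nameof(fileSize));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        // Assumed URL shape; no escaping of special characters in the path is attempted here.
        string baseUrl = $"https://{account}.azuredatalakestore.net/webhdfs/v1/{path.TrimStart('/')}";

        var requests = new List<PlannedRequest>
        {
            new PlannedRequest("PUT", baseUrl + "?op=CREATE", 0, 0)
        };

        for (long offset = 0; offset < fileSize; offset += chunkSize)
        {
            // The final chunk carries whatever is left over.
            int length = (int)Math.Min(chunkSize, fileSize - offset);
            requests.Add(new PlannedRequest("POST", baseUrl + "?op=APPEND", offset, length));
        }

        return requests;
    }

    public static void Main()
    {
        // A 30 MB file in 4 MB chunks: 1 CREATE + 8 APPENDs.
        var plan = Plan("myaccount", "/tmp/data.bin", 30L * 1024 * 1024, 4 * 1024 * 1024);
        Console.WriteLine(plan.Count); // prints 9
    }
}
```

Each planned APPEND body is at most 4 MB, so no single request comes close to the 28 MB to 30 MB range where the 404 appears.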

The following references align with the file size limit and 404 error I am observing. The references are about the SDK, but it could be that the SDK is also calling the REST API under the covers. My tests are calling the REST API directly.