To upload a file to ADL, you first need to:

  • send a PUT request with the ?resource=file parameter (this creates the file in ADL)
  • append data to the file with the ?action=append&position=<N> parameters
  • finally, flush the data with ?action=flush&position=<FILE_SIZE>
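
The three steps above can be sketched as a sequence of request URLs. This is only an illustration, assuming a single chunk appended at position 0 and the <account>.dfs.core.windows.net endpoint; the function name upload_requests and the account, file system and path values are hypothetical, and authentication headers and the request body are left out. In the REST API the append and flush calls are PATCH requests.

```python
from urllib.parse import quote, urlencode


def upload_requests(account, file_system, path, file_size):
    """Return the (method, url) pairs for the create / append / flush sequence."""
    base = (
        f"https://{account}.dfs.core.windows.net/"
        f"{quote(file_system)}/{quote(path)}"
    )
    return [
        # 1. Create the (empty) file.
        ("PUT", f"{base}?{urlencode({'resource': 'file'})}"),
        # 2. Append the data, here as one chunk starting at position 0.
        ("PATCH", f"{base}?{urlencode({'action': 'append', 'position': 0})}"),
        # 3. Flush (commit) everything up to the total file size.
        ("PATCH", f"{base}?{urlencode({'action': 'flush', 'position': file_size})}"),
    ]


if __name__ == "__main__":
    # Hypothetical account, file system and path, for illustration only.
    for method, url in upload_requests("myaccount", "w22", "t2/22.txt", 11):
        print(method, url)
```

If the process stops after step 1 or step 2, the file exists but nothing has been committed to it, which is the situation the question is about.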

My question is:

Is there a way to tell the server how long the data should live if it is not flushed (written)?

Since you need to create a file first to write data into it, there might be scenarios where the flush does not happen, and you are stuck with an empty file in the data lake.

I could not find anything in the Microsoft documentation about this.

Any info would be appreciated.

1 Answer

Updated 0219:

If you call only the append API and never call the flush API, the uncommitted data is kept in Azure for 7 days.

After 7 days the uncommitted data is deleted automatically; it cannot be deleted from your end.


Original:

The SDK for Azure Data Lake Storage Gen2 is available, and it makes working with ADLS Gen2 easier than calling the REST API directly.

If you're using .NET/C#, the SDK for Azure Data Lake Storage Gen2 is Azure.Storage.Files.DataLake.

Here is the official doc on how to use this SDK with ADLS Gen2. The C# code below shows how to delete a file or upload a file in ADLS Gen2:

    using System;
    using System.IO;
    using Azure.Storage;
    using Azure.Storage.Files.DataLake;

    class Program
    {
        static void Main(string[] args)
        {
            // Storage account name and key (placeholders)
            string accountName = "xxx";
            string accountKey = "xxx";

            StorageSharedKeyCredential sharedKeyCredential =
                new StorageSharedKeyCredential(accountName, accountKey);

            // ADLS Gen2 uses the dfs endpoint of the storage account
            string dfsUri = "https://" + accountName + ".dfs.core.windows.net";

            DataLakeServiceClient dataLakeServiceClient = new DataLakeServiceClient
                (new Uri(dfsUri), sharedKeyCredential);

            // "w22" is the file system (container), "t2" a directory inside it
            DataLakeFileSystemClient fileSystemClient = dataLakeServiceClient.GetFileSystemClient("w22");
            DataLakeDirectoryClient directoryClient = fileSystemClient.GetDirectoryClient("t2");

            // use this line of code to delete a file
            //directoryClient.DeleteFile("22.txt");


            // use the code below to upload a file: create it, append the data, then flush
            //DataLakeFileClient fileClient = directoryClient.CreateFile("22.txt");
            //FileStream fileStream = File.OpenRead("d:\\foo2.txt");

            //long fileSize = fileStream.Length;
            //fileClient.Append(fileStream, offset: 0);
            //fileClient.Flush(position: fileSize);

            Console.WriteLine("**completed**");
            Console.ReadLine();
        }
    }

For Java, refer to this doc.

For Python, refer to this doc.
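
For the scenario in the question (an upload that fails after the file has been created), one possible approach in Python is to delete the file when the append or flush step raises. The following is a minimal sketch, not an official pattern: upload_with_cleanup is a hypothetical helper, and it assumes directory_client behaves like the DataLakeDirectoryClient from the azure-storage-file-datalake package (create_file, append_data, flush_data and delete_file).

```python
def upload_with_cleanup(directory_client, file_name, data):
    """Create, append and flush a file; delete it if the upload fails midway.

    `directory_client` is assumed to behave like DataLakeDirectoryClient
    from the azure-storage-file-datalake package.
    """
    # Step 1: create the (empty) file.
    file_client = directory_client.create_file(file_name)
    try:
        # Step 2: append all the data as a single chunk at offset 0.
        file_client.append_data(data, offset=0, length=len(data))
        # Step 3: flush (commit) up to the total length.
        file_client.flush_data(len(data))
    except Exception:
        # Best effort: remove the file so no empty file is left behind,
        # then re-raise the original error for the caller to handle.
        file_client.delete_file()
        raise
    return file_client
```

With the real SDK, directory_client would typically come from DataLakeServiceClient(...).get_file_system_client("w22").get_directory_client("t2"). This only covers failures the process can catch; if the process itself dies between create and flush, the empty file still remains.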