I want to download a large CSV file in a web application. The web application sends an API request to the service, which then hits the Azure Data Lake Storage API using the ADLS client in azure-storage-file-datalake. The sample code in the service is as follows:

val client = getADLSClientGen2(dataSourceInstanceName, fileSystem)
val fileClient = client.getFileClient(filePath)
val outputStream: OutputStream = ByteArrayOutputStream()
fileClient.read(outputStream)
outputStream.close()
val buffer = outputStream as ByteArrayOutputStream
return ByteArrayInputStream(buffer.toByteArray())

In the above code the entire file is read into the output stream, and then an input stream over that buffer is sent back as the API response. I want to be able to send an input stream of the file read from the ADLS file system directly, without buffering the whole file in memory.
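Roughly, what I'm hoping for is something like the sketch below, with no ByteArrayOutputStream round trip (openInputStream() is an assumption on my part; I have not confirmed which azure-storage-file-datalake versions expose it on DataLakeFileClient):

    import java.io.InputStream

    // What I'd like: hand the caller an InputStream backed by the ADLS file itself,
    // instead of buffering the whole CSV in a ByteArrayOutputStream first.
    // openInputStream() is assumed here, not verified against our SDK version.
    fun openCsvStream(dataSourceInstanceName: String, fileSystem: String, filePath: String): InputStream {
        val client = getADLSClientGen2(dataSourceInstanceName, fileSystem)
        val fileClient = client.getFileClient(filePath)
        return fileClient.openInputStream().inputStream
    }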

1 Answer

Per my understanding, in your web application you want an input stream for the .csv file stored in ADLS Gen2. Since ADLS Gen2 is built on the Azure Storage service, on the server side we can simply generate a SAS token for the web application; the web app can then make an HTTP request to download the file directly from ADLS Gen2 and obtain the input stream from the HTTP response.

This is the code to generate a blob SAS token:

        String connString = "<connection string>";
        String containerName = "<container name>";
        String blobName = "<.csv name>";

        BlobServiceClient client = new BlobServiceClientBuilder().connectionString(connString).buildClient();
        BlobClient blobClient = client.getBlobContainerClient(containerName).getBlobClient(blobName);

        BlobSasPermission blobSasPermission = new BlobSasPermission().setReadPermission(true); // grant read permission only
        OffsetDateTime expiryTime = OffsetDateTime.now().plusDays(1); // 1 day to expire
        BlobServiceSasSignatureValues values = new BlobServiceSasSignatureValues(expiryTime, blobSasPermission)
                        .setStartTime(OffsetDateTime.now());

        System.out.println(blobClient.getBlobUrl() + "?" + blobClient.generateSas(values));

Maven dependency:

    <dependency>
        <groupId>com.azure</groupId>
        <artifactId>azure-storage-blob</artifactId>
        <version>12.9.0</version>
    </dependency>
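To tie the two pieces together, the service from the question (Kotlin) just needs to return this SAS URL from its API instead of printing it. A minimal sketch, assuming the BlobClient is built exactly as in the Java snippet above (the function name here is made up):

    import com.azure.storage.blob.BlobClient
    import com.azure.storage.blob.sas.BlobSasPermission
    import com.azure.storage.blob.sas.BlobServiceSasSignatureValues
    import java.time.OffsetDateTime

    // Returns a short-lived, read-only SAS URL for the .csv blob; the web application
    // can then download the file directly from ADLS Gen2 using this URL.
    fun csvDownloadUrl(blobClient: BlobClient): String {
        val permission = BlobSasPermission().setReadPermission(true)   // read only
        val values = BlobServiceSasSignatureValues(
            OffsetDateTime.now().plusHours(1),                         // expires in 1 hour
            permission
        )
        return blobClient.blobUrl + "?" + blobClient.generateSas(values)
    }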

Web application sample code:

<html>

    <body>
    </body>

    <script>
        var xhr = new XMLHttpRequest();
        xhr.open('GET', 'CSV file URL with SAS');
        xhr.seenBytes = 0;

        xhr.onreadystatechange = function() {
            console.log("state change.. state: " + xhr.readyState);

            if (xhr.readyState == 3) {
                // readyState 3 (LOADING): partial response text is already available,
                // so append only the part that has not been seen yet.
                var newData = xhr.responseText.substr(xhr.seenBytes);
                console.log("newData: <<" + newData + ">>");
                document.body.innerHTML += "New data: " + newData + "<br />";

                xhr.seenBytes = xhr.responseText.length;
                console.log("seenBytes: " + xhr.seenBytes);
            }
        };

        xhr.addEventListener("error", function(e) {
            console.log("error: " + e);
        });

        console.log(xhr);
        xhr.send();
    </script>

</html>

The CSV file content is loaded incrementally:

[screenshot: CSV content appearing chunk by chunk in the browser]