0
votes

Our PowerShell test harness calls Databricks, which generates Parquet files in Azure Storage.

When the harness cleans up Parquet files (and other files) after a test run, it searches for all blobs in the given locations and removes them. After cleanup, the blobs are no longer visible in the Azure portal, but each time the script runs again it finds more blobs available to delete than it did after the previous test run.

Could this be caused by a soft-delete or hard-delete policy?

I'm not specifying the snapshot parameter when deleting blobs, as I'm not interested in retaining snapshots.

Some of the code used is:

$ctx = GetStorageContext -storageaccountName $remoteStorageAccount -storageaccountkey $remoteStorageKey

$availableBlobs = Get-AzStorageBlob -Container $remoteContainer -Context $ctx

Remove-AzStorageBlob -Container $using:remoteContainer -Blob $blob.Name -Force -Context $using:ctx -ErrorAction SilentlyContinue

Why might Remove-AzStorageBlob appear to fully remove a blob, so that the blob is no longer visible, and yet the blob count found by the next cleanup run keeps increasing?

Additional information: after removing -ErrorAction SilentlyContinue from Remove-AzStorageBlob, an error 500 appears partway through deleting all the $availableBlobs.

The error does not appear at the same point on subsequent runs of the same code.

1
I would output $availableBlobs so that you can see what the script is trying to delete. Is it the same blobs each time (usually a caching issue; you may have to refresh the storage context before re-running)? Is it the same blobs twice (i.e. blobs += blobs because of caching)? Also remove -ErrorAction SilentlyContinue from the Remove call: you may get an error saying the blob does not exist, which would mean the blobs are gone and it's a caching issue that you aren't seeing because the error message is hidden. - HAL9256
Thanks Hal, will try that... - david
@HAL9256 An error 500 is seen (I've added further info about this to the main post). Could it be that Azure sees us as spamming requests to delete blobs? - david

1 Answer

1
votes

You are likely hitting the underlying API request limit. While the documentation doesn't give an explicit number for "delete" requests, if we assume a delete counts as a "write" request, the API is throttled to 10 requests per second (Storage resource provider limits). A script like this could easily exceed that.

Simply add a Start-Sleep -Milliseconds 100 statement, or an equivalent delay, between deletes so that the API can keep up with the requests.
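
A minimal sketch of that throttling, assuming the $availableBlobs, $remoteContainer and $ctx variables from the question are already populated. The Invoke-WithRetry helper, its parameter names, and the attempt count and delays are hypothetical choices for illustration, not tuned values. The retry is included only because the error 500 in the question appears at a different point on each run, which suggests a repeat attempt may succeed.

```powershell
# Hypothetical helper (not part of the Az module): runs a script block and
# retries it with a growing delay if it throws.
function Invoke-WithRetry {
    param(
        [Parameter(Mandatory)] [scriptblock] $Action,
        [int] $MaxAttempts = 3,
        [int] $BaseDelayMs = 200
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            return & $Action
        }
        catch {
            # Out of attempts: rethrow the last error so it is not hidden.
            if ($attempt -eq $MaxAttempts) { throw }
            # Wait a little longer after each failed attempt.
            Start-Sleep -Milliseconds ($BaseDelayMs * $attempt)
        }
    }
}

# Assumes $availableBlobs, $remoteContainer and $ctx are set as in the question.
foreach ($blob in $availableBlobs) {
    Invoke-WithRetry -Action {
        # -ErrorAction Stop turns a failed delete into a terminating error,
        # which is what lets the catch block above see it.
        Remove-AzStorageBlob -Container $remoteContainer -Blob $blob.Name `
            -Context $ctx -Force -ErrorAction Stop
    }
    # Pause between requests, as suggested above.
    Start-Sleep -Milliseconds 100
}
```

The $using: prefix in the question suggests the delete runs in a parallel or remote scope. If that is ForEach-Object -Parallel, a per-iteration sleep slows each worker but not the combined request rate, so lowering -ThrottleLimit or deleting sequentially as above may be needed as well.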