I am trying to parse an XML blob and convert it to CSV. The following code works when the XML is a local file.

import xml.etree.ElementTree as et

SourceFileName = req.params.get('FileName')
SourceContainer = "C:\\AzureInputFiles\\"
SourceFileFullPath = SourceContainer + SourceFileName

xtree = et.parse(SourceFileFullPath)
xroot = xtree.findall(".//data/record") 
df_cols=['Col1', 'Col2']
rows = []

The same approach does not work when the file is an Azure blob. How can I do that? It is not the cleanest approach, but I tried building the blob URL from parameters, as shown here. The container is set to public access and the blobs have no restrictions. Library used: azure-storage-blob

import xml.etree.ElementTree as et

url = f"https://{account_name}.blob.core.windows.net/{container_name}/{blob_name}"

xtree = et.parse(url)
xroot = xtree.findall(".//data/record") 
df_cols=['Col1', 'Col2']
rows = []

Any suggestions to make it work? Is there a better way to access the blob?

1 Answer

To read an XML file from Azure Blob Storage, you can use the azure-storage-blob package (imported as azure.storage.blob).

For example, given the following XML file:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

Code:

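import xml.etree.ElementTree as ET

from azure.storage.blob import BlobServiceClient

# Connect to the storage account with its connection string.
connection_string = '<your storage account connection string>'
blob_service_client = BlobServiceClient.from_connection_string(connection_string)

# Get a client for one blob, then download its content.
blob_client = blob_service_client.get_blob_client(container="test", blob="data.xml")
downloader = blob_client.download_blob()

# readall() returns the blob content as bytes, which ET.fromstring accepts.
# (content_as_text() also works but is marked deprecated in version 12 of the SDK.)
root = ET.fromstring(downloader.readall())

# Print the attributes of every <neighbor> element.
for neighbor in root.iter('neighbor'):
    print(neighbor.attrib)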
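
Since the goal in the question is CSV output, here is a minimal sketch of how the parsed XML could be flattened into CSV using only the standard library. It assumes a shortened version of the sample country file above. The helper name `countries_to_csv` and the choice of columns are illustrative, not part of the original answer. In the blob case, the `xml_bytes` argument would be the bytes downloaded from the blob.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened copy of the sample file, used here in place of a real blob download.
SAMPLE_XML = b"""<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
    </country>
</data>"""


def countries_to_csv(xml_bytes):
    """Flatten each <country> element into one CSV row (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    columns = ["name", "rank", "year", "gdppc"]  # assumed column selection
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for country in root.iter("country"):
        writer.writerow([
            country.get("name"),          # attribute on <country>
            country.findtext("rank"),     # text of child elements
            country.findtext("year"),
            country.findtext("gdppc"),
        ])
    return buffer.getvalue()


csv_text = countries_to_csv(SAMPLE_XML)
print(csv_text)
```

The resulting string could then be written to a local file or uploaded to another blob, depending on where the CSV needs to end up.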

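
As for why the URL attempt in the question fails: `et.parse` expects a file name or a file object, so a URL string is treated as a local path. If the container really allows anonymous public reads, one possible alternative is to open the URL first and pass the file-like response to the parser. This is only a sketch under that assumption. The helper name `parse_xml_from_url` is hypothetical, and the account, container, and blob names in the comments are placeholders.

```python
import urllib.request
import xml.etree.ElementTree as ET


def parse_xml_from_url(url):
    """Open the URL and pass the file-like response to ET.parse (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return ET.parse(response)


# Placeholder values; substitute a real account, container, and blob name:
# account_name, container_name, blob_name = "myaccount", "mycontainer", "data.xml"
# url = f"https://{account_name}.blob.core.windows.net/{container_name}/{blob_name}"
# tree = parse_xml_from_url(url)
# root = tree.getroot()
```

For private containers this will not work without a SAS token or other credentials, in which case the `BlobServiceClient` approach above is the more general option.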