I am trying to read a CSV from Azure Blob Storage into Python as a stream and write it directly back to Azure Blob Storage. The read works fine, but writing the output stream just creates an empty blob. The following code works up to print(df), but not after that.

Below is the code:

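from io import BytesIO
from shutil import copyfileobj  # assumed source of copyfileobj; not stated in the original post

import pandas as pd
from azure.storage.blob import BlockBlobService  # legacy azure-storage SDK

with BytesIO() as input_blob:
  with BytesIO() as output_blob:
    block_blob_service = BlockBlobService(account_name='aaaccc', account_key='*/*/*--')
    # Download the source blob into the in-memory input stream
    block_blob_service.get_blob_to_stream('test', 'Source.csv', input_blob)
    input_blob.seek(0)
    df = pd.read_csv(input_blob)
    print(df)
    # Everything up to here works; the upload below produces an empty blob
    copyfileobj(input_blob, output_blob)
    block_blob_service.create_blob_from_stream('test', 'OutFilePy.csv', output_blob)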

What does the copyfileobj function do? Can you edit your question and paste its definition? - Saher Ahwal
@Saher, copyfileobj copies the input stream to the output stream. I am trying to write the copied output_blob to blob storage, but it just writes an empty file. - AngiSen
I think the problem may be that input_blob's cursor is at EOF after pd.read_csv. An input_blob.seek(0) after read_csv may be helpful. - Sraw
Fantastic Sraw, it works! One last question: what if I have to perform some operations on the DataFrame and write the modified DataFrame as a stream into the blob? - AngiSen
@Sraw You could post your comment as an answer to help more community members find it, thanks. - Joy Wang-MSFT

1 Answer

The problem is that after pd.read_csv, the cursor of input_blob is at EOF, so copyfileobj copies nothing to output_blob.
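
The cursor behaviour can be reproduced without Azure or pandas. The following is a minimal sketch using only the standard library; the sample bytes and variable names are made up for illustration, and source.read() stands in for any reader that consumes the stream to the end, as a CSV parser does.

```python
from io import BytesIO
from shutil import copyfileobj

# Hypothetical sample data standing in for the downloaded blob
source = BytesIO(b"id,name\n1,alpha\n2,beta\n")

# Reading to the end leaves the cursor at EOF
source.read()
position_after_read = source.tell()

# Copying now starts at EOF, so nothing is transferred
empty_copy = BytesIO()
copyfileobj(source, empty_copy)

# Rewinding first makes the copy start from byte 0
source.seek(0)
full_copy = BytesIO()
copyfileobj(source, full_copy)

print(position_after_read, len(empty_copy.getvalue()), len(full_copy.getvalue()))
```

The first copy is empty because copyfileobj reads from the current position, not from the start of the stream.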

You can just add an input_blob.seek(0) after read_csv to fix this problem.
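
For the follow-up case raised in the comments (modifying the DataFrame before writing it back), one possible approach is to serialize the DataFrame into the output stream instead of copying the input bytes. This is only a sketch: transform_csv_stream and the row_count column are hypothetical names introduced for illustration, and the in-memory sample bytes stand in for the stream that get_blob_to_stream would fill. Rewinding the output stream before the upload is a precaution, on the assumption that the upload call reads from the current position.

```python
from io import BytesIO

import pandas as pd


def transform_csv_stream(input_stream, output_stream):
    """Read a CSV from input_stream, modify it, and write it to output_stream.

    Hypothetical helper for illustration; not part of any SDK.
    """
    input_stream.seek(0)  # make sure parsing starts at the first byte
    df = pd.read_csv(input_stream)

    # Placeholder transformation: replace with the real DataFrame operations
    df["row_count"] = len(df)

    # to_csv without a path returns the CSV text; encode it for the byte stream
    output_stream.write(df.to_csv(index=False).encode("utf-8"))
    output_stream.seek(0)  # rewind so a later reader starts at the beginning
    return df


# Sample bytes standing in for the downloaded blob content
input_blob = BytesIO(b"id,name\n1,alpha\n2,beta\n")
output_blob = BytesIO()
transform_csv_stream(input_blob, output_blob)
result = output_blob.getvalue()
```

In the question's code, output_blob would then be passed to create_blob_from_stream in place of the copied stream.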