
I have a use case where I need to optimize copy activities in Azure where my data sources are not PolyBase-compatible (for example, Oracle DB). Here, they say:

If your source data store and format isn't originally supported by PolyBase, use the Staged copy by using PolyBase feature instead. The staged copy feature also provides you better throughput. It automatically converts the data into PolyBase-compatible format, stores the data in Azure Blob storage, then calls PolyBase to load data into Azure Synapse Analytics.

  1. What I am doing: copy from the source (Oracle, for example) to the sink (Azure Synapse).

  2. What Azure suggests, as explained here: copy from the source (Oracle, for example) to staging, then copy from staging to the sink (Azure Synapse Analytics).

My question is: how is case 2 more optimized (faster) than case 1?


1 Answer


The point is that PolyBase, which is the faster load mechanism, is used in the copy from staging to the sink; that step is what makes the whole operation faster. In option 1, BULKINSERT is used instead. Writing to a blob is also generally faster than writing directly to the sink, so the extra hop costs less time than PolyBase saves.
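For reference, staged copy is enabled on the copy activity itself. Below is a minimal sketch of the relevant `typeProperties` in an Azure Data Factory pipeline JSON, assuming a Blob storage linked service named `AzureBlobStorageLS` and a staging path `staging/oracle` (both placeholder names):

```json
"typeProperties": {
    "source": { "type": "OracleSource" },
    "sink": {
        "type": "SqlDWSink",
        "allowPolyBase": true
    },
    "enableStaging": true,
    "stagingSettings": {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorageLS",
            "type": "LinkedServiceReference"
        },
        "path": "staging/oracle"
    }
}
```

With `enableStaging` set to `true`, the service first copies the Oracle data to Blob storage in a PolyBase-compatible format and then calls PolyBase to load it into Synapse; without staging, it falls back to the slower row-oriented BULKINSERT against the sink.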