2 votes

We are using Databricks and its SQL DW connector to load data into Synapse. I have a dataset with 10,000 rows and 40 columns, and it takes 7 minutes to load!

Loading the same dataset using Data Factory with PolyBase and the staging option takes 27 seconds. The same goes for bulk copy.

What could be wrong? Am I missing some configuration? Or is this business as usual?

Connection configuration:

df_insert.write \
    .format("com.databricks.spark.sqldw") \
    .option("url", sqlDwUrlSmall) \
    .option("dbtable", t_insert) \
    .option("forward_spark_azure_storage_credentials", "True") \
    .option("tempdir", tempDir) \
    .option("maxStrLength", maxStrLength) \
    .mode("append") \
    .save()
What DWU are you using? What resource class is associated with the user doing the load? – wBob
We are using the smallest DWU, DW100c. The Databricks user is in the largerc resource class, the Data Factory user in the default smallrc. I just noticed from the documentation that there is no difference between these resource classes at our DWU level; both get 25%. One additional note: reads from Synapse with Databricks are really fast. – Tero Kruth
100 is the lowest DWU and not really suitable for much. Try higher ones (400, 1000) to see if you get a different result. If not, come back. – wBob
I would like to point out that at the same database level I get totally different performance results depending on whether I use Data Factory or Databricks, so adjusting the database level is not the answer I'm looking for. – Tero Kruth
@tero: I am facing the exact same problem... if you found a solution, could you please post it as an answer? – sab

1 Answer

1 vote

You can try changing the write semantics (see the Databricks documentation).

Using the COPY write semantics, I was able to load data into Synapse faster.

You can configure it before running the write command, like this:

spark.conf.set("spark.databricks.sqldw.writeSemantics", "copy")
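
For reference, a minimal sketch of how this setting combines with the write call from the question (sqlDwUrlSmall, t_insert, tempDir, and maxStrLength are the question's own variables and are assumed to be defined in your notebook):

# Tell the connector to use the COPY statement instead of PolyBase for writes
# (session-level setting; it applies to writes issued after this point).
spark.conf.set("spark.databricks.sqldw.writeSemantics", "copy")

# Same append as in the question, now executed with COPY write semantics.
(df_insert.write
    .format("com.databricks.spark.sqldw")
    .option("url", sqlDwUrlSmall)
    .option("dbtable", t_insert)
    .option("forward_spark_azure_storage_credentials", "True")
    .option("tempdir", tempDir)
    .option("maxStrLength", maxStrLength)
    .mode("append")
    .save())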