2 votes

I have been reading the Azure Databricks pricing details, but I can't find whether the cost differs depending on whether the cluster is running a Spark application or not.

I have a 2-node cluster on which I run a Spark application that, on an hourly basis, calculates certain elements and stores the results in a Databricks table. The table needs to be accessed by an external BI application, so the cluster has to stay up. Assuming the cluster is running for a whole hour but only executing the Spark application for 5 minutes, will I be charged differently for the 5 minutes of execution time than for the other 55 minutes?

Any help would be appreciated.


2 Answers

2 votes

Note: Azure Databricks clusters are billed based on "VM cost + DBU cost", not based on the runtime of your Spark application or of any notebook runs or jobs.

Your case: If you run a Premium tier cluster for 1 hour in East US 2 with 2 DS13v2 instances, the billing for a Data Analytics workload would be the following (a short sketch of the arithmetic follows the list):

  • VM cost for 2 DS13v2 instances: 1 hour x 2 instances x $0.598/hour = $1.196
  • DBU cost for the Data Analytics workload on 2 DS13v2 instances: 1 hour x 2 instances x 2 DBU per node x $0.55/DBU = $2.20
  • The total cost would therefore be $1.196 (VM cost) + $2.20 (DBU cost) = $3.396.
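To make the formula explicit, here is a minimal Python sketch of the calculation above (the rates are the example East US 2 list prices quoted in this answer, not live pricing):

```python
# Billing formula: cluster uptime drives both components, regardless of
# whether a Spark job is actually executing during that time.
hours = 1
instances = 2
vm_rate = 0.598        # $/hour per DS13v2 instance (example rate)
dbu_per_node = 2       # DBUs emitted per DS13v2 node per hour
dbu_rate = 0.55        # $/DBU for the Data Analytics workload (example rate)

vm_cost = hours * instances * vm_rate                   # 1.196
dbu_cost = hours * instances * dbu_per_node * dbu_rate  # 2.20
total = vm_cost + dbu_cost                              # 3.396
print(f"VM ${vm_cost:.3f} + DBU ${dbu_cost:.2f} = ${total:.3f}")
```

Because the charge depends only on how long the cluster is up and on the instance count, the 5 minutes of Spark execution and the 55 idle minutes in your scenario are billed identically.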

If my cluster runs for less than an hour, how much will I be billed?

You are charged for the number of minutes your cluster is running, rounded to the nearest minute rather than to the nearest hour.


Could you give me an example of how billing works?

Depending on the type of workload your cluster runs, you will be charged for either a Data Engineering or a Data Analytics workload. For example, if the cluster runs workloads triggered by the Databricks jobs scheduler, you will be charged for the Data Engineering workload. If your cluster runs interactive features such as ad-hoc commands, you will be billed for the Data Analytics workload.

Case 1: If you run a Premium tier cluster for 100 hours in East US 2 with 10 DS13v2 instances, the billing for a Data Analytics workload would be the following:

  • VM cost for 10 DS13v2 instances: 100 hours x 10 instances x $0.598/hour = $598
  • DBU cost for the Data Analytics workload on 10 DS13v2 instances: 100 hours x 10 instances x 2 DBU per node x $0.55/DBU = $1,100
  • The total cost would therefore be $598 (VM cost) + $1,100 (DBU cost) = $1,698.

Case 2: If you run a Premium tier cluster for 100 hours in East US 2 with 10 DS13v2 instances, the billing for a Data Engineering workload would be the following:

  • VM cost for 10 DS13v2 instances: 100 hours x 10 instances x $0.598/hour = $598
  • DBU cost for the Data Engineering workload on 10 DS13v2 instances: 100 hours x 10 instances x 2 DBU per node x $0.30/DBU = $600
  • The total cost would therefore be $598 (VM cost) + $600 (DBU cost) = $1,198.

Case 3: If you run a Premium tier cluster for 100 hours in East US 2 with 10 DS13v2 instances, the billing for a Data Engineering Light workload would be the following (a sketch comparing all three cases follows the list):

  • VM cost for 10 DS13v2 instances: 100 hours x 10 instances x $0.598/hour = $598
  • DBU cost for the Data Engineering Light workload on 10 DS13v2 instances: 100 hours x 10 instances x 2 DBU per node x $0.22/DBU = $440
  • The total cost would therefore be $598 (VM cost) + $440 (DBU cost) = $1,038.
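The three cases differ only in the per-DBU rate, so a small sketch (assuming the same example rates as above) can compute all of them:

```python
# Same formula as above, applied to the three 100-hour / 10-instance cases;
# only the per-DBU rate differs by workload type (example rates, subject to change).
def cluster_cost(hours, instances, vm_rate, dbu_per_node, dbu_rate):
    vm_cost = hours * instances * vm_rate
    dbu_cost = hours * instances * dbu_per_node * dbu_rate
    return vm_cost, dbu_cost, vm_cost + dbu_cost

dbu_rates = {
    "Data Analytics": 0.55,
    "Data Engineering": 0.30,
    "Data Engineering Light": 0.22,
}
for workload, rate in dbu_rates.items():
    vm, dbu, total = cluster_cost(100, 10, 0.598, 2, rate)
    print(f"{workload}: VM ${vm:,.0f} + DBU ${dbu:,.0f} = ${total:,.0f}")
# Data Analytics: VM $598 + DBU $1,100 = $1,698
# Data Engineering: VM $598 + DBU $600 = $1,198
# Data Engineering Light: VM $598 + DBU $440 = $1,038
```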

In addition to VM and DBU charges, you may also be charged for managed disks, public IP addresses, or other resources such as Azure Storage or Azure Cosmos DB, depending on your application.

1 vote

Not really a question for here,

  • but assuming you are using the 'pay-as-you-go' option and not 'reserved instances',
    • you will be charged for the whole hour of compute resources and any ephemeral storage.

Anything saved to storage is paid for continuously, but it is pretty cheap. As with AWS, 'managed services' are more costly.