
I'm new to Azure Databricks and I'm using it for a project.

The documentation for automatic termination says:

You can also set auto termination for a cluster. During cluster creation, you can specify an inactivity period in minutes after which you want the cluster to terminate. If the difference between the current time and the last command run on the cluster is more than the inactivity period specified, Azure Databricks automatically terminates that cluster.

Is there a workaround to get the real-time inactivity period of a cluster (the difference between the current time and the last command run on the cluster) from an Azure Databricks notebook, via the Clusters API or any other method?


1 Answer

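One approach is to read the driver's log4j-active.log from the cluster log path on DBFS (this assumes cluster log delivery to a DBFS location is enabled for the cluster), treat the timestamp of the last log entry as the time of the last activity, and subtract it from the current time: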
# Function to retrieve cluster inactivity time
from datetime import datetime, timedelta
import time

def cluster_inactivity_time(log_file_path):

    # Open log4j-active.log and read its last line
    last_line = None
    with open(log_file_path, "r") as file:
        for last_line in file:
            pass

    # Extract the timestamp (HH:MM:SS) from the last log line and convert it
    # to epoch milliseconds, assuming the entry was written today
    last_run_time = last_line[9:17]
    current_date = datetime.now().strftime('%Y-%m-%d')
    last_run_datetime = round(datetime.strptime(current_date + ' ' + last_run_time, "%Y-%m-%d %H:%M:%S").timestamp() * 1000)

    # Difference between the current time and the last command run time
    current_time = round(time.time() * 1000)
    difference = current_time - last_run_datetime
    inactivity_time = timedelta(milliseconds=difference)
    hours, remainder = divmod(int(inactivity_time.total_seconds()), 3600)
    minutes, seconds = divmod(remainder, 60)
    print(f'The Cluster has been Inactive for {hours}:{minutes}:{seconds}')


# Function Call
log_file_path = '/dbfs/cluster-logs/0809-101642-leap143/driver/log4j-active.log'
cluster_inactivity_time(log_file_path)

Output:

The Cluster has been Inactive for 0:0:35
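
If you would rather go through the Clusters API, the cluster info returned by GET /api/2.0/clusters/get includes a last_activity_time field (epoch milliseconds), so the same difference can be computed without reading driver logs. Below is a minimal sketch, assuming a personal access token and the requests library; DATABRICKS_HOST, DATABRICKS_TOKEN and cluster_id are placeholders you must fill in:

import time
import requests

# Placeholders - replace with your workspace URL, PAT token and cluster ID
DATABRICKS_HOST = "https://<your-workspace>.azuredatabricks.net"
DATABRICKS_TOKEN = "<personal-access-token>"
cluster_id = "<cluster-id>"

# Fetch cluster info from the Clusters API
response = requests.get(
    f"{DATABRICKS_HOST}/api/2.0/clusters/get",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    params={"cluster_id": cluster_id},
)
response.raise_for_status()
info = response.json()

# last_activity_time is reported in epoch milliseconds once the cluster is running
last_activity_ms = info.get("last_activity_time")
if last_activity_ms is not None:
    inactivity_seconds = round(time.time() - last_activity_ms / 1000)
    print(f"The Cluster has been Inactive for {inactivity_seconds} seconds")
else:
    print("last_activity_time not available (cluster may not be running yet)")

Note that last_activity_time is only populated after the cluster reaches a RUNNING state and reflects the activity Databricks tracks for auto termination, so it may differ slightly from what the driver log shows.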