7
votes

I was trying to run the code below in PySpark:

dbutils.widgets.text('config', '', 'config')

It throws this error:

 Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 NameError: name 'dbutils' is not defined

Is there any way I can run it in PySpark by including the Databricks package, for example with an import?

Your help is appreciated

4
In a package/module I have "from pyspark.dbutils import DBUtils" and "def get_secerts(dbutils: DBUtils):". Then you can use dbutils.secrets.get() as you would in a notebook. - Jari Turkia
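The pattern in this comment (passing dbutils into your own function instead of relying on a global) might look roughly like the sketch below. The function name get_secret and its scope/key parameters are hypothetical and chosen for illustration; only dbutils.secrets.get() comes from the comment. The import is guarded so the module still loads where pyspark.dbutils is not installed.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by type checkers; pyspark.dbutils is assumed to be
    # available on a Databricks cluster or with databricks-connect installed.
    from pyspark.dbutils import DBUtils


def get_secret(dbutils: "DBUtils", scope: str, key: str) -> str:
    """Read one secret through whichever dbutils object the caller passes in.

    get_secret, scope and key are illustrative names, not part of any
    Databricks API.
    """
    return dbutils.secrets.get(scope=scope, key=key)
```

Because the function only uses what it is handed, the same code should work with the notebook's built-in dbutils or with one built from DBUtils(spark), and it can be exercised locally with a simple stand-in object.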

4 Answers

7
votes

As explained in https://docs.azuredatabricks.net/user-guide/dev-tools/db-connect.html#access-dbutils, how you initialize dbutils depends on where your code runs: directly on the Databricks server (e.g. a Databricks notebook invoking your project's egg file) or from your IDE using databricks-connect. Initialize it as below, where spark is your SparkSession.

def get_dbutils(spark):
    try:
        from pyspark.dbutils import DBUtils
        dbutils = DBUtils(spark)
    except ImportError:
        import IPython
        dbutils = IPython.get_ipython().user_ns["dbutils"]
    return dbutils

dbutils = get_dbutils(spark)

3
votes

As of Databricks Runtime v3.0, the answer provided by pprasad009 above no longer works. Use the following instead:

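def get_db_utils(spark):
    dbutils = None

    if spark.conf.get("spark.databricks.service.client.enabled") == "true":
        # Running through databricks-connect: build dbutils from the session.
        from pyspark.dbutils import DBUtils
        dbutils = DBUtils(spark)
    else:
        # Running in a notebook: dbutils already lives in the IPython user namespace.
        import IPython
        dbutils = IPython.get_ipython().user_ns["dbutils"]

    return dbutils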

See: https://docs.microsoft.com/en-gb/azure/databricks/dev-tools/databricks-connect#access-dbutils
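One caveat, as far as I know: in a plain PySpark session where that key was never set, spark.conf.get() without a default raises instead of returning None. A minimal sketch of a more tolerant check is below. The helper name uses_databricks_connect is made up for this example; the config key is the one from the answer, and the second argument to get() is the default returned when the key is unset.

```python
def uses_databricks_connect(spark) -> bool:
    """Return True when the session reports the Databricks Connect client flag.

    uses_databricks_connect is a hypothetical helper name; the key is the
    one used in the answer above.
    """
    # Passing a default avoids an exception when the key is not set at all.
    flag = spark.conf.get("spark.databricks.service.client.enabled", "false")
    return str(flag).lower() == "true"
```

The result can then replace the inline comparison in the if statement of get_db_utils.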

0
votes

In Scala you can use:

import com.databricks.dbutils_v1.DBUtilsHolder.dbutils

See the link below for details on the required dependency:

https://docs.databricks.com/user-guide/dev-tools/dbutils.html

-2
votes

I am assuming that you want the code to run on a Databricks cluster. If so, there is no need to import any package, because Databricks includes all the libraries needed for dbutils by default.

I tried it in a Databricks (Python/Scala) notebook without importing any libraries, and it works fine.
