Delete redshift table from within databricks using pyspark

Question

I tried to connect to a redshift system table called stv_sessions and I can read the data into a dataframe.

This stv_sessions table is a redshift system table which has the process id's of all the queries that are currently running.

To delete a query from running we can do this.

select pg_terminate_backend(pid)

While this works for me if I directly connect to redshift (using aginity), it gives me insuffecient previlege issues when trying to run from databricks.

Simply put I dont know how to run the query from databricks notebook.

I have tried this so far,

kill_query = "select pg_terminate_backend('12345')" 

some_random_df_i_created.write.format("com.databricks.spark.redshift").option("url",redshift_url).option("dbtable","stv_sessions").option("tempdir", temp_dir_loc).option("forward_spark_s3_credentials", True).options("preactions", kill_query).mode("append").save()

Please let me know if the methodology i follow is correct.

Thank you

murtihash murtihash · Accepted Answer · 2020-02-07T20:57:55

Databricks purposely does not preinclude this driver. You need to Download and install the offical Redshift JDBC driver for databricks. : download the official Amazon Redshift JDBC driver, upload it to Databricks, and attach the library to your cluster.(recommend using v1.2.12 or lower with Databricks clusters). Then, use JDBC URLs of the form

val jdbcUsername = "REPLACE_WITH_YOUR_USER"
val jdbcPassword = "REPLACE_WITH_YOUR_PASSWORD"
val jdbcHostname = "REPLACE_WITH_YOUR_REDSHIFT_HOST"
val jdbcPort = 5439
val jdbcDatabase = "REPLACE_WITH_DATABASE"
val jdbcUrl = s"jdbc:redshift://${jdbcHostname}:${jdbcPort}/${jdbcDatabase}?user=${jdbcUsername}&password=${jdbcPassword}"
jdbcUsername: String = REPLACE_WITH_YOUR_USER
jdbcPassword: String = REPLACE_WITH_YOUR_PASSWORD
jdbcHostname: String = REPLACE_WITH_YOUR_REDSHIFT_HOST
jdbcPort: Int = 5439
jdbcDatabase: String = REPLACE_WITH_DATABASE
jdbcUrl: String = jdbc:redshift://REPLACE_WITH_YOUR_REDSHIFT_HOST:5439/REPLACE_WITH_DATABASE?user=REPLACE_WITH_YOUR_USER&password=REPLACE_WITH_YOUR_PASSWORD

Then try putting jdbcUrl in place of your redshift_url. That may be the only reason you are getting privilege issues.

Link1:https://docs.databricks.com/_static/notebooks/redshift.html Link2:https://docs.databricks.com/data/data-sources/aws/amazon-redshift.html#installation

Another reason could be the redshift-databricks connector only uses SSL(encryption in flight) and it is possible that IAM roles may have been set on your redshift cluster to only allow some users to delete tables.

Apologies if none of this helps your case.

Delete redshift table from within databricks using pyspark

1 Answers