0
votes

I'm using spark SQL to pull tables from an Oracle database, some of them fairly sizable, into Azure databricks as tables so I can run jobs on them and leave them visible for the team to use. I need to pull them daily as the data changes and gets updated and I also want to source them by passing through my redacted credentials using Azure Key Vault / Secrets.

I have secrets set up and I can run a process to do all this in Python, however, for the larger tables it is too slow when writing the table from the data frame.

I know I can do this faster in spark SQL using the below, and can successfully pull almost all the tables in a fraction of the time than using python. However, I can't seem to be able to set it up to source and pass through the secrets for the username and password.

%sql

CREATE TABLE <table-out>
USING org.apache.spark.sql.jdbc
OPTIONS (
  dbtable '<table-source>',
  driver 'oracle.jdbc.driver.OracleDriver',
  user '<username>',
  password '<password>',
  url 'jdbc:oracle:thin:@//<server>:<port>');

In Python I would have obtained the secrets using dbutils as below:

%python
jdbcUsername = dbutils.secrets.get(scope="jdbc", key="<key>")

Is there an equivalent way I can do this in the SQL approach above. I realise I will still need to optimise any tables but will cross that bridge when I get to it.

Any help would be appreciated.

Thanks, c2

1

1 Answers

1
votes

So I managed to do this by submitting the SQL statement as a string into a spark.sql execution:

sqlQry = '''
CREATE TABLE IF NOT EXISTS {4}{1}
USING org.apache.spark.sql.jdbc
OPTIONS (
  driver 'oracle.jdbc.driver.OracleDriver',
  url '{0}',
  dbtable '{1}',
  user '{2}',
  password '{3}')'''.format(jdbcUrl, line, jdbcUsername, jdbcPassword, dbloc)

spark.sql(sqlQry)