2
votes

I am using the below code to query a sql server table hr.employee in my azure sql server database using azure databricks. I am new to spark sql and trying to learn the nuances one step at a time.

Azure Databricks:

%scala
val jdbcHostname = dbutils.widgets.get("hostName")
val jdbcPort = 1433
val jdbcDatabase = dbutils.widgets.get("database")
val jdbcUrl = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase}"

import java.util.Properties
val connectionProperties = new Properties()
 
connectionProperties.put("user", s"${jdbcUsername}")
connectionProperties.put("password", s"${jdbcPassword}")

%scala
val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
connectionProperties.setProperty("Driver", driverClass)

%scala
val employee = spark.read.jdbc(jdbcUrl, "hr.Employee", connectionProperties)

%scala
spark.sql("select * from employee") 

%sql
select * from employee

employee.select("col1","col2").show()

I get the below error. Not sure what wrong am I doing. Tried a couple of variations as well and no luck so far.

Error:

';' expected but integer literal found.

command-922779590419509:26: error: not found: value %
%sql

command-922779590419509:27: error: not found: value select
select * from employee

command-922779590419509:27: error: not found: value from
select * from employee

command-922779590419509:16: error: not found: value %
%scala
1
use the spark.sql(query)?Lamanus
Hi @Lamanus, I tried spark.sql(select * from ommp) and get error. command-922779590419509:26: error: not found: value select spark.sql(select * from ommp) command-922779590419509:26: error: not found: value from spark.sql(select * from ommp) command-922779590419509:16: error: not found: value % %scalaITHelpGuy
nope, it should be a string, %scala spark.sql("select * from employee")Lamanus
Hi @Lamanus Tried with %scala and get the same error.ITHelpGuy
val dataDF = { val query = "select * from hr.Employee" sqlContext.sql(query) }; This doesn't work either.ITHelpGuy

1 Answers

2
votes

You may checkout the below steps to queries SQL Server using its JDBC driver:

Note: For the entire tutorial I'm using "Scala" source code.

Step 1: Check that the JDBC driver is available

Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver")

Step 2: Create the JDBC URL

val jdbcHostname = "<hostname>"
val jdbcPort = 1433
val jdbcDatabase = "<database>"

// Create the JDBC URL without passing in the user and password parameters.
val jdbcUrl = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase}"

// Create a Properties() object to hold the parameters.
import java.util.Properties
val connectionProperties = new Properties()

connectionProperties.put("user", s"${jdbcUsername}")
connectionProperties.put("password", s"${jdbcPassword}")

Step 3: Check connectivity to the SQLServer database

val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
connectionProperties.setProperty("Driver", driverClass)

Step4: Read data from JDBC

val employees_table = spark.read.jdbc(jdbcUrl, "employees", connectionProperties)

Step5: Reads the schema from the database table

employees_table.printSchema

Step6: You can run queries against this JDBC table:

display(employees_table.select("age", "salary").groupBy("age").avg("salary"))

Here is the screenshot for you reference:

enter image description here

enter image description here

enter image description here

=======> Updated Answer <===============

To use SQL Query:

Step1: To create a global table from a DataFrame in Python or Scala:

dataFrame.write.saveAsTable("<table-name>")

enter image description here

** Step2:** Now run the SQL query

spark.sql("select * from employees_table")

%sql 

select * from employees_table

enter image description here