I'm trying to read from SAS IOM using Spark JDBC. The problem is that the SAS JDBC driver is a bit unusual, so I need to create my own dialect:
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

object SasDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:sasiom")
  // SAS name literals take a trailing n: "name"n
  override def quoteIdentifier(colName: String): String = "\"" + colName + "\"n"
}

// The dialect must be registered before use
JdbcDialects.registerDialect(SasDialect)
However, this is not enough. SAS makes a distinction between column labels (human-readable names) and column names (the names you use in a SQL query), but it seems that Spark uses the column labels instead of the names during schema discovery; see the JdbcUtils extract below:
while (i < ncols) {
  val columnName = rsmd.getColumnLabel(i + 1)
This results in SQL errors, because the generated SQL then contains the human-readable label instead of the column name.
For SAS IOM JDBC to work, this would need to be getColumnName instead of getColumnLabel. Is there a way to specify this in the dialect? I could not find a way to hook into this, apart from wrapping the whole com.sas.rio.MVADriver and its ResultSetMetaData.
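In case it helps to illustrate the wrapping idea: below is a minimal sketch of the ResultSetMetaData part, using a java.lang.reflect.Proxy that delegates every call to the real metadata object but redirects getColumnLabel to getColumnName, so schema discovery would see the SQL-usable names. This is only the innermost piece; the delegating Driver/Connection/Statement/ResultSet wrappers around com.sas.rio.MVADriver that would actually install it are assumed and not shown, and the object name NameOnlyMetaData is just something I made up for the sketch.

```scala
import java.lang.reflect.{InvocationHandler, Method, Proxy}
import java.sql.ResultSetMetaData

object NameOnlyMetaData {
  // Wrap an existing ResultSetMetaData so that getColumnLabel
  // answers with the underlying getColumnName instead.
  def apply(underlying: ResultSetMetaData): ResultSetMetaData =
    Proxy.newProxyInstance(
      classOf[ResultSetMetaData].getClassLoader,
      Array[Class[_]](classOf[ResultSetMetaData]),
      new InvocationHandler {
        override def invoke(proxy: AnyRef, method: Method,
                            args: Array[AnyRef]): AnyRef = {
          // args is null for no-arg methods when invoked via a proxy
          val safeArgs = if (args == null) Array.empty[AnyRef] else args
          // Redirect label lookups to the real column name;
          // everything else passes straight through.
          val target =
            if (method.getName == "getColumnLabel")
              classOf[ResultSetMetaData]
                .getMethod("getColumnName", classOf[Int])
            else method
          target.invoke(underlying, safeArgs: _*)
        }
      }
    ).asInstanceOf[ResultSetMetaData]
}
```

With a wrapper like this in the chain, the JdbcUtils code above would pick up the column name even though it calls getColumnLabel; the open question remains whether a dialect can achieve the same without all the delegation boilerplate.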
Frank