I am trying to read a Parquet file from Azure Data Lake using the following PySpark code:

df = sqlContext.read.format("parquet") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("adl://xyz/abc.parquet")
df = df['Id', 'IsDeleted']

Now I would like to load this DataFrame df as a table in SQL Data Warehouse using the following code:

df.write \
  .format("com.databricks.spark.sqldw") \
  .mode('overwrite') \
  .option("url", sqlDwUrlSmall) \
  .option("forward_spark_azure_storage_credentials", "true") \
  .option("dbtable", "test111") \
  .option("tempdir", tempDir) \
  .save()

This creates a table dbo.test111 in the SQL Data Warehouse with the following data types:

  • Id (nvarchar(256), null)
  • IsDeleted (bit, null)

But I need these columns with different data types, say char(255) and varchar(128), in SQL Data Warehouse. How do I do this while loading the DataFrame into SQL Data Warehouse?

2 Answers

I found a way that can help you modify the column data type, but it may not fully achieve what you want.

from pyspark.sql.functions import col

# DataType can be a pyspark.sql.types type (e.g. StringType()) or a type string like "string"
df.select(col("colname").cast(DataType))

Here is a blog post about how to change column types in Spark SQL's DataFrame.
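For example, applied to the two columns in the question, a minimal sketch (df, Id and IsDeleted come from the question; the string target type is just an illustration):

from pyspark.sql.functions import col

# Cast both columns before writing to SQL Data Warehouse;
# replace "string" with whatever Spark type you need.
df = df.select(
    col("Id").cast("string").alias("Id"),
    col("IsDeleted").cast("string").alias("IsDeleted")
)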

Maybe this can help you.

The data types supported by Spark SQL are listed here: https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/sql/types/package-summary.html

String types will in fact be turned into VARCHAR with unspecified length. Spark SQL does not have a VARCHAR(n) data type.

You should be able to do something like the following:

import org.apache.spark.sql.types._

// Replace StringType with any supported type you need
val dfMod = df.withColumn("Id_mod", df("Id").cast(StringType))
              .withColumn("IsDeleted_mod", df("IsDeleted").cast(StringType))
              .drop("Id")
              .drop("IsDeleted")
              .withColumnRenamed("Id_mod", "Id")
              .withColumnRenamed("IsDeleted_mod", "IsDeleted")
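
If you prefer to stay in PySpark (as in the question), a rough equivalent of the same cast/drop/rename approach would be something like this (StringType is only an example; swap in the type you need):

from pyspark.sql.types import StringType
from pyspark.sql.functions import col

# Cast into temporary columns, drop the originals, then rename back
df = (df.withColumn("Id_mod", col("Id").cast(StringType()))
        .withColumn("IsDeleted_mod", col("IsDeleted").cast(StringType()))
        .drop("Id")
        .drop("IsDeleted")
        .withColumnRenamed("Id_mod", "Id")
        .withColumnRenamed("IsDeleted_mod", "IsDeleted"))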