I have a working notebook on Azure Databricks Runtime 6.3 (Spark 2.4.4).
The notebook ingests data into Azure Synapse Analytics using its connector.
When I upgraded to Runtime 7.0 (Spark 3.0.0), the process began to fail with the following error:
com.microsoft.sqlserver.jdbc.SQLServerException: HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap') [ErrorCode = 106000] [SQLState = S0001]
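For reference, the write itself is the stock Synapse connector call; roughly like this (storage paths, the JDBC URL and credentials are placeholders for my real values):

df = spark.read.parquet("wasbs://<container>@<account>.blob.core.windows.net/incremental/")

(df.write
    .format("com.databricks.spark.sqldw")   # Azure Synapse (formerly SQL DW) connector
    .option("url", "<jdbc-url-to-synapse>")
    .option("tempDir", "wasbs://<container>@<account>.blob.core.windows.net/tempdir/")
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.IncrementalDestination")
    .mode("append")
    .save())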
This is the table schema in Synapse Analytics:
CREATE TABLE [dbo].[IncrementalDestination]
(
[Id] [int] NOT NULL,
[VarChar] [varchar](1000) NULL,
[Char] [char](1000) NULL,
[Text] [varchar](1000) NULL,
[NVarChar] [nvarchar](1000) NULL,
[NChar] [nchar](1000) NULL,
[NText] [nvarchar](1000) NULL,
[Date] [date] NULL,
[Datetime] [datetime] NULL,
[Datetime2] [datetime2](7) NULL,
[Smalldatetime] [smalldatetime] NULL,
[Bigint] [bigint] NULL,
[Bit] [bit] NULL,
[Decimal] [decimal](18, 0) NULL,
[Int] [int] NULL,
[Money] [money] NULL,
[Numeric] [numeric](18, 0) NULL,
[Smallint] [smallint] NULL,
[Smallmoney] [smallmoney] NULL,
[Tinyint] [tinyint] NULL,
[Float] [float] NULL,
[Real] [real] NULL,
[Column With Space] [varchar](1000) NULL,
[Column_ç_$pecial_char] [varchar](1000) NULL,
[InsertionDateUTC] [datetime] NOT NULL,
[De_LastUpdated] [datetime2](3) NOT NULL
)
WITH
(
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX
)
GO
This is the schema Databricks infers after reading a set of Parquet files from Azure Blob Storage:
root
|-- Id: long (nullable = true)
|-- VarChar: string (nullable = true)
|-- Char: string (nullable = true)
|-- Text: string (nullable = true)
|-- NVarChar: string (nullable = true)
|-- NChar: string (nullable = true)
|-- NText: string (nullable = true)
|-- Date: timestamp (nullable = true)
|-- Datetime: timestamp (nullable = true)
|-- Datetime2: timestamp (nullable = true)
|-- Smalldatetime: timestamp (nullable = true)
|-- Bigint: long (nullable = true)
|-- Bit: boolean (nullable = true)
|-- Decimal: long (nullable = true)
|-- Int: long (nullable = true)
|-- Money: double (nullable = true)
|-- Numeric: long (nullable = true)
|-- Smallint: long (nullable = true)
|-- Smallmoney: double (nullable = true)
|-- Tinyint: long (nullable = true)
|-- Float: double (nullable = true)
|-- Real: double (nullable = true)
|-- Column_With_Space: string (nullable = true)
|-- Column_ç_$pecial_char: string (nullable = true)
|-- InsertionDateUTC: timestamp (nullable = true)
|-- De_LastUpdated: timestamp (nullable = false)
I can see the mismatch, for example:
Int: long (nullable = true)
But what can I do about it? Shouldn't this conversion from long to int be natural and easily done?
I think something broke with these new features =]
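I guess I could work around it by casting every mismatched column explicitly before the write, something like the sketch below (the target types are my reading of the table DDL above; I'm not sure this is exactly what PolyBase expects, and the list would have to cover every affected column), but this feels like something the connector used to handle on its own:

from pyspark.sql import functions as F

# My guess at the Spark-side types the Synapse columns expect.
target_types = {
    "Id": "int",
    "Int": "int",
    "Smallint": "smallint",
    "Tinyint": "tinyint",
    "Decimal": "decimal(18,0)",
    "Numeric": "decimal(18,0)",
    "Date": "date",
}

# Apply the casts, then write df_cast instead of df.
df_cast = df
for column_name, target_type in target_types.items():
    df_cast = df_cast.withColumn(column_name, F.col(column_name).cast(target_type))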