
I read data from Oracle via a Spark JDBC connection into a DataFrame. One of the columns is, as expected, StringType in the DataFrame.

Now I want to persist this in Hive, but with the datatype varchar(5). I know the string may be truncated, but that is fine.

I tried using UDFs, which didn't work since DataFrames do not have varchar or char types. I also created a temporary view and cast the column:

df.createOrReplaceTempView("t_name")
val df2 = spark.sql("select cast(col_name as varchar(5)) from t_name")

But when I call printSchema, I still see a string type.

How can I save it as a varchar column in a Hive table?

There is an org.apache.spark.sql.types.VarcharType. Did you try using that? - philantrovert
yes, it says string cannot be cast to VarcharType - drk

1 Answer


Try creating the Hive table ("dbName.tableName") with the required schema (varchar(5) in this case) first, and then insert into it directly from the DataFrame, like below.

df.write.mode("append").insertInto("dbName.tableName")
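To make this concrete, a minimal sketch of the table DDL the write would target, assuming a single string column named col_name (the database, table, and column names here are placeholders, as in the answer; the storage format is an arbitrary choice). The DDL can be issued through spark.sql or directly in Hive. Hive silently truncates values longer than the declared length when inserting into a varchar column, which matches the behavior the question asks for:

```sql
-- Hypothetical names: dbName, tableName, col_name. Adjust to your schema.
-- Values longer than 5 characters are silently truncated on insert.
CREATE TABLE IF NOT EXISTS dbName.tableName (
  col_name VARCHAR(5)
) STORED AS PARQUET;
```

Note that insertInto matches columns by position, not by name, so the DataFrame's column order must line up with the table definition.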