2
votes
How to load a parquet file into vertica database using spark???

link (http://www.sparkexpert.com/2015/04/17/save-apache-spark-dataframe-to-database/)

I tried to load data frame(parquet files) using the above link into mysql it worked. But when i tried to load it into vertica database this is the error i am facing.The error below is because vertica db doesn’t support the datatypes(String) which is in the data frames(parquet file). I do not wanted to type cast the columns since its going to be a performance issue. we are looking to load around 280 million rows. Could you please suggest the best way to load the data into vertica db.

Exception in thread “main” java.sql.SQLSyntaxErrorException: [Vertica][VJDBC](5108) ERROR: Type “TEXT” does not exist
at com.vertica.util.ServerErrorData.buildException(Unknown Source)
at com.vertica.io.ProtocolStream.readExpectedMessage(Unknown Source)
at com.vertica.dataengine.VDataEngine.prepareImpl(Unknown Source)
at com.vertica.dataengine.VDataEngine.prepare(Unknown Source)
at com.vertica.dataengine.VDataEngine.prepare(Unknown Source)
at com.vertica.jdbc.common.SPreparedStatement.(Unknown Source)
at com.vertica.jdbc.jdbc4.S4PreparedStatement.(Unknown Source)
at com.vertica.jdbc.VerticaJdbc4PreparedStatementImpl.(Unknown Source)
at com.vertica.jdbc.VJDBCObjectFactory.createPreparedStatement(Unknown Source)
at com.vertica.jdbc.common.SConnection.prepareStatement(Unknown Source)
at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:275)
at org.apache.spark.sql.DataFrame.createJDBCTable(DataFrame.scala:1611)
at com.sparkread.SparkVertica.JdbctoVertica.main(JdbctoVertica.java:51)
Caused by: com.vertica.support.exceptions.SyntaxErrorException: [Vertica][VJDBC](5108) ERROR: Type “TEXT” does not exist
… 13 more
2

2 Answers

1
votes

Since you are getting the error on the createJDBCTable, you could just create the table yourself and use insertIntoJDBC instead.

Another idea would be to try and set spark.sql.dialect to Postgres since I noticed registerDialect(PostgresDialect) in spark. That said, I don't know how to do this other than to use jdbc:postgresql, but if you use that driver you would not get any advantage of a optimal insert that Vertica's JDBC driver would give you. You might need to modify here to allow it to use that dialect for jdbc:vertica. If for some reason that doesn't work you'd need to add in a new dialect.

Personally I think the first option is simpler.

0
votes

When the Vertica table exists with the same column names as the dataFrame (and the corresponding types, VARCHAR) the following has worked for me (while keeping vertica's jdbc):

myDataFrame.write().mode(SaveMode.Append).jdbc(url, "MY_VERTICA_TABLE", new Properties());