loading parquet file into vertica database using spark

Question

How to load a parquet file into vertica database using spark???

link (http://www.sparkexpert.com/2015/04/17/save-apache-spark-dataframe-to-database/)

I tried to load data frame(parquet files) using the above link into mysql it worked. But when i tried to load it into vertica database this is the error i am facing.The error below is because vertica db doesn’t support the datatypes(String) which is in the data frames(parquet file). I do not wanted to type cast the columns since its going to be a performance issue. we are looking to load around 280 million rows. Could you please suggest the best way to load the data into vertica db.

Exception in thread “main” java.sql.SQLSyntaxErrorException: [Vertica][VJDBC](5108) ERROR: Type “TEXT” does not exist
at com.vertica.util.ServerErrorData.buildException(Unknown Source)
at com.vertica.io.ProtocolStream.readExpectedMessage(Unknown Source)
at com.vertica.dataengine.VDataEngine.prepareImpl(Unknown Source)
at com.vertica.dataengine.VDataEngine.prepare(Unknown Source)
at com.vertica.dataengine.VDataEngine.prepare(Unknown Source)
at com.vertica.jdbc.common.SPreparedStatement.(Unknown Source)
at com.vertica.jdbc.jdbc4.S4PreparedStatement.(Unknown Source)
at com.vertica.jdbc.VerticaJdbc4PreparedStatementImpl.(Unknown Source)
at com.vertica.jdbc.VJDBCObjectFactory.createPreparedStatement(Unknown Source)
at com.vertica.jdbc.common.SConnection.prepareStatement(Unknown Source)
at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:275)
at org.apache.spark.sql.DataFrame.createJDBCTable(DataFrame.scala:1611)
at com.sparkread.SparkVertica.JdbctoVertica.main(JdbctoVertica.java:51)
Caused by: com.vertica.support.exceptions.SyntaxErrorException: [Vertica][VJDBC](5108) ERROR: Type “TEXT” does not exist
… 13 more

woot woot · Accepted Answer · 2015-10-31T19:11:18

Since you are getting the error on the createJDBCTable, you could just create the table yourself and use insertIntoJDBC instead.

Another idea would be to try and set spark.sql.dialect to Postgres since I noticed registerDialect(PostgresDialect) in spark. That said, I don't know how to do this other than to use jdbc:postgresql, but if you use that driver you would not get any advantage of a optimal insert that Vertica's JDBC driver would give you. You might need to modify here to allow it to use that dialect for jdbc:vertica. If for some reason that doesn't work you'd need to add in a new dialect.

Personally I think the first option is simpler.

loading parquet file into vertica database using spark

2 Answers