3
votes

I am trying to include the spark-avro package while starting spark-shell, as per the instructions mentioned here: https://github.com/databricks/spark-avro#with-spark-shell-or-spark-submit.

spark-shell --packages com.databricks:spark-avro_2.10:2.0.1

My intent is to convert the Avro schema to a Spark schema type, using the SchemaConverters class present in the package.

import com.databricks.spark.avro._ ... // colListDel is a list of fields from the avsc which are to be deleted for some functional reason.

for (field <- colListDel) {
  println(SchemaConverters.toSqlType(field.schema()).dataType)
}

...

On execution of the above for loop, I get the below error:

<console>:47: error: object SchemaConverters in package avro cannot be accessed in package com.databricks.spark.avro
            println(SchemaConverters.toSqlType(field.schema()).dataType);

Please suggest if there is anything I am missing, or let me know how to include SchemaConverters in my Scala code.

Below are my environment details: Spark version 1.6.0, Cloudera VM 5.7.

Thanks!

Did you ever figure this out? I am running into the same error. – user3809888
Workaround: once the --packages command is executed, it downloads the jars into a hidden folder, .ivy2/jars. I used those jars in the classpath and wrote custom Scala code to use the classes from the package library (see the sketch after these comments). It seems that internally SchemaConverters is a private member; if you need to customise it, check the required license terms first. Let me know if this helps. – hadooper
Probably the 2.0.1 version was not built from the most recent 2.0 branch. I noticed that this class used to be private: github.com/databricks/spark-avro/blob/branch-1.0/src/main/scala/… – Piotr Reszke
I am using Spark 1.4.1 and I tried the code val sField = new StructField(f.name, SchemaConverters.toSqlType(f.schema()).dataType, false), and I got the below error: symbol SchemaConverters is not accessible from this place. Did you find a solution for older versions? I am limited to version 1.4.1 in my workplace. – alsolh
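
For reference, a minimal sketch of the --jars workaround mentioned in the comments, assuming the jar fetched by --packages lands in the Ivy cache with its default group_artifact-version naming (the exact file name is an assumption, not confirmed from the thread):

spark-shell --jars ~/.ivy2/jars/com.databricks_spark-avro_2.10-2.0.1.jar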

1 Answer

1
vote

This object and the mentioned method used to be private. Please check the source code from version 1.0:

https://github.com/databricks/spark-avro/blob/branch-1.0/src/main/scala/com/databricks/spark/avro/SchemaConverters.scala

private object SchemaConverters {
  case class SchemaType(dataType: DataType, nullable: Boolean)
  /**
   * This function takes an avro schema and returns a sql schema.
   */
  private[avro] def toSqlType(avroSchema: Schema): SchemaType = {
    avroSchema.getType match {
    ...

You were downloading the 2.0.1 version, which was probably not built from the latest 2.0 branch. I checked the 3.0 version, and this class and method are public now.

This should solve your problems:

spark-shell --packages com.databricks:spark-avro_2.10:3.0.0
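
With 3.0.0 on the classpath, a loop like yours should compile. A minimal sketch, assuming the Avro schema is a record parsed from an avsc file (the path and the Schema.Parser usage are illustrative, not taken from your code):

import org.apache.avro.Schema
import com.databricks.spark.avro.SchemaConverters
import scala.collection.JavaConverters._

// Parse the avsc file; the path is a placeholder, and the schema is assumed to be a record.
val avroSchema = new Schema.Parser().parse(new java.io.File("/path/to/schema.avsc"))

// In 3.0.0, toSqlType is public and returns SchemaType(dataType, nullable).
for (field <- avroSchema.getFields.asScala) {
  println(SchemaConverters.toSqlType(field.schema()).dataType)
}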

EDIT: added after comment

The spark-avro 3.0.0 library requires Spark 2.0, so you could replace your current Spark with the 2.0 version. The other option would be to contact Databricks and ask them to build a 2.0.2 version from the latest 2.0 branch.