The Spark documentation shows how a Spark package can be added to a SparkR session:

sparkR.session(sparkPackages = "com.databricks:spark-avro_2.11:3.0.0")

I believe this can only be used when initialising the session.

How can we add Spark packages for SparkR when using a notebook on DSX?

1 Answer

Please use the pixiedust package manager to install the Avro package.

import pixiedust
pixiedust.installPackage("com.databricks:spark-avro_2.11:3.0.0")

http://datascience.ibm.com/docs/content/analyze-data/Package-Manager.html
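
In case the package string looks opaque: it is a Maven coordinate of the form group:artifact:version, and for Scala libraries the artifact name conventionally ends in the Scala binary version (here 2.11), which should match the Scala version of your Spark build. Below is a minimal sketch in plain Python that splits such a string apart. Note that `parse_coordinate` and `Coordinate` are hypothetical names made up for illustration; they are not part of pixiedust or Spark, and the sketch assumes any `_suffix` on the artifact is a Scala version.

```python
from collections import namedtuple

# Hypothetical helper for illustration only; not part of pixiedust or Spark.
Coordinate = namedtuple(
    "Coordinate", ["group", "artifact", "scala_version", "version"]
)


def parse_coordinate(coord):
    """Split 'group:artifact[_scalaVersion]:version' into its parts."""
    parts = coord.split(":")
    if len(parts) != 3:
        raise ValueError("expected group:artifact:version, got %r" % coord)
    group, artifact, version = parts
    # Assumption: a trailing '_x.y' on the artifact is the Scala binary version.
    name, sep, scala_version = artifact.rpartition("_")
    if not sep:
        # No underscore at all, so there is no Scala suffix to report.
        name, scala_version = artifact, None
    return Coordinate(group, name, scala_version, version)


c = parse_coordinate("com.databricks:spark-avro_2.11:3.0.0")
print(c.group, c.artifact, c.scala_version, c.version)
```

If the Scala suffix does not match your Spark instance, look for a build of the package with the matching suffix before installing.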

Install it from a Python 1.6 kernel, since pixiedust can be imported in Python. (Remember that this installs the package at the level of your Spark instance.) Once it is installed, restart the kernel, switch to the R kernel, and read the Avro file like this:

df1 <- read.df("episodes.avro", source = "com.databricks.spark.avro", header = "true")

head(df1)

Complete notebook:

https://github.com/charles2588/bluemixsparknotebooks/raw/master/R/sparkRPackageTest.ipynb

Thanks, Charles.