I can read the JSON and call printSchema, but running any action fails with "No input paths specified in job".
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

val sc = new SparkContext("local[*]", "shell")
val sqlCtx = new SQLContext(sc)
val input = sqlCtx.jsonFile("../data/tweets/")
input.printSchema
root
|-- contributorsIDs: array (nullable = true)
| |-- element: string (containsNull = true)
|-- createdAt: string (nullable = true)
...
input.first
java.io.IOException: No input paths specified in job
The folder structure looks like this:

- tweets
  - tweets_1444576960000
    - _SUCCESS
    - part-00000
  - tweets_1444577070000
    - _SUCCESS
    - part-00000
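
For reference, here is a sketch of the load variants against that layout. The glob forms are untested assumptions on my part (I don't know whether jsonFile descends into the per-batch subfolders on its own):

// Sketch only: the part files sit one level below the path I pass in,
// so these globs are guesses at reaching them directly (paths are mine).
val flat  = sqlCtx.jsonFile("../data/tweets/")           // what I am doing now
val glob  = sqlCtx.jsonFile("../data/tweets/*")          // descend one level
val parts = sqlCtx.jsonFile("../data/tweets/*/part-*")   // only the part files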
Notes:

- I am using Spark and Spark SQL version 1.5.0.
- Executors are local[*] on the same machine.
- I tried replacing the file path with an absolute path; same error.
- The JSON tweets were fetched using Databricks' example app here.