2 votes

I can read the JSON and call printSchema, but running any action fails with java.io.IOException: No input paths specified in job.

import org.apache.spark.sql.SQLContext

val sc = new org.apache.spark.SparkContext("local[*]", "shell")
val sqlCtx = new SQLContext(sc)
// Point jsonFile at the parent directory holding the tweet batches
val input = sqlCtx.jsonFile("../data/tweets/")
input.printSchema

root
 |-- contributorsIDs: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- createdAt: string (nullable = true)
...

input.first
java.io.IOException: No input paths specified in job

The folder structure looks like this:

  • tweets
    • tweets_1444576960000
      • _SUCCESS
      • part-00000
    • tweets_1444577070000
      • _SUCCESS
      • part-00000

Notes:

  • I am using Spark and Spark SQL version 1.5.0
  • Executors are local[*] on same machine
  • I tried replacing the relative path with an absolute path; same error
  • The JSON tweets were fetched using Databricks' example app here
1 · If you want to try recursively fetching directories, there seems to be a solution here. – Rohan Aletty
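The linked suggestion presumably refers to Hadoop's recursive input listing. A minimal sketch of that approach, assuming the flag below is honored by the input format jsonFile uses in Spark 1.5 (the mapreduce.input.fileinputformat.input.dir.recursive key exists in Hadoop 2.x, but whether this code path picks it up is not confirmed here):

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

val sc = new SparkContext("local[*]", "shell")
// By default, Hadoop's FileInputFormat lists only the files directly
// under each input directory, so ../data/tweets/ itself contains no
// part files. This flag (assumption: honored here) enables descending
// into the tweets_* subdirectories.
sc.hadoopConfiguration.set(
  "mapreduce.input.fileinputformat.input.dir.recursive", "true")
val sqlCtx = new SQLContext(sc)
val input = sqlCtx.jsonFile("../data/tweets/")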

1 Answer

5 votes

OK, problem solved by specifying the path like this:

val input = sqlCtx.jsonFile("../data/tweets/tweets_*/*")
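This works because the glob expands to the part files one level below ../data/tweets/, which the default, non-recursive directory listing never reached. Since jsonFile is deprecated as of Spark 1.4, the same glob can also be passed to the DataFrameReader API; a minimal equivalent sketch:

import org.apache.spark.sql.SQLContext

val sqlCtx = new SQLContext(sc)
// read.json is the 1.4+ replacement for the deprecated jsonFile
// and accepts the same glob patterns.
val input = sqlCtx.read.json("../data/tweets/tweets_*/*")
input.first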