I have a really huge table containing billions of rows. I tried to load data from it with the Spark DataFrame JDBC API; here is my code:
sql = "select * from mytable where day = 2016-11-25 and hour = 10"
df = sqlContext.read \
    .format("jdbc") \
    .option("driver", driver) \
    .option("url", url) \
    .option("user", user) \
    .option("password", password) \
    .option("dbtable", table) \
    .load(sql)
df.show()
I ran the same SQL directly in MySQL and it returned about 100 rows, but in Spark it fails with an OOM error. It looks like Spark loads the whole table into memory instead of applying the WHERE clause on the database side. How can I get Spark SQL to push the WHERE clause down to the database?
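For reference, here is a minimal sketch of what I think might work, based on the idea of passing the filtered query as a subquery through the "dbtable" option instead of passing it to load(). The subquery alias "t" and the assumption that the database then executes the WHERE clause are mine and not something I have verified:

# Sketch: wrap the filtered query as a derived table and pass it via "dbtable",
# so the WHERE clause runs in the database rather than in Spark (assumption).
# Most databases require an alias ("as t") on the derived table.
query = "(select * from mytable where day = '2016-11-25' and hour = 10) as t"

df = sqlContext.read \
    .format("jdbc") \
    .option("driver", driver) \
    .option("url", url) \
    .option("user", user) \
    .option("password", password) \
    .option("dbtable", query) \
    .load()

df.show()

Is this the right way to do it, or is there a better option?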