0
votes

In Hive, a query with a LIMIT clause returns a fast response, but the same query run through Spark SQL takes noticeably longer. Could you please explain in depth why?

In Hive

SELECT * FROM employee LIMIT 10;

In Spark SQL,

spark.sql("SELECT * FROM employee LIMIT 10").show()

How does a LIMIT query work for a partitioned table?

1
Can you tell me what the file format is in both cases? - code.gsoni
The file format is Parquet. - Ranga Reddy
Which Hive and Spark versions are you using? And which execution engine are you using when running the same query in Hive? - Vijay_Shinde
Regardless of the Spark/Hive version, for SELECT * FROM table LIMIT 10, Hive gives better performance because it reads directly from the HDFS files. - Ranga Reddy
What I want is the internal working of Spark and Hive when using a LIMIT query. - Ranga Reddy
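
One quick way to check the Hive side of that comment (a sketch, assuming the Hive CLI or Beeline and the same employee table): when hive.fetch.task.conversion is set to more, which is the default in recent Hive versions, a plain SELECT * ... LIMIT is converted into a local fetch task that reads the first HDFS files directly, so no MapReduce/Tez job is launched.

SET hive.fetch.task.conversion;
EXPLAIN SELECT * FROM employee LIMIT 10;
-- the plan should show only a Fetch Operator with limit: 10 and no map/reduce stages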

1 Answer

0
votes

Spark SQL was not developed from scratch; it takes Hive as it is and integrates it with Spark. When you run the query in Hive, everything is native to Hive and only Hive's own serialization/deserialization (SerDe) libraries are used, but when Spark runs it, it goes through Java SerDes, which have some overhead.
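
One way to look at the Spark side (a rough sketch, assuming the same employee table is registered in the Hive metastore) is to print the physical plan instead of the rows:

spark.sql("SELECT * FROM employee LIMIT 10").explain()

This typically shows a CollectLimit 10 sitting above a FileScan/HiveTableScan of the table, so Spark still has to plan the scan, list the Parquet files, and launch tasks to read them before the limit is applied, which is one place the extra time can go compared with Hive's simple fetch task.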