I am doing a POC for a Spark application in Scala in LOCAL mode. I need to process a JSON dataset with 300 columns but only a few records. We are using Spark SQL, and our program runs perfectly fine for 30-40 columns in the dataset. We are doing inner joins and outer joins using Spark SQL, plus other conditions in the WHERE clause. The problem is that the SQL never finishes for the 300-column join; it just gets stuck, and I am not sure how to analyze the SQL. Is there a solution to this problem without having to run it in distributed mode? Would doing an inner join on the DataFrames directly alleviate the problem? Something like this: df1.join(df2, col("id1") === col("id2"), "inner").
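For reference, this is roughly what I mean by the DataFrame join. It is only a minimal sketch of my setup: the file paths and the join columns id1/id2 are placeholders, not my real schema.

```scala
import org.apache.spark.sql.SparkSession

object JoinPoc {
  def main(args: Array[String]): Unit = {
    // Local-mode session for the POC
    val spark = SparkSession.builder()
      .appName("join-poc")
      .master("local[*]")
      .getOrCreate()

    // Placeholder paths for the two wide JSON datasets
    val df1 = spark.read.json("df1.json")
    val df2 = spark.read.json("df2.json")

    // DataFrame API inner join; note === (Column equality), not ==
    val joined = df1.join(df2, df1("id1") === df2("id2"), "inner")

    joined.explain(true) // print the logical/physical plan to see where it stalls
    joined.show(false)

    spark.stop()
  }
}
```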
Thanks