I am using Spark SQL (I mention Spark in case it affects the SQL syntax; I'm not familiar enough to be sure yet) and I have a table that I am trying to restructure. My approach works locally, but when I run the same command on an AWS EC2 instance I get an error reporting an 'unresolved operator'.
Basically I have data that looks like:
userId  someString  varA
1       "example1"  [0,2,5]
2       "example2"  [1,20,5]
and I apply 'explode' to varA through an sqlContext. Locally this returns the correct result, but on AWS it fails.
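To make clear what I expect 'explode' to do: one output row per element of varA, with the other columns repeated. A minimal plain-Scala sketch of that behaviour, without Spark (Row3 and explodeVarA are hypothetical names used only for illustration, not part of any Spark API):

```scala
// Plain-Scala sketch of the row-per-element result expected from explode.
// Row3 and explodeVarA are hypothetical names, used only for illustration.
case class Row3(userId: String, someString: String, varA: Seq[Int])

// Emit one output row per element of varA, repeating the other columns.
def explodeVarA(rows: Seq[Row3]): Seq[(String, String, Int)] =
  rows.flatMap(r => r.varA.map(v => (r.userId, r.someString, v)))

val rows = Seq(
  Row3("1", "example1", Seq(0, 2, 5)),
  Row3("2", "example2", Seq(1, 20, 5)))

val exploded = explodeVarA(rows)
exploded.foreach(println)
```

This only illustrates the intended semantics; it says nothing about how Spark resolves explode internally.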
I can reproduce this with the following commands:
val data = List(
  ("1", "example1", Array(0,2,5)),
  ("2", "example2", Array(1,20,5)))
val distData = sc.parallelize(data)
val distTable = distData.toDF("userId", "someString", "varA")
distTable.registerTempTable("distTable_tmp")
val temp1 = sqlContext.sql("select userId, someString, varA from distTable_tmp")
val temp2 = sqlContext.sql(
  "select userId, someString, explode(varA) as varA from distTable_tmp")
Locally, temp1.show() and temp2.show() return what I'd expect, namely:
scala> temp1.show()
+------+----------+----------+
|userId|someString|      varA|
+------+----------+----------+
|     1|  example1| [0, 2, 5]|
|     2|  example2|[1, 20, 5]|
+------+----------+----------+
scala> temp2.show()
+------+----------+----+
|userId|someString|varA|
+------+----------+----+
|     1|  example1|   0|
|     1|  example1|   2|
|     1|  example1|   5|
|     2|  example2|   1|
|     2|  example2|  20|
|     2|  example2|   5|
+------+----------+----+
On AWS, however, the temp1 command works fine but temp2 fails with the message:
scala> val temp2 = sqlContext.sql("select userId, someString, explode(varA) as varA from distTable_tmp")
15/11/05 22:46:49 INFO parse.ParseDriver: Parsing command: select userId, someString, explode(varA) as varA from distTable_tmp
15/11/05 22:46:49 INFO parse.ParseDriver: Parse Completed
org.apache.spark.sql.AnalysisException: unresolved operator 'Project [userId#3,someString#4,HiveGenericUdtf#org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode(varA#5) AS varA#6];
...
Many thanks.
sqlContext? Also what version of Spark do you use? – zero323