We've run into an issue while migrating to a newer Spark version and are not sure how to troubleshoot it.
We have two Spark instances: the first is version 2.3.0 and the second is version 2.4.0. Both receive the same command:
spark.sql("SELECT array_contains(array(1), '1')")
On the older version, we get the following:
[{"array_contains(array(1), CAST(1 AS INT))":true}]
i.e. the string parameter is implicitly cast to match the array's element type. On the newer version, the same query fails:
cannot resolve 'array_contains(array(1), '1')' due to data type mismatch: Input to function array_contains should have been array followed by a value with same element type, but it's [array<int>, string].; line 1 pos 7;
'Project [unresolvedalias(array_contains(array(1), 1), None)]
+- OneRowRelation
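
For what it's worth, making the cast explicit resolves the query on 2.4.0 as well (a minimal check from the Scala shell; spark is the usual SparkSession):

spark.sql("SELECT array_contains(array(1), CAST('1' AS INT))").show()  // true on both versions

which suggests it's only the implicit cast that went away.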
Since we don't explicitly control either the actual data types or the SQL code passed to us (both are managed by our customers), we'd like to understand whether the customers have to change their data or whether we can work around this issue ourselves. Is there something we can do on the Spark side?
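
One direction we're considering is injecting a custom resolution rule that re-adds the implicit cast before analysis fails. Below is an untested sketch; it relies on the SparkSessionExtensions API and the catalyst ArrayContains and Cast expressions (which we believe exist in 2.4.0), and the rule name LegacyArrayContainsCast is just our own placeholder:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.{ArrayContains, Cast}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.ArrayType

// Hypothetical rule: when the second argument of array_contains doesn't match
// the array's element type, cast it explicitly, mimicking the 2.3.0 behaviour.
case class LegacyArrayContainsCast(spark: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    // Only rewrite once both children are resolved, so their data types are known.
    case ac @ ArrayContains(arr, value) if arr.resolved && value.resolved =>
      arr.dataType match {
        case ArrayType(elementType, _) if elementType != value.dataType =>
          // Re-insert the cast that 2.3.0 used to add implicitly.
          ArrayContains(arr, Cast(value, elementType))
        case _ => ac
      }
  }
}

val spark = SparkSession.builder()
  .withExtensions(_.injectResolutionRule(LegacyArrayContainsCast))
  .getOrCreate()

We're not sure whether relying on catalyst internals like this is supported or safe across versions, which is partly why we're asking.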
If there's something we should check other than the Spark versions, feel free to leave a comment and I'll add the necessary details to the question.