I have the following case class:
case class User(userId: String)
and the following schema:
+--------------------+------------------+
| col_name| data_type|
+--------------------+------------------+
| user_id| string|
+--------------------+------------------+
When I try to convert the DataFrame to a typed Dataset[User] with spark.read.table("MyTable").as[User], I get an error saying the field names don't match:
Exception in thread "main" org.apache.spark.sql.AnalysisException:
cannot resolve ''`user_id`' given input columns: [userId];;
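For reference, a minimal reproduction looks roughly like this (the table name and session setup are just placeholders from my local test):

import org.apache.spark.sql.SparkSession

case class User(userId: String)

object Repro {
  def main(args: Array[String]): Unit = {
    // Local session only for the repro; the real app gets its session elsewhere.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("snake-case-repro")
      .getOrCreate()
    import spark.implicits._

    // MyTable has a single string column user_id, but the case class field is userId,
    // so the implicit Encoder[User] cannot resolve the column and .as[User] fails as above.
    val users = spark.read.table("MyTable").as[User]
    users.show()
  }
}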
Is there any simple way to solve this without breaking Scala idioms and naming my fields user_id? Of course, my real table has a lot more fields, and I have many more case classes / tables, so it's not feasible to manually define an Encoder for each case class (and I don't know macros well enough, so that's out of the question; though I'm happy to use one if such a macro exists!).
I feel like I'm missing a very obvious "convert snake_case to camelCase=true" option, since one exists in practically any ORM I've worked with.
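To make that concrete, the fallback I can picture is a generic rename pass applied before .as[User]. The helper below is only a sketch (the name SnakeToCamel.rename is made up, not something Spark provides):

import org.apache.spark.sql.DataFrame

object SnakeToCamel {
  // Sketch of a generic helper: rename every snake_case column to camelCase
  // so the DataFrame columns line up with the case class fields.
  def rename(df: DataFrame): DataFrame =
    df.columns.foldLeft(df) { (renamed, col) =>
      val camel = "_([a-z\\d])".r.replaceAllIn(col, m => m.group(1).toUpperCase)
      renamed.withColumnRenamed(col, camel)
    }
}

// Intended usage: SnakeToCamel.rename(spark.read.table("MyTable")).as[User]

But maintaining that by hand for every read feels like re-implementing the naming strategy an ORM would give me, hence the question.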
I'd rather not name my case class fields using snake_case just because they represent a Spark table. – Gal