I'm experimenting with the spark-csv package (https://github.com/databricks/spark-csv) for reading CSV files into Spark DataFrames. Everything works, but all columns are assumed to be of StringType.
As shown in Spark SQL documentation (https://spark.apache.org/docs/latest/sql-programming-guide.html), for built-in sources such as JSON, the schema with data types can be inferred automatically.
Can the types of the columns in a CSV file be inferred automatically?
For example, calling sqlContext.jsonFile("...") on a JSON file with, say, one integer field and one string field produces a schema in which those types are already defined. Is the same possible with the CSV data source format? – Oleg Shirokikh
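For reference, the spark-csv README documents an inferSchema option that enables automatic type inference (at the cost of an extra pass over the data). A minimal sketch using the Spark 1.3 generic data-source API, assuming a file named people.csv and an existing SQLContext called sqlContext:

```scala
// Load a CSV through the spark-csv data source.
// "inferSchema" -> "true" asks spark-csv to scan the data once and pick
// numeric/boolean column types instead of defaulting everything to StringType.
val df = sqlContext.load(
  "com.databricks.spark.csv",
  Map("path" -> "people.csv", "header" -> "true", "inferSchema" -> "true"))

// Columns should now show types such as IntegerType rather than StringType.
df.printSchema()
```

On Spark 1.4+ the equivalent would be sqlContext.read.format("com.databricks.spark.csv").option("inferSchema", "true") followed by .load("people.csv").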