5 votes

I'm experimenting with the spark-csv package (https://github.com/databricks/spark-csv) for reading CSV files into Spark DataFrames.

Everything works, but all columns are assumed to be of StringType.

As shown in the Spark SQL documentation (https://spark.apache.org/docs/latest/sql-programming-guide.html), the schema, including data types, can be inferred automatically for built-in sources such as JSON.

Can the types of columns in a CSV file be inferred automatically?
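For reference, this is roughly how I am reading the file (a sketch only; the path and column names are placeholders, and sc is the SparkContext from spark-shell):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // Read a CSV with spark-csv; every column comes back as StringType.
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true") // use the first line as column names
      .load("path/to/file.csv")

    df.printSchema() // all fields are reported as string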

1. StringType is a field type in Spark SQL. 2. What you are asking is not very clear; can you be more specific about what you are trying to achieve? – eliasah
I'm asking about automatic type inference, which is available for built-in data sources such as JSON. That is, if one creates a DataFrame using sqlContext.jsonFile("...") from a JSON file with, say, one integer field and one string field, those types are reflected in the schema. Is this possible with the CSV data source format? – Oleg Shirokikh

2 Answers

7 votes

Starting with Spark 2 you can use the inferSchema option of the built-in CSV source, like this: getSparkSession().read().option("inferSchema", "true").csv("YOUR_CSV_PATH")
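For example, a minimal Scala sketch of the same thing (the app name and path are placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("csv-infer-schema") // placeholder app name
      .getOrCreate()

    // With inferSchema enabled, Spark samples the data and assigns
    // numeric/boolean/timestamp types instead of defaulting to string.
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("YOUR_CSV_PATH")

    df.printSchema() // columns now show the inferred types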

3 votes

Unfortunately this is not currently supported, but it would be a very useful feature. For now, the types must be declared explicitly, e.g. in DDL (see the sketch after the quoted documentation for one way to declare them). From the documentation we have:

header: when set to true the first line of files will be used to name columns and will not be included in data. All types will be assumed string. Default value is false.

which is what you are seeing.
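As a sketch of declaring the types yourself, here is the programmatic StructType route with spark-csv (column names, types, and the path are placeholders, and sqlContext is assumed to exist already):

    import org.apache.spark.sql.types._

    // Placeholder schema; replace the fields with your actual columns.
    val customSchema = StructType(Seq(
      StructField("year", IntegerType, nullable = true),
      StructField("make", StringType, nullable = true),
      StructField("price", DoubleType, nullable = true)))

    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .schema(customSchema) // columns are no longer all StringType
      .load("path/to/file.csv")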

Note that it is possible to infer schema at query time, e.g.

select sum(mystringfield) from mytable
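A sketch of what that looks like end to end, assuming the CSV was already loaded into df with all-string columns and mystringfield actually contains numbers:

    // Register the all-string DataFrame as a temporary table.
    df.registerTempTable("mytable")

    // Spark SQL implicitly casts the string column to a numeric type
    // in order to evaluate the aggregate.
    sqlContext.sql("select sum(mystringfield) from mytable").show()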