I am using Spark 2.2.1 which has a useful option to specify how many records I want to save in each partition of a file; this feature allows to avoid a repartition before writing a file. However, it seems this option is usable only with the FileWriter interface and not with the DataFrameWriter one: in this way the option is ignored
df.write.mode("overwrite")
.option("maxRecordsPerFile", 10000)
.insertInto(hive_table)
while in this way it works
df.write.option("maxRecordsPerFile", 10000)
.mode("overwrite").orc(path_hive_table)
so I am directly writing orc files in the HiveMetastore folder of the specified table. The problem is that if I query the Hive table after the insertion, this data is not recognized by Hive. Do you know if there's a way to write directly partition files inside the hive metastore and make them available also through the Hive table?