0
votes

I am trying to store a DataFrame in an external Hive table. When I run the following:

 recordDF.write.option("path", "hdfs://quickstart.cloudera:8020/user/cloudera/hadoop/hive/warehouse/VerizonProduct").saveAsTable("productstoreHTable")

At the HDFS location where the table was supposed to be, I get this instead:

-rw-r--r-- 3 cloudera cloudera 0 2016-12-25 18:58 hadoop/hive/warehouse/VerizonProduct/_SUCCESS
-rw-r--r-- 3 cloudera cloudera 482 2016-12-25 18:58 hadoop/hive/warehouse/VerizonProduct/part-r-00000-0acdcc6d-893b-4e9d-b1d6-50bf02bea96a.snappy.parquet
-rw-r--r-- 3 cloudera cloudera 482 2016-12-25 18:58 hadoop/hive/warehouse/VerizonProduct/part-r-00001-0acdcc6d-893b-4e9d-b1d6-50bf02bea96a.snappy.parquet
-rw-r--r-- 3 cloudera cloudera 482 2016-12-25 18:58 hadoop/hive/warehouse/VerizonProduct/part-r-00002-0acdcc6d-893b-4e9d-b1d6-50bf02bea96a.snappy.parquet
-rw-r--r-- 3 cloudera cloudera 482 2016-12-25 18:58 hadoop/hive/warehouse/VerizonProduct/part-r-00003-0acdcc6d-893b-4e9d-b1d6-50bf02bea96a.snappy.parquet

The data is written as Snappy-compressed Parquet files. How do I store it in uncompressed text format instead?

Thanks

2 Answers

1
votes

You can set the output format with format(). For plain text (the text source writes only a single string column):

recordDF.write.option("path", "...").format("text").saveAsTable("...")

or

recordDF.write.option("path", "...").format("csv").saveAsTable("...")
1
votes

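The solution above with format csv logged the warning "Couldn't find corresponding Hive SerDe for data source provider csv.", and the table was not created as intended. One alternative is to create the external table yourself first:

sqlContext.sql("CREATE EXTERNAL TABLE test(col1 int, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/path/in/hdfs'")

Then write the DataFrame as CSV into the same location:

dataFrame.write.format("com.databricks.spark.csv").option("header", "false").save("/path/in/hdfs")

The ROW FORMAT clause is there because a Hive TEXTFILE table otherwise expects Ctrl-A (\001) as the field delimiter, and the header is turned off because Hive would read a header line as a data row.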
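
The two steps can be combined into a small PySpark sketch. It assumes Spark 2.0 or later (where csv is a built-in data source) and a session or context with Hive support. The helper names build_external_table_ddl and save_as_external_text_table are hypothetical, not part of Spark, and the table name, columns, and path are the placeholders from this answer.

```python
def build_external_table_ddl(table, columns, location, sep=","):
    """Return a CREATE EXTERNAL TABLE statement for a delimited text table.

    `columns` is a list of (name, hive_type) pairs. This is a hypothetical
    helper that only builds a string; it does not talk to Hive.
    """
    cols = ", ".join(
        "{} {}".format(name, hive_type) for name, hive_type in columns
    )
    return (
        "CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({cols}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '{sep}' "
        "STORED AS TEXTFILE LOCATION '{location}'"
    ).format(table=table, cols=cols, sep=sep, location=location)


def save_as_external_text_table(spark, df, table, columns, location, sep=","):
    """Create the external table, then write `df` as uncompressed text.

    `spark` is assumed to be a SparkSession (or HiveContext) with Hive
    support, and `df` a DataFrame whose columns match `columns`.
    """
    spark.sql(build_external_table_ddl(table, columns, location, sep))
    (
        df.write
        .format("csv")                  # built-in CSV source (Spark 2.0+)
        .option("sep", sep)             # must match the table's delimiter
        .option("header", "false")      # Hive would read a header as data
        .option("compression", "none")  # plain, uncompressed text files
        .mode("append")                 # the table directory may already exist
        .save(location)
    )


# Example call, using the placeholder names from the answer:
# save_as_external_text_table(
#     spark, dataFrame, "test",
#     [("col1", "int"), ("col2", "string")], "/path/in/hdfs")
```

Append mode is used because the table's directory typically already exists once the external table has been created, and the default save mode would then fail. Use "overwrite" instead if the existing contents should be replaced.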