
I have a Hive warehouse in HDFS at hdfs://localhost:8020/user/hive/warehouse.

I have a database mydb inside HDFS, at hdfs://localhost:8020/user/hive/warehouse/mydb.db.

How can I create a table and insert data into it using PySpark?

Please suggest


1 Answer


Using a Hive context you will be able to create the table in Hive. Please see the code below to achieve that.

import findspark
findspark.init()

from pyspark import SparkContext
from pyspark.sql import HiveContext

# Create the SparkContext and the Hive context
sc = SparkContext()
sqlCtx = HiveContext(sc)

# Load a CSV file into a DataFrame
spark_df = sqlCtx.read.format('com.databricks.spark.csv').options(header='true', inferschema='true').load("./data/documents_topics.csv")

# Register the DataFrame as a temporary table
spark_df.registerTempTable("TABLE_Y")

# Create a Hive table out of the existing temp table backed by the DataFrame
sqlCtx.sql("CREATE TABLE TABLE_X AS SELECT * FROM TABLE_Y")

# Create a brand-new table in Hive (replace DESIREDFORMAT with the storage format you want, e.g. ORC or PARQUET)
sqlCtx.sql("CREATE TABLE SomeSchema.TABLE_X (customername string, id string, ts timestamp) STORED AS DESIREDFORMAT")

Hope the comments in the code make it clear; let me know if you run into issues.
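If you are on Spark 2.x or later, the same thing can be done without findspark/HiveContext by building a Hive-enabled SparkSession. A rough sketch, where the database name mydb comes from the question and the file path and table name are just examples:

from pyspark.sql import SparkSession

# SparkSession with Hive support replaces HiveContext in Spark 2.x+
spark = SparkSession.builder \
    .appName("hive-example") \
    .enableHiveSupport() \
    .getOrCreate()

# Read the CSV with the built-in reader
df = spark.read.csv("./data/documents_topics.csv", header=True, inferSchema=True)

# Write the DataFrame as a managed table inside the mydb database
df.write.mode("overwrite").saveAsTable("mydb.table_x")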