
I have created a DataFrame using the code below:

    import pyspark
    from pyspark.sql import functions as F

    sc = pyspark.SparkContext()
    spark = pyspark.sql.SparkSession(sc)

    data = [('A', 'B', 1), ('A', 'B', 2), ('A', 'C', 1)]
    columns = ['Column1', 'Column2', 'Column3']
    data = spark.createDataFrame(data, columns)
    data.printSchema()

    root
     |-- Column1: string (nullable = true)
     |-- Column2: string (nullable = true)
     |-- Column3: long (nullable = true)

I want to create a Hive table from my PySpark DataFrame's schema. I have shown only sample columns here, but my DataFrame has many columns, so is there a way to generate such a CREATE TABLE query automatically?


1 Answer


I believe your table creation is a one-time activity; keep in mind that data types can differ between a Spark DataFrame and a Hive table.

If you have a lot of columns, the best starting point is to print the full schema:

    print(data.schema)

That gives you every column name and type in one place.
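From there you can generate the CREATE TABLE statement programmatically. A minimal sketch: `df.dtypes` returns a list of `(column_name, type_string)` pairs, and for the common types (`string`, `bigint`, `double`, ...) those Spark type names are also valid Hive types. The table name `my_table` and the `STORED AS PARQUET` clause below are assumptions; adjust them to your environment. The `sample` list here just mimics what `data.dtypes` would return for the question's DataFrame, so the sketch runs without a Spark session.

```python
def hive_ddl(dtypes, table_name):
    """Build a Hive CREATE TABLE statement from (name, type) pairs,
    as returned by DataFrame.dtypes."""
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in dtypes)
    # STORED AS PARQUET is an assumed storage format; change as needed.
    return f"CREATE TABLE {table_name} (\n  {cols}\n)\nSTORED AS PARQUET"

# What data.dtypes would return for the question's DataFrame
# (Spark's long type is reported as bigint):
sample = [('Column1', 'string'), ('Column2', 'string'), ('Column3', 'bigint')]
print(hive_ddl(sample, 'my_table'))
```

In a real session you would call `hive_ddl(data.dtypes, 'my_table')` and run the result with `spark.sql(...)`; for complex nested types, check the generated DDL before executing it.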