1
votes

We would like to create a DataFrame on top of a Hive external table and use the Hive schema and data for computation at the Spark level.

Can we get the schema from the Hive external table and use it as the DataFrame schema?

I'm not completely sure, but I think it doesn't make any difference to Spark what kind of Hive table you have. As for the schema, could you provide more details about the data format you use in your Hive table? – Dmitry Y.
We have CSV data files without headers, and an external table has been created on these files, so we would like to use the Hive external table schema for creating the DataFrames. – venkata
Have you considered accepting an answer? – Raphael Roth

5 Answers

8
votes

The Hive metastore knows the schema of your tables and passes this information to Spark. It does not matter whether the table is external or not:

val df = sqlContext.table(tablename)

where sqlContext is of type HiveContext. You can verify your schema with

df.printSchema
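
A minimal end-to-end sketch of this approach, assuming Spark 1.x with a Hive-enabled build; the application name and the table name my_db.my_table are placeholders:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Placeholder app and table names; works the same for managed and external tables
val conf = new SparkConf().setAppName("hive-schema-example")
val sc = new SparkContext(conf)
val sqlContext = new HiveContext(sc)

val df = sqlContext.table("my_db.my_table")
df.printSchema()         // schema comes from the Hive metastore
val schema = df.schema   // a StructType you can reuse elsewhere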
2
votes

To access a Hive table from Spark, use Spark's HiveContext:

import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

// conf is an existing SparkConf
val sc = new SparkContext(conf)
val sqlContext = new HiveContext(sc)

// ... other processing ...

val data = sqlContext.sql("select * from hive_table")

Here, data will be your DataFrame with the schema of the Hive table.
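
In newer Spark versions (2.x and later) HiveContext is deprecated in favour of SparkSession; a rough equivalent, assuming the application is built with Hive support:

import org.apache.spark.sql.SparkSession

// SparkSession with Hive support replaces SparkContext + HiveContext
val spark = SparkSession.builder()
  .appName("hive-table-example")   // placeholder name
  .enableHiveSupport()
  .getOrCreate()

val data = spark.sql("select * from hive_table")
data.printSchema()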

1
votes

Spark with Hive support enabled can do this out of the box; see the docs.

val dataframe = spark.sql("SELECT * FROM table")
val schema = dataframe.schema
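
Since the question's CSV files have no header, that same schema can be reused to read the raw files directly; a sketch assuming Spark 2.x, with a hypothetical path:

// Apply the Hive-derived schema when reading the raw, headerless CSV files
val raw = spark.read
  .schema(schema)
  .option("header", "false")
  .csv("/path/to/csv/files")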
1
votes

Load the data into a DataFrame:

df = sqlContext.sql("select * from hive_table")

Get the schema as a StructType:

df.schema

Get the column names of the Hive table:

df.columns

Get the column names with their data types:

df.dtypes
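
If you need the names and types programmatically, the StructType returned by df.schema can also be iterated; a short Scala sketch:

// Walk the StructType field by field to get name, type, and nullability
df.schema.fields.foreach { f =>
  println(s"${f.name}: ${f.dataType.simpleString} (nullable = ${f.nullable})")
}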
0
votes

You can create a DataFrame with your own column names using toDF(), which takes the new names as strings:

df = spark.sql("select * from table").toDF("col1", "col2")