We would like to create a DataFrame on top of a Hive external table and use the Hive schema and data for computation at the Spark level.
Can we get the schema from the Hive external table and use it as the DataFrame schema?
To access a Hive table from Spark 1.x, use Spark's HiveContext:

import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(conf)
val sqlContext = new HiveContext(sc)

// ... other setup ...

val data = sqlContext.sql("SELECT * FROM hive_table")

Here data
is a DataFrame that already carries the schema of the Hive table; you can inspect it with data.schema or data.printSchema().
Spark built with Hive support can do this out of the box; see the Spark SQL documentation on Hive tables.
val dataframe = spark.sql("SELECT * FROM table")
val schema = dataframe.schema
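As a minimal, self-contained sketch of the same idea: the example below uses a local SparkSession and registers a temp view named hive_table in place of a real Hive external table (an assumption for the sake of runnability; with .enableHiveSupport() on the builder, the identical spark.sql call would resolve the table through the Hive metastore instead). The helper name tableFieldNames is made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

object SchemaFromTable {
  // Hypothetical helper: run a SELECT against any table Spark can see
  // and return the column names from the resulting DataFrame's schema.
  def tableFieldNames(spark: SparkSession, table: String): Array[String] =
    spark.sql(s"SELECT * FROM $table").schema.fieldNames

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("schema-demo")
      .master("local[*]")
      // .enableHiveSupport()  // would make spark.sql hit the Hive metastore
      .getOrCreate()
    import spark.implicits._

    // Temp view standing in for the Hive external table in this sketch.
    Seq((1, "a"), (2, "b")).toDF("id", "name")
      .createOrReplaceTempView("hive_table")

    val df = spark.sql("SELECT * FROM hive_table")
    val schema = df.schema            // StructType inferred from the table
    println(schema.fieldNames.mkString(","))

    spark.stop()
  }
}
```

The schema obtained this way is a StructType, so it can be reused directly, for example passed to spark.read.schema(...) when loading raw files that should conform to the table's layout.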