1
votes
  1. Create a Spark DataFrame by selecting from an Impala table

    sql_df1 = hive_context.sql("SELECT * FROM database1.table1 LIMIT 10")
    

1.1 This DataFrame (sql_df1) returns a row count of 10 and shows the correct data:

    print(sql_df1.count())
    sql_df1.show()
  2. Create a new table from the first Spark DataFrame.

    sql_df1.write.mode("overwrite").format("parquet").saveAsTable("database1.table2")

  3. Refresh the metadata in Impala. In Hue, I can see that database1.table2 has 10 rows of correct data.

  4. Create a new Spark DataFrame from the new table.

    sql_df2 = hive_context.sql("SELECT * FROM database1.table2 LIMIT 10")

  5. ISSUE: The new sql_df2 has no rows, only headers.

    print(sql_df2.count())
    sql_df2.show()
    
1
I found the problem: the format has to be "hive" and not "parquet". - Brian DS
Nice catch! I learnt something new from you today! Post this as an answer to your own question and accept it. - Jacek Laskowski

1 Answer

0
votes

I found the problem: the format has to be "hive" and not "parquet".
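
For reference, here is a minimal sketch of the corrected write followed by the read-back from the question. The helper name `rewrite_and_reload` and its parameters are hypothetical (they are not from the original post), and it assumes a Hive-enabled context and a Spark version whose DataFrameWriter accepts the "hive" format.

```python
def rewrite_and_reload(hive_context, df, table_name, limit=10):
    """Save `df` as a Hive-format table, then query it back.

    Hypothetical helper for illustration only.
    `hive_context` is assumed to be a Hive-enabled SQL context (or SparkSession)
    and `df` a Spark DataFrame created from it.
    """
    # The only change from the question: format("hive") instead of format("parquet").
    df.write.mode("overwrite").format("hive").saveAsTable(table_name)

    # Read the new table back, as in the step that creates sql_df2.
    query = "SELECT * FROM {} LIMIT {}".format(table_name, limit)
    return hive_context.sql(query)
```

With the objects from the question, this would be called as `sql_df2 = rewrite_and_reload(hive_context, sql_df1, "database1.table2")`, after which `sql_df2.count()` and `sql_df2.show()` can be used to check the result.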