Why does saving a dataset to Impala and loading it back yield no rows?

Question

Create a Spark dataframe by selecting from an Impala table:

    sql_df1 = hive_context.sql("SELECT * FROM database1.table1 LIMIT 10")

This dataframe returns a row count of 10 and shows the correct data:

    print(sql_df1.count())
    sql_df1.show()

Create a new table from the first Spark dataframe:

    sql_df1.write.mode("overwrite").format("parquet").saveAsTable("database1.table2")

After refreshing the metadata in Impala, I can see in HUE that database1.table2 has 10 rows of correct data.

Create a new Spark dataframe from the new table:

    sql_df2 = hive_context.sql("SELECT * FROM database1.table2 LIMIT 10")

ISSUE: The new sql_df2 has no rows, only headers.
```
print(sql_df2.count())
sql_df2.show()
```

I found the problem, the format has to be "hive" and not parquet. — Brian DS
Nice catch! I learnt something new today from you! Answer your question and accept. — Jacek Laskowski

Brian DS · Accepted Answer · 2017-12-13T18:59:54 · 0 votes

I found the problem: the format passed to format() in the saveAsTable step has to be "hive", not "parquet".
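
For reference, here is a minimal sketch of that change applied to the write step from the question. The helper name `save_as_hive_table` is hypothetical and only wraps the same writer chain with `format("hive")`; it assumes a Spark version whose `DataFrameWriter` accepts `"hive"` as a format and a Hive-enabled session.

```python
def save_as_hive_table(df, table_name, mode="overwrite"):
    """Write a Spark DataFrame as a Hive-format table (hypothetical helper)."""
    # Same chain as in the question, but with format("hive") instead of
    # format("parquet"), as described in the accepted answer.
    df.write.mode(mode).format("hive").saveAsTable(table_name)


# Usage with the names from the question (needs a live Hive-enabled session):
# save_as_hive_table(sql_df1, "database1.table2")
# sql_df2 = hive_context.sql("SELECT * FROM database1.table2 LIMIT 10")
# print(sql_df2.count())
```

After rewriting the table this way, repeating the `count()` and `show()` checks on `sql_df2` should confirm whether the rows are now visible from Spark.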