My understanding is that when a managed table is dropped, Hive deletes the table's data as well. By default, the SAS/ACCESS option DBCREATE_TABLE_EXTERNAL is set to NO, which means a SAS data step writing through a Hive library creates a managed table.
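A minimal sketch of the kind of LIBNAME and data step meant here (server, port, schema, and table names are placeholders, not from the original post):

```sas
/* Hypothetical connection details; replace with your own environment */
libname hdp hadoop server="hive-server.example.com" port=10000
        schema=mydb
        dbcreate_table_external=no;  /* default: create Hive MANAGED tables */

/* Creates mydb.class as a managed table in Hive */
data hdp.class;
   set sashelp.class;
run;
```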
However, when the table is dropped from SAS (PROC SQL DROP TABLE, PROC DELETE, or PROC DATASETS with a DELETE statement), only the Hive metadata is deleted: the table definition is removed from the schema, but the underlying HDFS files are not. Running the same data step again after the drop, i.e. recreating the table in the schema, then yields an incorrect number of ingested rows.
Steps to reproduce:
- Create a Hive table with a SAS data step and note the number of rows.
- Drop the table using PROC SQL DROP TABLE or PROC DATASETS DELETE.
- Run the create-table step again.
- Count the number of rows again.
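The steps above can be sketched as follows (hdp is a hypothetical Hive libname as described earlier; the doubled row count is the symptom reported here, not a verified result):

```sas
/* Step 1: create a managed table and note the row count */
data hdp.test;
   set sashelp.class;   /* 19 rows */
run;

/* Step 2: drop the table from SAS */
proc sql;
   drop table hdp.test;
quit;

/* Step 3: recreate the same table */
data hdp.test;
   set sashelp.class;
run;

/* Step 4: count the rows */
proc sql;
   select count(*) into :n from hdp.test;
quit;
%put NOTE: row count is &n;
/* Expectation: 19. Observed symptom: a higher count, because the
   recreated table picks up the leftover HDFS files from step 1. */
```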
Thanks.