I am trying to learn about deleting duplicate records from a Hive table.
My Hive table is 'dynpart', with columns Id, Name, and Technology:
Id  Name  Technology
1   Abcd  Hadoop
2   Efgh  Java
3   Ijkl  MainFrames
2   Efgh  Java
We have options like DISTINCT to use in a select query, but a select query only retrieves data from the table. Could anyone tell me how to use a delete query to remove the duplicate rows from a Hive table?
I understand that deleting or updating records in Hive is not recommended and is not standard practice, but I want to learn how it is done.
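
For reference, the kind of select query I mean is sketched below. It assumes the table and column names shown above. As far as I understand, it returns each distinct row once (so the repeated '2 Efgh Java' row appears only once in the result) but does not change what is stored in 'dynpart'.

```sql
-- Returns each distinct (Id, Name, Technology) combination once.
-- This only affects the query result; the duplicate row stays in the table.
SELECT DISTINCT Id, Name, Technology
FROM dynpart;
```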