I am exploring DataBricks Delta table and its time travel / temporal feature. I have some events data that happened in the past. I am trying to insert them into delta table and be able to time travel using the timestamp in the data and not the actual insert time.
I have a date/time column in my event. I renamed it to "timestamp" but it still didn't work.
My csv data looks like this: (Data shows 5 updates that happened to a single case with id=1000)
account_id,id,case_id,case_status,owner,department,priority,created_at,timestamp
2114809,1000,ABC-121,1,200,300,1,2020-01-01T08:00:00.000000-07:00,2020-01-01T08:00:00.000000-07:00
2114809,1000,ABC-121,2,200,300,3,2020-01-01T08:00:00.000000-07:00,2020-01-01T09:00:00.000000-07:00
2114809,1000,ABC-121,3,200,300,3,2020-01-01T08:00:00.000000-07:00,2020-01-03T13:00:00.000000-07:00
2114809,1000,ABC-121,3,201,300,3,2020-01-01T08:00:00.000000-07:00,2020-01-12T14:00:00.000000-07:00
2114809,1000,ABC-121,4,201,300,3,2020-01-01T08:00:00.000000-07:00,2020-01-13T10:00:00.000000-07:00
I used these commands to create the delta table:
DROP TABLE IF EXISTS cases;
DROP TABLE IF EXISTS temp_cases;
CREATE TABLE temp_cases
USING csv
OPTIONS (path "/FileStore/tables/cases/test_cases.csv", header "true", inferSchema "true");
CREATE TABLE cases
USING delta
AS SELECT * FROM temp_cases;
DROP TABLE IF EXISTS temp_cases;
I have 2 questions:
- How can I do a data migration into delta table and still keep the timestamp from the data and not the inserted timestamp to perform time travel?
- How can I specify a primary key or a composite primary key when creating a delta table.