1
votes

I am exploring DataBricks Delta table and its time travel / temporal feature. I have some events data that happened in the past. I am trying to insert them into delta table and be able to time travel using the timestamp in the data and not the actual insert time.

I have a date/time column in my event. I renamed it to "timestamp" but it still didn't work.

My csv data looks like this: (Data shows 5 updates that happened to a single case with id=1000)

account_id,id,case_id,case_status,owner,department,priority,created_at,timestamp
2114809,1000,ABC-121,1,200,300,1,2020-01-01T08:00:00.000000-07:00,2020-01-01T08:00:00.000000-07:00
2114809,1000,ABC-121,2,200,300,3,2020-01-01T08:00:00.000000-07:00,2020-01-01T09:00:00.000000-07:00
2114809,1000,ABC-121,3,200,300,3,2020-01-01T08:00:00.000000-07:00,2020-01-03T13:00:00.000000-07:00
2114809,1000,ABC-121,3,201,300,3,2020-01-01T08:00:00.000000-07:00,2020-01-12T14:00:00.000000-07:00
2114809,1000,ABC-121,4,201,300,3,2020-01-01T08:00:00.000000-07:00,2020-01-13T10:00:00.000000-07:00

I used these commands to create the delta table:

DROP TABLE IF EXISTS cases;
DROP TABLE IF EXISTS temp_cases;

CREATE TABLE temp_cases
USING csv
OPTIONS (path "/FileStore/tables/cases/test_cases.csv", header "true", inferSchema "true");

CREATE TABLE cases
USING delta
AS SELECT * FROM temp_cases;

DROP TABLE IF EXISTS temp_cases;

I have 2 questions:

  1. How can I do a data migration into delta table and still keep the timestamp from the data and not the inserted timestamp to perform time travel?
  2. How can I specify a primary key or a composite primary key when creating a delta table.
1

1 Answers

1
votes

There might be a minor misunderstanding of the use case of time travel. Time Travel is a feature for giving users access to the different versions/snapshots of the Delta table. You cannot specify a timestamp attribute as a time travel index of sorts.

Each operation (merge, insert, delete, etc) on the Delta table creates a new version of the table with associated timestamp and metadata of that change. You can then time travel to the different versions of your data.

Here is a link to some helpful documentation.

https://docs.delta.io/latest/delta-batch.html#query-an-older-snapshot-of-a-table-time-travel


For the second question

How can I specify a primary key or a composite primary key when creating a delta table.

Primary keys are not a metadata component within a Delta Table. There is no definition within Delta Lake known as a primary key outside of Merge operations. The only time a unique key is required is during the runtime of a Merge operation.

Some more helpful documentation. Make not of the important note block in this section

https://docs.delta.io/latest/delta-update.html#upsert-into-a-table-using-merge