0
votes

I want to update certain columns in a table using Cassandra COPY feature. But Copy inserts new record even when row is not found. I want to restrict in COPY command, only when PRIMARY KEY row is found the column in csv file to be updated. Sample table and COPY command are shared below.

CREATE TABLE Orders(
Ord_Id      Text Primary Key,
Ord_Date    Int,
Ord_Acct    Text,
Ord_Comp_Dt Int,
Ord_Status  Text)

Sample Data:
Ord_Id | Ord_Date | Ord_Acct | Ord_Comp_Dt | Ord_Status
ORD001 | 20170602 | A001     | 20170615    | InProgress
ORD002 | 20170603 | A002     | 20170607    | Failed
ORD003 | 20170604 | A003     | 20170616    | InProgress
ORD004 | 20170605 | A003     | 20170617    | InProgress

Above table gets row entry when order is placed with Initial Ord_Status='InProgress'. Based on order completion network provides the data with Ord_Id, Ord_Status.

Network Data
 ORD_ID,ORD_STATUS
 ORD001,Failed
 ORD003,Success
 ORD004,Rejected
 ORD005,DataIncomplete

Copy command is provided below

COPY ord_schema.Orders(Ord_Id,Ord_Status) FROM 'NW170610.csv'

Table SnapShot after executing COPY command

Sample Data:
Ord_Id | Ord_Date | Ord_Acct | Ord_Comp_Dt | Ord_Status
ORD001 | 20170602 | A001     | 20170615    | Failed
ORD002 | 20170603 | A002     | 20170607    | Failed
ORD003 | 20170604 | A003     | 20170616    | InProgress
ORD004 | 20170605 | A003     | 20170617    | Rejected
ORD005 | Null     | Null     | Null        | DataIncomplete

ORD005 should not be inserted when Primary Key is not found. Kindly assist is there any way to check data exists before insert or prevent entry when data does not exist.

2

2 Answers

1
votes

Cassandra does an UPSERT. Which means, it will insert a column if there is none (based on primary key).

What I'd suggest is add another column maybe Ord_Acct (something that can bring uniqueness to the data) as a clustering/composite key. Now, if the Ord_Acct is null it won't do an insert. So, to summarize I'd suggest change the data model that meets your requirements.

0
votes

It is not possible to check existence of row before inserting with copy command

Copy command only parse the csv and insert directly. So you have to write your own code to read the csv NW170610 and for each record check existance with select query, If exist then insert.

Or Dump csv of Orders table using copy to command

COPY orders (ord_id) TO 'orders_id.csv';

Now for each record of NW170610 check that the id present in the orders_id.csv, if yes then write the record to another file complete_order.csv.

Now just load the complete_order.csv file using copy from command