1
votes

I have two different systems pushing data into a kdb table. These are driven by a feed of prices. I'd like to compare the values produced so I can ultimately flag large differences. I'm very new to kdb though and finding it hard to even work out a starting point query.

Ultimately I'd like to take a time period (likely a minute,) find a row for each system within this time period where the driving price is the same and compare the derived values.

A good starting point for me though would be to understand how to grab the first row within a timeperiod for each system and compare/join..

Thanks.

Simplifed example data

Example: -

System             | Time    | driver | result1 | result2
systemA.instrument1| 11:59:59| 101.4  | 3.4     | 4.6
systemA.instrument1| 12:00:01| 101.5  | 3.8     | 4.8
systemA.instrument1| 12:00:02| 101.6  | 3.3     | 2.3
systemA.instrument2| 12:00:02| 106.6  | 11.1    | 11.3
systemA.instrument1| 12:00:05| 101.7  | 3.9     | 5.6
systemB.instrument1| 12:00:09| 101.1  | 3.2     | 7.8
systemB.instrument1| 12:00:14| 101.2  | 3.9     | 3.4
systemB.instrument1| 12:00:17| 101.3  | 3.1     | 8.9
systemB.instrument2| 12:00:19| 106.5  | 11.2    | 11.4
systemB.instrument1| 12:00:58| 101.7  | 3.9     | 9.3
systemB.instrument1| 12:00:59| 101.7  | 3.3     | 3.4
systemB.instrument1| 12:01:03| 101.4  | 3.1     | 5.6

I only want data from 12:00:00 - 12:00:59

The only matching driver between SystemA and SystemB instrument1 is 101.7. I'd like either to be used and the diff between the results shown. For instrument2 the driver never matches so I want to use the driver prices that are closest between the systems.

results      | driver | driver diff | result1diff | result2diff
instrument1  | 101.7  | 0           | 0           | 3.7
instrument2  |        | 0.1         | 0.1         | 0.1
1
Can you provide concrete examples of your current data, its format and the output you would expect? - John at TimeStored

1 Answers

1
votes

First, split your System column into its constituent parts:

table:(flip exec `System`Instrument!flip ` vs/: System from table)
    ,'delete System from table

The answer to your first question (get the first row per Instrument and System), is:

q)table:(flip exec `System`Instrument!flip ` vs/: System from table),'delete System from table
Instrument  System | Time     driver result1 result2
-------------------| -------------------------------
instrument1 systemA| 11:59:59 101.4  3.4     4.6    
instrument1 systemB| 12:00:09 101.1  3.2     7.8    
instrument2 systemA| 12:00:02 106.6  11.1    11.3   
instrument2 systemB| 12:00:19 106.5  11.2    11.4   

Btw, in q, it is a more common use case to request the last row and this is easier to achieve:

q)select by Instrument,System from table

Define function to findest indices of closest values in two numerical vectors:

q)closest:{a:a?min a:abs(-) ./: x cross y;(a div count y;a mod count y)}

Query result1 where drivers are closest:

q)select result1:result1(value group System)@'closest . value driver group System by Instrument from table

Instrument | result1  
-----------| ---------
instrument1| 3.4  3.1 
instrument2| 11.1 11.2