Tables
There are two types of tables, keyed and unkeyed.
// creating tables
t1:([]sym:`$();price:float$();size:long$()) // empty typed table
t2:([]sym:enlist `AAPL;price:enlist 100.0;size:enlist 100) // single items must be enlisted
t3:([]sym:`AAPL`GOOG;price:100.0 89.0;size:100 120) // unkeyed table
t4:([sym:`AAPL`GOOG]price:100.0 89.0;size:100 120) // keyed table
// key and unkey
`sym xkey t3 // key method 1: use xkey; here we key on sym
t2!t3 // key method 2: use !; all columns in tbl1 will be used as key in the joined table
() xkey t4 // unkey method 1: use xkey with empty list
0!t4 // unkey method 2: use ! with 0
// indexing tables
t2[`size] // index a column
t2[0] // index a row
t2[0 1;`size] // unkeyed table, index by row, then by column
t2[`size][0 1] // unkeyed table, alternatively, index by column, then by row
t4[([]sym: `AAPL`GOOG); `size] // keyed table, index by key, then by column
// insert: three ways, list, VALUE of dictionary, table
`t1 insert (`TSLA`IBM; 88.1 89.5; 100 200) // insert a list of rows
`t1 insert value `sym`price`size!((`MSFT;`TSLA); (98.1; 88.1); (200; 100)) // insert the VALUE of a dictionary (not the dictionary itself!!)
`t1 insert ([]sym:`AAPL;price:100.0;size:100) // insert a table
// upsert: same as insert for unkeyed tables; for keyed tables, if the key already exists, it'll update the row, otherwise it errors
// also there are some nuances when it comes to usage
`t4 upsert ((`AAPL; 101.0; 101); (`GOOG; 88.0; 100)) // upsert a list of rows ROW BY ROW
`t4 upsert flip value `sym`price`size!(`MSFT`TSLA; 98.1 88.1; 200 100) // upsert the FLIPPED VALUE of a dictionary
`t4 upsert ([]sym:`AAPL;price:100.0;size:100) // upsert a table
Attributes
- Attributes are used on lists or columns of a table.
- Attributes are not enforced properties on the list; they are merely metadata that describe the list if
it exhibits certain properties. If the list does not maintain the properties, the attribute will be
lost.
For example, if you try to apply sorted attribute on an unsorted list, it won't work. And if a sorted
list is no longer sorted, the attribute will be lost.
- One list can only have one attribute at a time. Applying a new attribute will overwrite the existing
one.
- Attributes can improve performance of queries.
To check what attribute is applied to a list, use
attr.
q) attr asc 1 2 3 4
`s
q) attr 1 2 3
// returns nothing because by default there is no attribute, even it looks sorted to humans
To apply an attribute to a list, there are several ways:
- Some built in function automatically applies some attribute for you, like
asc above.
- Explicitly apply it using the
`x# syntax, where x is the attribute. e.g.,
`s#til 5
- Use
@ for lists, e.g., @[`s#; l]
To remove an attribute, simply do
`# on the list with an empty symbol.
For example,
`# asc 1 2 3 4
Summary
| Attribute |
Purpose |
Requirement |
Space |
Example good |
Example fail |
| sorted |
allow binary search |
list in ascending order |
none |
`s#til 5 |
`s#5 4 3 2 1 |
| unique |
store as dictionary |
list has no duplicates |
large |
`u#1 2 3 |
`u#1 2 3 2 |
| parted |
store as dictionary of arrays |
identical items in the list are stored together |
small |
`p#1 2 2 3 4 |
`p#2 1 2 3 4 |
| grouped |
store as dictionary of dynamic arrays |
none |
very large |
`g#2 1 2 3 4 |
n/a |
Sorted
This is in general for all attributes that have certain requirements: if you modify a list, you must modify in
place. You MUST NOT create a copy of the list. In creating the copy the attribute will be lost.
Using sorted as an example:
q) sortedList: `s#til 5
q) sortedList,6 // create a copy; this will FAIL even if the list looks sorted to humans
q) sortedList,:6 // modify in place
You can think of the syntax as
+= in python.
By default, operations in KDB like
upsert and
insert preserve this attribute.
Unique
This is essentially a dictionary.
It's very fast, but takes up a lot of space. Meanwhile, you can't have duplicate keys.
Therefore only use it on very small tables.
Parted
If we really want to have duplicate keys, what can we do?
This is like a dictionary of arrays with fixed sizes. Each key points to an array of locations that stores the
same value.
Under the hood something like this is happening:
q) `p#`A`A`B`C`C`C
// just an idea of what's going on, not really how KDB implements it
`A (starts at 0, length 2)
`B (starts at 2, length 1)
`C (starts at 3, length 3)
Because underlying data is stored contiguously, it's very fast.
However it comes with a drawback: you need to know the size in advance. ie, you need to make sure the same
values are stored together.
This is why this attribute is almost exclusively used in hdbs.
Grouped
This attribute tries to solve both the drawbacks of unique and parted.
It's like a dictionary of lists. You don't need to know the size in advance, and you can have duplicate keys,
but as a tradeoff, you need extra space to remember the index of each value.
q) group "mississippi"
// this is the real implementation
m| ,0
i| 1 4 7 10
s| 2 3 5 6
p| 8 9
Joins
Simple row joins
Use
,. This is similar to how
insert works.
This type of join won't perform type checks. As long as the number of columns matches, it will perform the join.
q) t1,t2
Simple column joins
Use
,'. This is similar to how
upsert works.
Again this type of join won't perform type checks. As long as the number of rows matches, it will perform the
join.
If the column already exists inth e original table, the table on the right will overwrite that column.
q) t1,'t2
Table joins
Multiple types of joins - we can seperate them into three types.
The first type is left join
lj, inner join
ij and their variants.
- left join: requires table on the right to be keyed.
q) lj[unkeyed or keyed; keyed]
lj will attempt to match the keys of the RHS table. a record must contain ALL same keys to be considered
a match.
once a match is found, it'll do three things
- if a column exists in RHS but not LHS, append this column
- if a column exists in LHS but not RHS, keep this column as is
- if a column exists in both LHS and RHS, update this column using RHS value
-
inner join: requires table on the right to be keyed.
q) ij[unkeyed or keyed; keyed]
-
plus join: variant of left join; updates the value with the sum.
q) pj[unkeyed or keyed; keyed]
if a match is found, it'll do these things
- if a column exists in RHS but not LHS, append this column
- if a column exists in LHS but not RHS, keep this column as is
- if a column exists in both LHS and RHS, update this column using the sum of LHS and RHS
value
-
equal join: variant of inner join.
It performs column matching on multiple columns, not just the keys, so neither table needs to be keyed.
q) ej[colsToMatch; t1; t2]
note that, if colsToMatch are the same as keys in the RHS table in ij, the two
queries will return the same result.
The second type is union join
uj.
Union join is a very general type of join that returns the union of two tables, meaning all records from both
tables.
When you perform union joins, neither table needs to be keyed.
The third type is time-series based joins, as-of join
aj and window join
wj.
Join Examples
q) t1:([] sym:`A`A`B`C; id:1 2 3 4; qty:10 20 30 40; time:09:59 10:00 10:01 10:02)
q) t2:([] sym:`A`B`B`D; id:1 3 5 6; px:100 200 210 300; time:09:58 10:00 10:03 10:01)
// trade table // quote table
sym id qty time sym id px time
----------------- -----------------
A 1 10 09:59 A 1 100 09:58
A 2 20 10:00 B 3 200 10:00
B 3 30 10:01 B 5 210 10:03
C 4 40 10:02 D 6 300 10:01
Left join, inner join, plus join, equal join:
// If we only key on sym
q) kt2:`sym`id xkey t2
// left join // inner join // plus join on id // equal join on sym
q) lj[t1; kt2] q) ij[t1; kt2] q) pj[t1; kt2] q) ej[`sym; t1; t2]
sym id qty time px sym id qty time px sym id qty time px sym id qty time px
----------------------- -------------------- -------------------- ---------------------
A 1 10 09:58 100 A 1 10 09:58 100 A 2 10 19:57 100 A 1 10 09:58 100
A 1 20 09:58 100 A 1 20 09:58 100 A 3 20 19:58 100 A 1 20 09:58 100
B 3 30 10:00 200 B 3 30 10:00 200 B 6 30 20:01 200 B 3 30 10:00 200
C 4 40 10:02 C 4 40 10:02 0
// If we key on both sym and id, ie. both sym and id need to match
q) kt2:`sym`id xkey t2
// left join // inner join // plus join on id // equal join on sym and id
q) lj[t1; kt2] q) ij[t1; kt2] q) pj[t1; kt2] q) ej[`sym`id; t1; t2]
sym id qty time px sym id qty time px sym id qty time px sym id qty time px
----------------------- -------------------- -------------------- ---------------------
A 1 10 09:58 100 A 1 10 9:58 100 A 1 10 19:57 100 A 1 10 09:58 100
A 2 20 10:00 B 3 30 10:00 200 A 2 20 10:00 0 B 3 30 10:00 200
B 3 30 10:00 200 B 3 30 20:01 200
C 4 40 10:02 C 4 40 10:02 0
Union join, as-of join, window join:
q) st2: `sym`time xasc t2
// union join // as-of join // window join
q) uj[t1; t2] q) aj[`sym`time; t1; st2] q) w:(t1.time-2; t1.time+2) // range from 09:59-2 to 10:02+2
sym id qty time px sym id qty time px 09:57 09:58 09:59 10:00
----------------------------- ---------------------- 10:01 10:02 10:03 10:04
A 1 10 09:59 A 1 10 09:59 100
A 2 20 10:00 A 1 20 10:00 100 q) wj[w; `sym`time; st1; (st2; (last;`px))]
B 3 30 10:01 B 3 30 10:01 200 sym id qty time px
C 4 40 10:02 C 4 40 10:02 -----------------------
A 1 09:58 100 A 1 10 09:59 100
B 3 10:00 200 A 2 20 10:00 100
B 5 10:03 210 B 3 30 10:01 210
D 6 10:01 300 C 4 30 10:02