Tables

Tables

There are two types of tables, keyed and unkeyed.

            // creating tables
            t1:([]sym:`$();price:float$();size:long$())                                 // empty typed table
            t2:([]sym:enlist `AAPL;price:enlist 100.0;size:enlist 100)                  // single items must be enlisted
            t3:([]sym:`AAPL`GOOG;price:100.0 89.0;size:100 120)                         // unkeyed table
            t4:([sym:`AAPL`GOOG]price:100.0 89.0;size:100 120)                          // keyed table

            // key and unkey
            `sym xkey t3                                                                // key method 1: use xkey; here we key on sym
            t2!t3                                                                       // key method 2: use !; all columns in tbl1 will be used as key in the joined table
            () xkey t4                                                                  // unkey method 1: use xkey with empty list
            0!t4                                                                        // unkey method 2: use ! with 0

            // indexing tables
            t2[`size]                                                                   // index a column
            t2[0]                                                                       // index a row
            t2[0 1;`size]                                                               // unkeyed table, index by row, then by column
            t2[`size][0 1]                                                              // unkeyed table, alternatively, index by column, then by row
            t4[([]sym: `AAPL`GOOG); `size]                                              // keyed table, index by key, then by column

            // insert: three ways, list, VALUE of dictionary, table
            `t1 insert (`TSLA`IBM; 88.1 89.5; 100 200)                                  // insert a list of rows
            `t1 insert value `sym`price`size!((`MSFT;`TSLA); (98.1; 88.1); (200; 100))  // insert the VALUE of a dictionary (not the dictionary itself!!)
            `t1 insert ([]sym:`AAPL;price:100.0;size:100)                               // insert a table

            // upsert: same as insert for unkeyed tables; for keyed tables, if the key already exists, it'll update the row, otherwise it errors
            // also there are some nuances when it comes to usage
            `t4 upsert ((`AAPL; 101.0; 101); (`GOOG; 88.0; 100))                        // upsert a list of rows ROW BY ROW
            `t4 upsert flip value `sym`price`size!(`MSFT`TSLA; 98.1 88.1; 200 100)      // upsert the FLIPPED VALUE of a dictionary
            `t4 upsert ([]sym:`AAPL;price:100.0;size:100)                               // upsert a table
        

Attributes

To check what attribute is applied to a list, use attr.

        q) attr asc 1 2 3 4
        `s
        q) attr 1 2 3
        // returns nothing because by default there is no attribute, even it looks sorted to humans
        
To apply an attribute to a list, there are several ways: To remove an attribute, simply do `# on the list with an empty symbol. For example, `# asc 1 2 3 4

Summary

Attribute Purpose Requirement Space Example good Example fail
sorted allow binary search list in ascending order none `s#til 5 `s#5 4 3 2 1
unique store as dictionary list has no duplicates large `u#1 2 3 `u#1 2 3 2
parted store as dictionary of arrays identical items in the list are stored together small `p#1 2 2 3 4 `p#2 1 2 3 4
grouped store as dictionary of dynamic arrays none very large `g#2 1 2 3 4 n/a

Sorted

This is in general for all attributes that have certain requirements: if you modify a list, you must modify in place. You MUST NOT create a copy of the list. In creating the copy the attribute will be lost.
Using sorted as an example:

            q) sortedList: `s#til 5
            q) sortedList,6     // create a copy; this will FAIL even if the list looks sorted to humans
            q) sortedList,:6    // modify in place
        
You can think of the syntax as += in python.
By default, operations in KDB like upsert and insert preserve this attribute.

Unique

This is essentially a dictionary.
It's very fast, but takes up a lot of space. Meanwhile, you can't have duplicate keys.
Therefore only use it on very small tables.

Parted

If we really want to have duplicate keys, what can we do?
This is like a dictionary of arrays with fixed sizes. Each key points to an array of locations that stores the same value.
Under the hood something like this is happening:

            q) `p#`A`A`B`C`C`C
            // just an idea of what's going on, not really how KDB implements it
            `A (starts at 0, length 2)
            `B (starts at 2, length 1)
            `C (starts at 3, length 3)
        
Because underlying data is stored contiguously, it's very fast.
However it comes with a drawback: you need to know the size in advance. ie, you need to make sure the same values are stored together.
This is why this attribute is almost exclusively used in hdbs.

Grouped

This attribute tries to solve both the drawbacks of unique and parted.
It's like a dictionary of lists. You don't need to know the size in advance, and you can have duplicate keys, but as a tradeoff, you need extra space to remember the index of each value.

            q) group "mississippi"
            // this is the real implementation
            m| ,0
            i| 1 4 7 10
            s| 2 3 5 6
            p| 8 9
        

Joins

Simple row joins

Use ,. This is similar to how insert works.
This type of join won't perform type checks. As long as the number of columns matches, it will perform the join.
q) t1,t2

Simple column joins

Use ,'. This is similar to how upsert works.
Again this type of join won't perform type checks. As long as the number of rows matches, it will perform the join.
If the column already exists inth e original table, the table on the right will overwrite that column.
q) t1,'t2

Table joins

Multiple types of joins - we can seperate them into three types.
The first type is left join lj, inner join ij and their variants. The second type is union join uj.
Union join is a very general type of join that returns the union of two tables, meaning all records from both tables.
When you perform union joins, neither table needs to be keyed.

The third type is time-series based joins, as-of join aj and window join wj.

Join Examples


            q) t1:([] sym:`A`A`B`C; id:1 2 3 4; qty:10 20 30 40; time:09:59 10:00 10:01 10:02)
            q) t2:([] sym:`A`B`B`D; id:1 3 5 6; px:100 200 210 300; time:09:58 10:00 10:03 10:01)
            
            // trade table                      // quote table
            sym id qty time                     sym id px time
            -----------------                   -----------------
            A   1  10  09:59                    A   1  100 09:58
            A   2  20  10:00                    B   3  200 10:00
            B   3  30  10:01                    B   5  210 10:03
            C   4  40  10:02                    D   6  300 10:01
        
Left join, inner join, plus join, equal join:

            // If we only key on sym
            q) kt2:`sym`id xkey t2 
            // left join                  // inner join                 // plus join on id            // equal join on sym
            q) lj[t1; kt2]                q) ij[t1; kt2]                q) pj[t1; kt2]                q) ej[`sym; t1; t2]
            sym id qty time  px           sym id qty time px            sym id qty time  px           sym id qty time  px  
            -----------------------       --------------------          --------------------          ---------------------
            A   1  10  09:58 100          A   1  10  09:58 100          A   2  10  19:57 100          A   1  10  09:58 100 
            A   1  20  09:58 100          A   1  20  09:58 100          A   3  20  19:58 100          A   1  20  09:58 100
            B   3  30  10:00 200          B   3  30  10:00 200          B   6  30  20:01 200          B   3  30  10:00 200
            C   4  40  10:02                                            C   4  40  10:02 0

            // If we key on both sym and id, ie. both sym and id need to match
            q) kt2:`sym`id xkey t2 
            // left join                  // inner join                 // plus join on id            // equal join on sym and id
            q) lj[t1; kt2]                q) ij[t1; kt2]                q) pj[t1; kt2]                q) ej[`sym`id; t1; t2]
            sym id qty time  px           sym id qty time px            sym id qty time  px           sym id qty time  px  
            -----------------------       --------------------          --------------------          ---------------------
            A   1  10  09:58 100          A   1  10  9:58  100          A   1  10  19:57 100          A   1  10  09:58 100 
            A   2  20  10:00              B   3  30  10:00 200          A   2  20  10:00 0            B   3  30  10:00 200
            B   3  30  10:00 200                                        B   3  30  20:01 200
            C   4  40  10:02                                            C   4  40  10:02 0
        
Union join, as-of join, window join:
 
            q) st2: `sym`time xasc t2
            // union join                       // as-of join                   // window join 
            q) uj[t1; t2]                       q) aj[`sym`time; t1; st2]       q) w:(t1.time-2; t1.time+2) // range from 09:59-2 to 10:02+2
            sym id qty time  px                 sym id qty time  px             09:57 09:58 09:59 10:00
            -----------------------------       ----------------------          10:01 10:02 10:03 10:04
            A   1  10  09:59                    A   1  10  09:59 100            
            A   2  20  10:00                    A   1  20  10:00 100            q) wj[w; `sym`time; st1; (st2; (last;`px))]
            B   3  30  10:01                    B   3  30  10:01 200            sym id qty  time  px
            C   4  40  10:02                    C   4  40  10:02                -----------------------
            A   1      09:58 100                                                A   1  10  09:59  100
            B   3      10:00 200                                                A   2  20  10:00  100
            B   5      10:03 210                                                B   3  30  10:01  210
            D   6      10:01 300                                                C   4  30  10:02