KX class

Attributes

Attributes are used on lists or columns of a table.
Attributes are not enforced properties on the list; they are merely metadata that describe the list if it exhibits certain properties. If the list does not maintain the properties, the attribute will be lost.
For example, if you try to apply sorted attribute on an unsorted list, it won't work. And if a sorted list is no longer sorted, the attribute will be lost.
One list can only have one attribute at a time. Applying a new attribute will overwrite the existing one.
Attributes can improve performance of queries.

To check what attribute is applied to a list, use attr.


        q) attr asc 1 2 3 4
        `s
        q) attr 1 2 3
        // returns nothing because by default there is no attribute, even it looks sorted to humans

To apply an attribute to a list, there are several ways:

Some built in function automatically applies some attribute for you, like asc above.
Explicitly apply it using the `x# syntax, where x is the attribute. e.g., `s#til 5
Use @ for lists, e.g., @[`s#; l]

To remove an attribute, simply do `# on the list with an empty symbol. For example, `# asc 1 2 3 4

Summary

Attribute	Purpose	Requirement	Space	Example good	Example fail
sorted	allow binary search	list in ascending order	none	`s#til 5	`s#5 4 3 2 1
unique	store as dictionary	list has no duplicates	large	`u#1 2 3	`u#1 2 3 2
parted	store as dictionary of arrays	identical items in the list are stored together	small	`p#1 2 2 3 4	`p#2 1 2 3 4
grouped	store as dictionary of dynamic arrays	none	very large	`g#2 1 2 3 4	n/a

Sorted

This is in general for all attributes that have certain requirements: if you modify a list, you must modify in place. You MUST NOT create a copy of the list. In creating the copy the attribute will be lost.
Using sorted as an example:


            q) sortedList: `s#til 5
            q) sortedList,6     // create a copy; this will FAIL even if the list looks sorted to humans
            q) sortedList,:6    // modify in place

You can think of the syntax as += in python.
By default, operations in KDB like upsert and insert preserve this attribute.

Unique

This is essentially a dictionary.
It's very fast, but takes up a lot of space. Meanwhile, you can't have duplicate keys.
Therefore only use it on very small tables.

Parted

If we really want to have duplicate keys, what can we do?
This is like a dictionary of arrays with fixed sizes. Each key points to an array of locations that stores the same value.
Under the hood something like this is happening:


            q) `p#`A`A`B`C`C`C
            // just an idea of what's going on, not really how KDB implements it
            `A (starts at 0, length 2)
            `B (starts at 2, length 1)
            `C (starts at 3, length 3)

Because underlying data is stored contiguously, it's very fast.
However it comes with a drawback: you need to know the size in advance. ie, you need to make sure the same values are stored together.
This is why this attribute is almost exclusively used in hdbs.

Grouped

This attribute tries to solve both the drawbacks of unique and parted.
It's like a dictionary of lists. You don't need to know the size in advance, and you can have duplicate keys, but as a tradeoff, you need extra space to remember the index of each value.


            q) group "mississippi"
            // this is the real implementation
            m| ,0
            i| 1 4 7 10
            s| 2 3 5 6
            p| 8 9

Joins

Simple row joins

Use ,. This is similar to how insert works.
This type of join won't perform type checks. As long as the number of columns matches, it will perform the join.

q) t1,t2

Simple column joins

Use ,'. This is similar to how upsert works.
Again this type of join won't perform type checks. As long as the number of rows matches, it will perform the join.
If the column already exists inth e original table, the table on the right will overwrite that column.

q) t1,'t2

Table joins

Multiple types of joins - we can seperate them into three types.
The first type is left join lj, inner join ij and their variants.

left join: requires table on the right to be keyed.
```
q) lj[unkeyed or keyed; keyed]
```
lj will attempt to match the keys of the RHS table. a record must contain ALL same keys to be considered a match.
once a match is found, it'll do three things
- if a column exists in RHS but not LHS, append this column
- if a column exists in LHS but not RHS, keep this column as is
- if a column exists in both LHS and RHS, update this column using RHS value
inner join: requires table on the right to be keyed.
```
q) ij[unkeyed or keyed; keyed]
```
plus join: variant of left join; updates the value with the sum.
```
q) pj[unkeyed or keyed; keyed]
```
if a match is found, it'll do these things
- if a column exists in RHS but not LHS, append this column
- if a column exists in LHS but not RHS, keep this column as is
- if a column exists in both LHS and RHS, update this column using the sum of LHS and RHS value
equal join: variant of inner join.
It performs column matching on multiple columns, not just the keys, so neither table needs to be keyed.
```
q) ej[colsToMatch; t1; t2]
```
note that, if colsToMatch are the same as keys in the RHS table in ij, the two queries will return the same result.

The second type is union join uj.
Union join is a very general type of join that returns the union of two tables, meaning all records from both tables.
When you perform union joins, neither table needs to be keyed.

The third type is time-series based joins, as-of join aj and window join wj.

as-of joins: use aj when you want to join two tables based on some column and time, and you want to join based on the last time record.
syntax:
```
aj[c;t1;t2]
```
where
- t1 is a table
- t2 is a table, usually a simple table, ideally sorted
- c is a symbol list of column names to join on, and the last column is a sortable type, typically time
example:
```
aj[`sym`time; trades; quotes]
```
This joins the trades table with the quotes table on the sym column, and then for the time column, here is what it does: for each time value in trades, it finds the last time value in quotes that is less than or equal to the time value in trades.
For example, if the trades table has a time value of 10:00:00, and the quotes table has a time value of 9:59:00, 10:00:00, and 10:01:00, then the aj function will return the quotes table with the time value of 9:59:00.
If you get a `rank error when using aj, likely in t2 you have too few columns. It can't only contain the columns in c that you want to perform the join on.
window join: generic version of as-of join.
```
wj[w; c; t1; (t2; (f0;c0); (f1;c1))]
```
where
- w is a two-item list specifying how much time before and after to look at
  The time window can be generated two ways
```
                            // w1: 2 min before and 2 min after
                            w1: (st1.time-2; st1.time+2)
                            // w2: 10 min before and 20 min after
                            w2: -10 20+\: st1[`time]
                        
```
  Depending on what data type is your time column, you may need to add or subtract different things.
  In the example above the data type is minute, so only integers are allowed. If it's timestamp, then you'll need to add or subtract timespans like (st1.time-00:02:00 st1.time+00:02:00)
- c is a symbol list of column names to join on, and the last column is a sortable type, typically c is `sym`time
- t1 is a table, ideally sorted on columns in c
- t2 is a table, MUST BE sorted on columns in c
- (f0;c0) and (f1;c1) are aggregations applied on columns; examples are
  - (avg; `price)
  - (min; `bid)
  - (max; `ask)
  - (sum; `size)
  - (first; `px)
  - (last; `px)
  - (::; `px) this is called identify function, it returns the original value

Join Examples


            q) t1:([] sym:`A`A`B`C; id:1 2 3 4; qty:10 20 30 40; time:09:59 10:00 10:01 10:02)
            q) t2:([] sym:`A`B`B`D; id:1 3 5 6; px:100 200 210 300; time:09:58 10:00 10:03 10:01)
            
            // trade table                      // quote table
            sym id qty time                     sym id px time
            -----------------                   -----------------
            A   1  10  09:59                    A   1  100 09:58
            A   2  20  10:00                    B   3  200 10:00
            B   3  30  10:01                    B   5  210 10:03
            C   4  40  10:02                    D   6  300 10:01

Left join, inner join, plus join, equal join:


            // If we only key on sym
            q) kt2:`sym`id xkey t2 
            // left join                  // inner join                 // plus join on id            // equal join on sym
            q) lj[t1; kt2]                q) ij[t1; kt2]                q) pj[t1; kt2]                q) ej[`sym; t1; t2]
            sym id qty time  px           sym id qty time px            sym id qty time  px           sym id qty time  px  
            -----------------------       --------------------          --------------------          ---------------------
            A   1  10  09:58 100          A   1  10  09:58 100          A   2  10  19:57 100          A   1  10  09:58 100 
            A   1  20  09:58 100          A   1  20  09:58 100          A   3  20  19:58 100          A   1  20  09:58 100
            B   3  30  10:00 200          B   3  30  10:00 200          B   6  30  20:01 200          B   3  30  10:00 200
            C   4  40  10:02                                            C   4  40  10:02 0

            // If we key on both sym and id, ie. both sym and id need to match
            q) kt2:`sym`id xkey t2 
            // left join                  // inner join                 // plus join on id            // equal join on sym and id
            q) lj[t1; kt2]                q) ij[t1; kt2]                q) pj[t1; kt2]                q) ej[`sym`id; t1; t2]
            sym id qty time  px           sym id qty time px            sym id qty time  px           sym id qty time  px  
            -----------------------       --------------------          --------------------          ---------------------
            A   1  10  09:58 100          A   1  10  9:58  100          A   1  10  19:57 100          A   1  10  09:58 100 
            A   2  20  10:00              B   3  30  10:00 200          A   2  20  10:00 0            B   3  30  10:00 200
            B   3  30  10:00 200                                        B   3  30  20:01 200
            C   4  40  10:02                                            C   4  40  10:02 0

Union join, as-of join, window join:

 
            q) st2: `sym`time xasc t2
            // union join                       // as-of join                   // window join 
            q) uj[t1; t2]                       q) aj[`sym`time; t1; st2]       q) w:(t1.time-2; t1.time+2) // range from 09:59-2 to 10:02+2
            sym id qty time  px                 sym id qty time  px             09:57 09:58 09:59 10:00
            -----------------------------       ----------------------          10:01 10:02 10:03 10:04
            A   1  10  09:59                    A   1  10  09:59 100            
            A   2  20  10:00                    A   1  20  10:00 100            q) wj[w; `sym`time; st1; (st2; (last;`px))]
            B   3  30  10:01                    B   3  30  10:01 200            sym id qty  time  px
            C   4  40  10:02                    C   4  40  10:02                -----------------------
            A   1      09:58 100                                                A   1  10  09:59  100
            B   3      10:00 200                                                A   2  20  10:00  100
            B   5      10:03 210                                                B   3  30  10:01  210
            D   6      10:01 300                                                C   4  30  10:02

Tables