Q SQL

Q-SQL

How to read table meta

what each column does:

            c        | t f a
            -----------------
            date     | d
            month    | m
            vendor   | s
            time     | p   s 
        

fby

Use fby when you find yourself trying to apply a filter to the aggregated column of a table produced by select...by...
The basic syntax is:
(aggregation;data) fby group
where aggregation is the aggregation function you want to apply, data is the column you want to filter, and group is the column you want to group by.

Example

Say you have this intermediate table, and you want to select from the original table jan09 based on type of vendor, but only records with duration greater than avgDuration, which is simply the average of duration.

        q) res: select avgDuration: avg duration by vendor from jan09
        q) res
        vendor | avgDuration
        ----------------------
        CMT    |  6.480017e+11
        DDS    |  7.600017e+11
        VTS    |  7.290017e+11
        
Without fby, you need to do it like this:
First generate this intermediate table, then join jan09 with this intermediate table.
select from jan09 lj res where duration > avgDuration
But with fby, you can do it like this:
select from jan09 where duration > (avg,duration) fby vendor

More on how to use fby

Keep in mind that fby is used as a filter (a where clause) on the aggregated column of a table, so the three signs are 1. a filter, 2. an aggregated value, and 3. a value to group on. Here are more examples:
  1. Which vendor has the largest number of trips when only considering trips shorter than the average duration of each vendor?
    Where to use fby: "when only considering trips shorter than the average duration of each vendor?"
    Constructed fby: (avg; duration) fby vendor
  2. Which payment type produces the highest average tip when only trups with a fare larger than the average for each vendor is considered?
    Where to use fby: "when only trups with a fare larger than the average for each vendor is considered?"
    Constructed fby: (avg; fare) fby vendor

xbar

This, together with by, is used to group time/numeric values in to buckets.
For example, for each severn day, find the min temperature and the max temperature.
select min mintemp, max maxtemp by 7 xbar date from weather

Functional form

Functional q-sql

For functional queries, the general form is
queryFunc[table; whereClause; byClause; returnClause]
In the following sections, we will learn how to use parse to write functional forms.

queryFunc

queryFunc is always ? for select, exec, and ! for update, delete.
Let's start with a simplest query and see what the parse output looks like.

            q) parse"select from trade"
            ?                           // queryFunc
            `trade                      // table
            ()                          // whereClause
            0b                          // byClause
            ()                          // returnClause
        
When you parse, the five elements of a functional query will be listed one by one. However, sometimes you need to modify the output a little. We will show some common modifications in sections below.

whereClause

whereClause is a list of functional expressions. If it's a single statement, you need to enlist.
Example of writing the where clause from parse:

            q) parse"select from trade where sym=`JPM, size>20"
            ?
            `trade
            ,((=;`sym;,`JPM);(>;`size;20))      // whereClause to be translated
            0b
            ()
        
This where output cannot be used directly. You need to translate it, two principles (applicable to all following sections on other components of the functional query): Thus the whereClause for this statement is actually ((=;`sym; enlist `JPM);(>;`size;20)).

For exec only, you need to use whereClause together with returnClause, otherwise it will only return the last record that satisfy the where clause.
If the above example is changed to exec, you need a returnClause that is a dictionary x!x:cols[trade].

byClause

byClause is a dictionary. Usually you don't need to translate the output from parse.
If you select by sym, exchange..., the byClause in it will just be `sym`exchange!`sym`exchange.
But if there is xbar in the by, then you need to translate it.

            q)parse"select by twoHour:0D02:00:00 xbar time from trade"
            ?
            `trade
            ()
            (,`twoHour)!,(k){$[9h=t:abs[@x];x*r+y=x*1+r:(y:9h$y)div x;x*y div x:$[16h=t;"j"$x;x]]};0D02:00:00.000000000;`time)
            ()
        
The way to translate is to put it into a dictionary like this enlist[`twoHour]!enlist (xbar; 0D02:00:00; `time)

In addtion, with functional by, you can use group by on exec. It will just behave as if it were select. You can pass a list instead of dictionary as byClause if it's exec too.

returnClause

returnClause can be a list or a dictionary, and depending on what queryFunc you're using, it means different things.

optional arguments

For select statement, you can pass two addtional arguments.
?[table; whereClause; byClause; returnClause; windowSize; indexSort]
windowSize specifies the rows to be returned.
For example, 2 means the first two rows, -2 means the last two rows, a list 2 2 means start at row index=2 and return 2 rows (so the third and fourth row).

indexSort can only be used when windowSize is specified.
If you want to return all rows, you can set windowSize:0W but it has to be there.
This argument specifies how you'd sort the output, e.g., (idesc; `price) means sort in descending order then return indices.

Performance hacks

where

When writing a where clause, it is always more efficient to put a date filter first, and then filter on the other columns. This is due to the fact that KDB partitions data by date.
In general, the first where clause should be the most restrictive, returning the smallest set of records.