Adding A New Aggregator Function To AFC
An aggregator is a (typically statistical) function that computes a single value from a range of values. SUM()
is the prime example. Here, we shall look at how I implemented VAR()
. So let’s assume VAR()
has not been implemented yet.
Note: You must have read the topics on adding primitive functions and high-level functions to understand this section.
Docs and Tests
As always, we start with the announcement in the release notes, and the tests:
A | B | C | D | E | F | G | H | I | J | K | L | |
1 | Expected =IF(Q1,"Expected","FAILED!") |
Actual | Inputs | # of Inputs | Name | Highlight | ||||||
149 | 0.5 | 0.5 =VAR(C149,D149) |
1 | 2 | 2 | VAR (does not support blanks!) | VAR | |||||
150 | 0.5 | 0.5 =VAR(C150,D150) |
4 | 5 | 2 | |||||||
151 | ||||||||||||
152 | 5.8 | 5.8 =VAR(C152:E152,G152,I152) |
1 | 2 | 3 | 100 | 5 | 200 | 7 | 7 | ||
153 | 2.5 | 2.5 =VAR(C153:E153,G153,I153) |
6 | 7 | 8 | 80 | 9 | 10 | 7 |
Parsing
Making VAR()
known to the parser is much like what we did for ABS()
. Aggregators, however, need to accept range arguments, not just simple values. For the standard aggregators which treat all their arguments as a single range, we can mimick the definition of SUM()
as follows:
| "VAR" aggN( Function.VAR )
Mathematical Definition
The function VAR()
is defined in the Excel help file as follows:
Clearly, what we have here is a sum over a function on all of the elements in the range. How can we rewrite this using Excel functions? We cannot. But AFC defines a couple of functions of its own that help.
Folding Functions
For a quick example, let’s look at how the rewrite rule for SUM()
is defined in AFC:
def sum = fold/reduce with s = 0 each xi as s = s + xi end
rewrite sum( xs* ) = apply sum to list xs
This means that the result, s
, is initially 0. Then, for every xi
in the range xs
, we simply add xi
to s
. In the argument specification, the *
indicates that xs
represents all remaining arguments as a list (much like int... a
in Java).
So the folding function iterates over a list of arguments. For every value in the list, it executes a given computation involving a so-called accumulator value (called r
here) and the current value from the list (called xi
here). The result becomes the new accumulator value. The final result is the accumulator value at the end of the list.
Special First Argument
The version of fold
I used here, fold/reduce
, is simply fold
with a hint that it is OK to directly use the first argument as the starting value, instead of the specified value 0. This is a performance optimization.
Rewriting It
Let’s recap the mathematical definition:
where the average is, of course, defined as:
We could formulate this as a rewrite rule directly using a fold
as in SUM
above, with the help of COUNT
and AVERAGE
. However, passing over the list of values only once is generally more efficient (especially once AFC supports cursor-style iteration over large repeating sections). A bit of algebra shows that the above is the same as:
So we need both the sum and the sum of squares of the list of values. We do this using two parallel accumulators:
def var =
fold with
s = 0,
ss = 0
each xi as
s = s + xi,
ss = ss + xi * xi
with count n into
(ss - s*s/n) / (n - 1)
end
rewrite var( xs* ) = apply var to list xs