AFC - Abacus Formula Compiler for Java

Adding A New Aggregator Function To AFC

An aggregator is a (typically statistical) function that computes a single value from a range of values. SUM() is the prime example. Here, we shall look at how I implemented VAR(). So let’s assume VAR() has not been implemented yet.

Note: You must have read the topics on adding primitive functions and high-level functions to understand this section.

Docs and Tests

As always, we start with the announcement in the release notes, and the tests:

A B C D E F G H I J K L
1 Expected
=IF(Q1,"Expected","FAILED!")
Actual Inputs # of Inputs Name Highlight
149 0.5 0.5
=VAR(C149,D149)
1 2 2 VAR (does not support blanks!) VAR
150 0.5 0.5
=VAR(C150,D150)
4 5 2
151
152 5.8 5.8
=VAR(C152:E152,G152,I152)
1 2 3 100 5 200 7 7
153 2.5 2.5
=VAR(C153:E153,G153,I153)
6 7 8 80 9 10 7

Parsing

Making VAR() known to the parser is much like what we did for ABS(). Aggregators, however, need to accept range arguments, not just simple values. For the standard aggregators which treat all their arguments as a single range, we can mimick the definition of SUM() as follows:

|	"VAR" aggN( Function.VAR )

Mathematical Definition

The function VAR() is defined in the Excel help file as follows:

( x-x ¯) 2 n-1

Clearly, what we have here is a sum over a function on all of the elements in the range. How can we rewrite this using Excel functions? We cannot. But AFC defines a couple of functions of its own that help.

Folding Functions

For a quick example, let’s look at how the rewrite rule for SUM() is defined in AFC:

def sum = fold/reduce with s = 0 each xi as s = s + xi end
rewrite sum( xs* ) = apply sum to list xs

This means that the result, s, is initially 0. Then, for every xi in the range xs, we simply add xi to s. In the argument specification, the * indicates that xs represents all remaining arguments as a list (much like int... a in Java).

So the folding function iterates over a list of arguments. For every value in the list, it executes a given computation involving a so-called accumulator value (called r here) and the current value from the list (called xi here). The result becomes the new accumulator value. The final result is the accumulator value at the end of the list.

Special First Argument

The version of fold I used here, fold/reduce, is simply fold with a hint that it is OK to directly use the first argument as the starting value, instead of the specified value 0. This is a performance optimization.

Rewriting It

Let’s recap the mathematical definition:

(x i-x ¯) 2n -1

where the average is, of course, defined as:

x¯ = x in

We could formulate this as a rewrite rule directly using a fold as in SUM above, with the help of COUNT and AVERAGE. However, passing over the list of values only once is generally more efficient (especially once AFC supports cursor-style iteration over large repeating sections). A bit of algebra shows that the above is the same as:

x 2 - ( x ) 2 / n n - 1

So we need both the sum and the sum of squares of the list of values. We do this using two parallel accumulators:

def var =
	fold with
		s = 0,
		ss = 0
	each xi as
		s = s + xi,
		ss = ss + xi * xi
	with count n into
		(ss - s*s/n) / (n - 1)
	end
rewrite var( xs* ) = apply var to list xs