Adding A New Primitive Spreadsheet Function To AFC
As a simple, but probably typical example, we will look at how I added ABS()
to the list of supported spreadsheet functions. So for the rest of this page, we will assume ABS()
has not been implemented yet.
Rule 1:
Document first
Documentation, well, documents, but it also focuses our thinking. This is why we start with it – and, of course, so we don’t forget it. For larger, novel features, it can be quite elaborate. In our case, we may safely assume that people know why we would want ABS()
to be supported, and that they will consult the Excel help file for information on it.
Release Notes
Nevertheless, we still need to document that ABS()
is, in fact, supported. First, we announce the improvement in the release notes in src/doc/releasenotes.rextile
. Just look at the file and you’ll see how (as per the house rule mimick existing code).
Rextile
AFC’s documentation is written in Textile, a very concise and text editor-friendly format for writing XHTML. To convert the Textile sources to XHTML, we use Rextile—a Ruby based tool. You will need to install it. Once it’s installed, you can simply run ant doc to regenerate the documentation in doc/
from the sources in src/doc/
. (It is quicker to run ant -q doc-main-update
if you just want see the updated release notes, and can live with some warnings and missing citations.)
Rule 2:
Cite tests
So far, we simply said: ABS()
is supported. How do we know it is? How does the reader know? This is where test-driven development gets into the picture – with a twist. We prove what we say directly to our reader. The proof becomes an integral part of the documentation (which is, after all, what we are saying). So, to integrate the proof, we always give examples which are cited from automated tests.
Automated Formula Tests
AFC’s existing testing and citing infrastructure makes this easy, especially for adding a new spreadsheet function. The spreadsheet files in
components/system/src/test-reference/java/org/formulacompiler/tests/reference
contain formula tests. So, to prove we implemented ABS()
, all we still need to do is add test cases to one of the test sheets.
For ABS()
, we extend the sheet called NumericFunctions.xls
. By citing it, I can show it right here (only the relevant subset is shown):
A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | |
1 | Expected =IF(Q1,"Expected","FAILED!") |
Actual | Inputs | # of Inputs | Name | Highlight | Excel says | Skip for | Custom check | true =AND(Q2:Q10000) |
|||||||
2 | 1 | 1 =ABS(C2) |
-1 | 1 | ABS | ABS | |||||||||||
3 | 0 | 0 =ABS(C3) |
0 | 1 | |||||||||||||
4 | 1 | 1 =ABS(C4) |
1 | 1 | |||||||||||||
5 | 2 | 2 =ABS(C5) |
-2 | 1 | |||||||||||||
6 | 2 | 2 =ABS(C6) |
2 | 1 | |||||||||||||
7 | 0 | 0 =ABS(C7) |
false | 1 | |||||||||||||
8 | 1 | 1 =ABS(C8) |
true | 1 | |||||||||||||
9 | 3 | 3 =ABS(C9) |
-3 | 1 | |||||||||||||
10 | 4 | 4 =ABS(C10) |
-4 | 1 |
And here is the documentation produced from it. Note also how the list of supported functions in the reference index automatically includes ABS
.
See here for details on exactly what the columns of this sheet mean.
Rule 3:
Mimick existing code
This rule is the monkey see, monkey do rule from the book on contributing to Eclipse. It means we look for an already implemented function closely matching ABS()
, and then mimick its implementation and code style (aka copy/paste). We will mimick ROUND()
for ABS()
. In fact, we already did. When documenting ABS()
in the list of supported functions, remember? This rule basically states something we would do anyway.
So, to add the test cases for ABS()
, I already copied lines from ROUND()
and adjusted them. You can see them in the sheet above. (Copying the lines made sure that the conditional formatting for input values was copied too.)
As explained in the topic on the reference test sheets, you also have to update the corresponding .ods and .yaml files. If you forget, running the tests will tell you.
Rule 4:
Implement only what the tests require
This is test-driven development. We implement ABS()
by just adding missing code until the tests run. The first step is, then, to run the tests just added (which will fail, of course). Every formula test sheet corresponds to a test class in
components/system/src/test-reference/java/org/formulacompiler/tests/reference
For the sheet NumericFunctions.xls
, we need to run the test class NumericFunctionsTest
. Doing this now (ant -q test-ref
) returns something like:
org.formulacompiler.spreadsheet.SpreadsheetException$UnsupportedExpression:
Unsupported function ABS encountered in expression ABS( <<? C2); error location indicated by <<?.
Cell containing expression is B2.
at org.formulacompiler.spreadsheet.internal.CellWithLazilyParsedExpression.getExpression(CellWithLazilyParsedExpression.java:71)
...
Running these tests can take quite a while. See here for how to speed this up.
Extending The Parser
Unsupported function encountered means AFC’s Excel formula parser cannot parse something. So we must make it support ABS()
as our first implementation step. The parser is generated from a special parser description language using JavaCC. Its input file is
components/compiler/src/build/javacc/ExpressionParser.jj
ROUND
looks like this there:
| "ROUND" fun2( Function.ROUND )
So we simply add the following line (we keep the list sorted alphabetically and ABS
comes first, so it starts with {
instead of |
):
{ "ABS" fun1( Function.ABS )
Look through the list there to see the different standard options for defining the argument syntax. In particular, fun1()
is a unary function, fun2()
a binary function, etc. For details on how JavaCC works, please refer to its grammar documentation.
We run ant -q build
to regenerate the parser. Now, the Java symbol org.formulacompiler.compiler.Function.ABS
is undefined. Extending this enumeration is straightforward. We do it, and the code compiles again.
Implementing The Function
Running the test again, we get:
org.formulacompiler.compiler.CompilerException$UnsupportedExpression:
Function ABS is not supported for double engines.
In expression >> ABS( C2 ) << ; error location indicated by >>..<<.
Cell containing expression is B2.
Referenced by cell B2.
at ...
This is from the heart of AFC, the byte code compiler. To implement ABS()
, let’s look at how ROUND()
is implemented. This is where I have to introduce you to one of the niftier parts of AFC: the JVM byte-code decompiler.
Template Methods
AFC is, like javac
, a byte-code compiler for the JVM. But, luckily, you don’t have to know the JVM byte code instruction set to add primitive functions like ABS()
to AFC. Instead, you write simple template methods in plain Java, like this:
public double fun_ROUND( double a, double b ) { return RuntimeDouble_v2.round( a, (int) b ); }
Any method you put into the class
components/compiler/src/impl/java/org/formulacompiler/compiler/internal/templates/ExpressionTemplatesForDoubles.java
that is called fun_XY
is automatically decompiled by AFC’s build process to generate a byte-code compiler for expression nodes of the kind new ExpressionNodeForFunction( Function.XY )
.
Expression Rewriting
There is another approach for functions that can be rewritten to expressions involving only other, more primitive expression functions. This is shown in the topic on adding high-level functions.
Runtime Support
As you can see, ROUND()
is not implemented directly, but mapped to a runtime support function. The runtime is subclassed for each supported numeric type. It is also versioned so AFC can easily provide backwards compatibility with older compiled engines. Here is the implementation from org.formulacompiler.runtime.internal.RuntimeDouble_v2
:
public static double round( final double _val, final int _maxFrac ) { final double shift = pow10( _maxFrac ); if (0 > _val) { return Math.ceil( _val * shift - 0.5 ) / shift; } else { return Math.floor( _val * shift + 0.5 ) / shift; } }
Numeric Types
This would be very easy to do for ABS()
too. But ABS()
is really so basic, we will implement it directly instead:
public double fun_ABS( double a ) { return Math.abs( a ); }
This covers double
. We also need to handle BigDecimal
and scaled long
. This is done in the template class ...templates.ExpressionTemplatesForScaledLongs
:
public long fun_ABS( long a ) { return (a < 0) ? -a : a; }
and in ...templates.AbstractExpressionTemplatesForBigDecimals
:
@ReturnsAdjustedValue public BigDecimal fun_ABS( BigDecimal a ) { return a.abs(); }
The latter is OK if you don’t need to take the MathContext
or scaling into account. If you do, as does the +
operator, then you have to use ...templates.ExpressionTemplatesForPrecisionBigDecimals
:
@ReturnsAdjustedValue public BigDecimal op_PLUS( BigDecimal a, BigDecimal b ) { return a.add( b, this.mathContext ); }
and ...templates.ExpressionTemplatesForScaledBigDecimals
:
@ReturnsAdjustedValue public BigDecimal op_PLUS( BigDecimal a, BigDecimal b ) { return a.add( b ); }
The annotation ReturnsAdjustedValue
means that the method result already has the proper scale. If you omit it, AFC will generate rescaling code after your method body.
Once all this is done, run ant -q build
to rebuild the compiler code from the template methods.
- Note
- With Eclipse, it is crucial to use
ant -q build
to update the compiler code rather than runningorg.formulacompiler.compiler.internal.build.bytecode.PatternCompiler
directly from within the IDE. This is because Eclipse’s compiler produces different byte-code sequences for the pattern methods thanjavac
as invoked by AFC’s build script. While Eclipse’s version is also correct, it breaks the binary comparisons of generated engines with reference versions in some of the unit tests for AFC (seeorg.formulacompiler.tests.utils.AbstractTestBase.checkEngine()
.
Strings
To implement functions working on strings, we need to differentiate two cases.
Arguments
String arguments are straightforward:
public double fun_VALUE( String _text ) { return RuntimeDouble_v2.fun_VALUE( _text, this.environment, this.computationMode ); }
Returns
String return values are different. Firstly, you need to implement them in org.formulacompiler.compiler.internal.templates.ExpressionTemplatesForStrings
(rather than the numeric templates). Note how you can access numeric arguments as int
. AFC will handle the conversion for you automatically:
public String fun_MID( String s, int start, int len ) { return Runtime_v2.fun_MID( s, start, len ); }
Secondly, you have to teach AFC’s type annotator that the function is String-valued. You do this in org.formulacompiler.compiler.internal.model.analysis.TypeAnnotator
:
private DataType typeOf( ExpressionNodeForFunction _expr ) throws CompilerException { annotateArgs( _expr ); switch (_expr.getFunction()) { // ... case CONCATENATE: case MID: // ... return DataType.STRING; case ERROR: return null; // must be provided by surrounding context, like a CASE statement default: return DataType.NUMERIC; } }
Conclusion
That’s it. All the tests run green. Running ant -q doc
updates the documentation and—lo and behold—ABS()
is now included and fully documented.
Now read on to see that other approach using expression rewriting: adding high-level functions.