Caching Of Values and Reusing Computations
Many computations can be sped up by turning on internal caching of cell values. And some functions (NOW
and MATCH
, for example), or repeating sections, always use internal caches. Caches, however, introduce state. You need to reset this state when reusing computations on modified inputs.
Why Cache?
Consider the following very simple example spreadsheet doing the computation of a square’s area and a cube’s volume, given the side length. The cell B1 is the sole input, B3 and B4 are outputs.
A | B | |
1 | Side | 2 |
2 | ||
3 | Area | 4 =B1*B1 |
4 | Volume | 8 =B3*B1 |
Turning on internal caching would give us the following benefits.
- Intermediate Values
- The area is reused by the volume computation. Even in this simple example, if you compute millions of areas and volumes using
BigDecimal
and side lengths of great precision, the cache pays off. - Input Values
- The input side length is used three times in all. Assuming it is expensive to obtain (maybe it involves a database access), the cache will also pay off. You could, of course, cache this value yourself. But since AFC supports automatic internal caching anyway, you can save yourself the trouble.
- Output Values
- Finally, AFC also caches output values. So if you access them multiple times, you benefit without having to cache them yourself.
Note
Caching of input and output values also encourages a functional view of computations, i.e. input value accessors that have no side effects. Since you cannot know beforehand how many times a spreadsheet will access your input values, having side effects in them is very likely a bad idea.
No Caching Is Standard
Normally, computations generated by AFC do not cache values internally, except for some special functions. The reason is that the overhead introduced by caching is not warranted for simple, straightforward computations. Here is an example of this, where we count the number of accesses to the input, the side length:
Input input = new Input(); Output output = (Output) factory.newComputation( input ); assertEquals( 0, input.getNumberOfAccessesToSide() ); output.getArea(); assertEquals( 2, input.getNumberOfAccessesToSide() ); output.getVolume(); assertEquals( 5, input.getNumberOfAccessesToSide() );
Note
This stateless behaviour is not guaranteed. AFC always caches repeating sections, and certain functions. NOW
, for example, is required to return the same value for the entire duration of a computation, so AFC caches the value returned on the first invocation. See further below for details.
Turning On Caching
Let’s now turn on explicit, full caching:
EngineBuilder builder = SpreadsheetCompiler.newEngineBuilder(); builder.loadSpreadsheet( getFile() ); builder.setInputClass( Input.class ); builder.setOutputClass( Output.class ); builder.setNumericType( unboundedBigDecimal ); builder.setFullCaching( true ); builder.bindAllByName(); SaveableEngine engine = builder.compile();
Here’s what happens if we run the above example again, but with caching enabled:
Input input = new Input(); Output output = (Output) factory.newComputation( input ); assertEquals( 0, input.getNumberOfAccessesToSide() ); output.getArea(); assertEquals( 1, input.getNumberOfAccessesToSide() ); output.getVolume(); assertEquals( 1, input.getNumberOfAccessesToSide() );
The input is only accessed once.
Was it worth it?
In this example, does caching pay off? Let’s compare the speed of computing the area and volume with and without caching for a largish value of side:
long startTime = System.nanoTime(); output.getArea(); output.getVolume(); output.getArea(); output.getVolume(); output.getArea(); output.getVolume(); long timeTaken = System.nanoTime() - startTime;
We find that caching is faster by more than than half:
input.setSide( "123456789123456789123456789123456789123456789123456789123456789123456789" ); long plainTime = 0; long cachingTime = 0; for (int i = 0; i < 100; i++) { plainTime += time( plainFactory, input ); cachingTime += time( cachingFactory, input ); } assertTrue( "Caching is at least half as fast again; caching is " + cachingTime + " vs. " + plainTime, cachingTime * 3 / 2 < plainTime );
Reusing Computations
As I said before, computations generated by AFC do not normally cache values internally, unless you explicitly turn on caching. This might lead you to believe that a computation behaves correctly when you call it multiple times while, in between, modifying the data returned by its input object. Here is an example of this:
Input input = new Input(); Output output = (Output) factory.newComputation( input ); input.setSide( "10" ); assertEquals( "100", output.getArea().toPlainString() ); assertEquals( "1000", output.getVolume().toPlainString() ); input.setSide( "5" ); assertEquals( "25", output.getArea().toPlainString() ); assertEquals( "125", output.getVolume().toPlainString() );
This is not guaranteed, however! AFC makes no promise to not cache internally. And what works now might not work in a future release of AFC. (In fact, for functions like NOW
it always has to cache, because NOW
is required to return the same value for the duration of a single computation.) And what about reusing computations where we do turn on caching? Here’s what happens if we run the above example again, with internal caching:
Input input = new Input(); Output output = (Output) factory.newComputation( input ); input.setSide( "10" ); assertEquals( "100", output.getArea().toPlainString() ); assertEquals( "1000", output.getVolume().toPlainString() ); input.setSide( "5" ); assertEquals( "100", output.getArea().toPlainString() ); assertEquals( "1000", output.getVolume().toPlainString() );
Note how changing the input value does not affect the output values. The are still cached from the previous use.
reset()
We have to be able to reset a computation when we have modified its input. AFC supports this if our output interface or class implements the Resettable
interface:
/** * Interface that must be implemented by an output class (or extended by an output interface) of * computations that need to reset internal caches of values - typically for reuse on modified input * values. * * @author peo */ public interface Resettable { /** * Clears all internal caches of the computation so it can be reused with changed input values. * You do not need to implement this method yourself. As long as you declare it, AFC will * implement it for you. If you do implement it, AFC will call it prior to resetting the * computation. */ void reset(); }
It does:
public static interface Output extends Resettable { BigDecimal getArea(); BigDecimal getVolume(); }
so let’s do this properly:
input.setSide( "10" ); assertEquals( "100", output.getArea().toPlainString() ); assertEquals( "1000", output.getVolume().toPlainString() ); input.setSide( "5" ); output.reset(); assertEquals( "25", output.getArea().toPlainString() ); assertEquals( "125", output.getVolume().toPlainString() );
Caching Internals
Non-caching
The generated non-caching computation is:
package org.formulacompiler.gen; import java.math.BigDecimal; import org.formulacompiler.runtime.Computation; import org.formulacompiler.runtime.internal.Environment; import org.formulacompiler.runtime.internal.RuntimeBigDecimal_v2; import org.formulacompiler.tutorials.Caching; final class $Root implements Computation, Caching.Output { private final Caching.Input $inputs; final Environment $environment; $Root(Caching.Input input, Environment environment) { $environment = environment; $inputs = input; } public final void reset() { /* empty */ } final BigDecimal get$0() { return get$1().multiply(get$1()).setScale(0, 4); } public final BigDecimal getArea() { return get$0(); } final BigDecimal get$1() { return RuntimeBigDecimal_v2.toNum($inputs.getSide()).setScale(0, 4); } final BigDecimal get$2() { return get$0().multiply(get$1()).setScale(0, 4); } public final BigDecimal getVolume() { return get$2(); } }
Caching
With caching, it is:
package org.formulacompiler.gen; import java.math.BigDecimal; import org.formulacompiler.runtime.Computation; import org.formulacompiler.runtime.internal.Environment; import org.formulacompiler.runtime.internal.RuntimeBigDecimal_v2; import org.formulacompiler.tutorials.Caching; final class $Root implements Computation, Caching.Output { private final Caching.Input $inputs; final Environment $environment; private boolean h$get$0; private BigDecimal c$get$0; private boolean h$get$1; private BigDecimal c$get$1; private boolean h$get$2; private BigDecimal c$get$2; $Root(Caching.Input input, Environment environment) { $environment = environment; $inputs = input; } public final void reset() { h$get$0 = false; h$get$1 = false; h$get$2 = false; } final BigDecimal get$0() { if (!h$get$0) { c$get$0 = get$1().multiply(get$1()).setScale(0, 4); h$get$0 = true; } return c$get$0; } public final BigDecimal getArea() { return get$0(); } final BigDecimal get$1() { if (!h$get$1) { c$get$1 = RuntimeBigDecimal_v2.toNum($inputs.getSide()).setScale(0, 4); h$get$1 = true; } return c$get$1; } final BigDecimal get$2() { if (!h$get$2) { c$get$2 = get$0().multiply(get$1()).setScale(0, 4); h$get$2 = true; } return c$get$2; } public final BigDecimal getVolume() { return get$2(); } }
Clearly, this is a lot more code than for the non-caching version. It is important to realize, however, that AFC only caches intermediate cell values which are referenced more than once in the computation. All others are inlined into the single expression that references them. Cells that are not referenced at all don’t appear in the computation at all, either.