Repeating Sections and Caching
Right now, SEJ mandates caching for engines with repeating sections. This is because, internally, SEJ constructs an array of section element engines for every element in a repeating section when the section is first accessed. This is a cache, and so SEJ mandates Resettable on your output interface.
Cursors
This behaviour makes the current version of SEJ unusable for computations involving large datasets. In the interface, you can use an iterator that you might, for example, map to a database cursor. But SEJ will simply traverse it fully and construct and cache element engines for every row before computing anything.
A better approach would be to allow people to flag a section as potentially huge when binding it (to an Iterator or Iterable, of course). SEJ would then know not to cache the section, and to use a flyweight element engine (like a current cursor position on the interface’s iterator).
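The flyweight idea can be sketched roughly as follows. This is only an illustration, not SEJ's actual internals: ElementCursor and StreamingSection are hypothetical names, and the per-row formula is passed in as a plain function. The point is that a single reusable object is repositioned on each row, so nothing per-row is cached.

```java
import java.util.function.ToDoubleFunction;

/** Hypothetical flyweight: one reusable object, repositioned on each row. */
final class ElementCursor<R> {
    private R current; // the row the cursor currently points at

    void moveTo(R row) {
        this.current = row;
    }

    double evaluate(ToDoubleFunction<? super R> formula) {
        return formula.applyAsDouble(current);
    }
}

/** Hypothetical streaming section: traverses the rows once, caching nothing. */
final class StreamingSection {
    static <R> double sum(Iterable<R> rows, ToDoubleFunction<? super R> formula) {
        ElementCursor<R> cursor = new ElementCursor<>(); // single instance for the whole traversal
        double total = 0.0;
        for (R row : rows) {
            cursor.moveTo(row); // reposition instead of allocating an engine per row
            total += cursor.evaluate(formula);
        }
        return total;
    }
}
```

Memory use then stays constant regardless of how many rows the iterator delivers, at the price of having to iterate again whenever the section is needed again.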
Aggregators
Some of the current aggregator implementations would not play too nicely with this scheme.
COUNT() - would have to be rewritten. It is currently very quick because it already knows the size of the internal cache array. With iterators it would have to, well, iterate.
AVG() - is currently defined as SUM() / COUNT(). With iterators, this would mean two iterations. So we would have to change this to an explicit iteration that sums and counts in one go.
VAR() - should also use the above-mentioned iteration that returns both sum and count. It would still have to iterate twice, but at least only twice.
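A minimal sketch of what such iterator-friendly aggregators might look like. SumAndCount and StreamingAggregators are hypothetical names, not part of SEJ, and the variance here assumes the usual spreadsheet convention for VAR() (sample variance, dividing by n - 1).

```java
/** Hypothetical result holder: sum and count gathered in one traversal. */
final class SumAndCount {
    final double sum;
    final long count;

    SumAndCount(double sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    double average() {
        return sum / count;
    }
}

final class StreamingAggregators {
    /** One pass that sums and counts in one go. */
    static SumAndCount sumAndCount(Iterable<? extends Number> values) {
        double sum = 0.0;
        long count = 0;
        for (Number v : values) {
            sum += v.doubleValue();
            count++;
        }
        return new SumAndCount(sum, count);
    }

    /** AVG() in a single iteration instead of SUM() / COUNT(). */
    static double avg(Iterable<? extends Number> values) {
        return sumAndCount(values).average();
    }

    /** VAR() in exactly two iterations (assumed: sample variance, n - 1). */
    static double variance(Iterable<? extends Number> values) {
        SumAndCount sc = sumAndCount(values); // first pass: mean
        double mean = sc.average();
        double squares = 0.0;
        for (Number v : values) {             // second pass: squared deviations
            double d = v.doubleValue() - mean;
            squares += d * d;
        }
        return squares / (sc.count - 1);
    }
}
```

Note that variance() calls iterator() on its argument twice, so it only works if the bound section can be traversed more than once.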
Restarting Cursors
The client would have to be prepared to return a cursor more than once, maybe sometimes even in parallel (though that seems unlikely). So your database query must be restartable.
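On the client side, one way to meet this requirement is to bind the section to an Iterable that re-runs the query each time a new iterator is requested. The sketch below is an assumption about how a client might do this; RestartableRows is a hypothetical name, and the actual query is hidden behind a Supplier.

```java
import java.util.Iterator;
import java.util.function.Supplier;

/** Hypothetical adapter: every call to iterator() restarts the underlying query. */
final class RestartableRows<T> implements Iterable<T> {
    private final Supplier<Iterator<T>> queryRunner;

    RestartableRows(Supplier<Iterator<T>> queryRunner) {
        this.queryRunner = queryRunner;
    }

    @Override
    public Iterator<T> iterator() {
        // Each traversal gets its own fresh cursor, so a second pass
        // does not depend on the state of the first one.
        return queryRunner.get();
    }
}
```

Because each iterator is independent, this also covers the (unlikely) parallel case, provided the underlying data source tolerates several open cursors at once.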