March 17, 2010

Can ORM make your application faster? Part 4: future queries and generalized batching

This is the 4th part of a series of posts describing optimization techniques used by ORM tools.

Optimization techniques

3. Future queries

Future (or delayed) queries are an API that lets you delay the execution of a particular query until the moment its result is actually needed. 
  • If several future queries are scheduled at that moment, they're all executed together as a single batch. 
  • If a regular query is about to be executed while future queries are scheduled, they're executed along with it (i.e. in the same batch).
So the main benefit of this optimization is, again, fewer roundtrips to the database server or, simply, less chattiness between the ORM and the DB.
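To make the mechanics concrete, here is a minimal Python sketch under stated assumptions. `FakeConnection`, `FutureQuery`, `Session` and their methods are hypothetical names invented for illustration; they are not the NHibernate or DataObjects.Net API. One call to `send_batch` stands in for one roundtrip, and the "server" is a plain function that returns a result per statement.

```python
from typing import Any, Callable, List


class FakeConnection:
    """Hypothetical driver stand-in: one send_batch() call == one roundtrip."""

    def __init__(self, executor: Callable[[str], Any]) -> None:
        self.executor = executor  # simulates the server: SQL text -> result
        self.roundtrips = 0

    def send_batch(self, statements: List[str]) -> List[Any]:
        # A real provider would send all statements as one command and read
        # back one result set per statement; here we only simulate that.
        self.roundtrips += 1
        return [self.executor(sql) for sql in statements]


class FutureQuery:
    """A delayed query: it is not sent until its value is first needed."""

    def __init__(self, session: "Session", sql: str) -> None:
        self.session = session
        self.sql = sql
        self.done = False
        self.result: Any = None

    @property
    def value(self) -> Any:
        if not self.done:
            self.session.flush_futures()  # runs ALL pending futures in one batch
        return self.result


class Session:
    def __init__(self, connection: FakeConnection) -> None:
        self.connection = connection
        self.pending: List[FutureQuery] = []

    def future(self, sql: str) -> FutureQuery:
        query = FutureQuery(self, sql)
        self.pending.append(query)  # only scheduled, nothing is sent yet
        return query

    def _run(self, extra: List[str]) -> List[Any]:
        # Send pending futures plus any extra statements in a single batch,
        # then route each result back to the future that asked for it.
        batch, self.pending = self.pending, []
        results = self.connection.send_batch([q.sql for q in batch] + extra)
        for query, result in zip(batch, results):
            query.result, query.done = result, True
        return results[len(batch):]

    def flush_futures(self) -> None:
        if self.pending:
            self._run([])

    def query(self, sql: str) -> Any:
        # A regular query piggybacks every pending future into its batch.
        return self._run([sql])[0]
```

For instance, scheduling two futures and then reading the first one's `value` costs exactly one roundtrip, and reading the second one costs none. Routing results back by position is the "care about the results" part that plain SQL leaves entirely to you.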

This feature is pretty rare in ORM products, but NHibernate, for example, implements it. The future query API for DataObjects.Net is described in the appropriate section of its Manual (take a look if you're interested in an example with the underlying SQL).

Implementing future queries using plain SQL is pretty complex if you don't know the exact sequence of queries that will be executed later (again, a pretty frequent case). The issues are nearly the same as with CRUD batching, but here you must additionally take care of the results: each result set returned by the batch has to be routed back to the code that scheduled the corresponding query.

4. Generalized batching

That's my favorite part: the term "generalized batching" is my own invention. The description is actually very simple: it is a combination of the two optimizations above (future queries + CRUD sequence batching). It's the case when the ORM is capable of combining the following into a single batch:
  • Delayed CRUD statements
  • Delayed future queries
  • The query whose result must be provided immediately (i.e. requested by the application right now)
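Extending the same idea to generalized batching, here is another minimal Python sketch (all names are hypothetical, not DataObjects.Net's actual API). The session plays the role of the single entry point every statement must pass through: CRUD statements and future queries are only queued, and a query whose result is needed right now carries the whole queue with it. Queuing BEGIN as an ordinary statement is an assumption that mirrors the remark that the provider can join it with the next command.

```python
from typing import Any, Callable, List, Optional


class FakeConnection:
    """Hypothetical driver stand-in: one send_batch() call == one roundtrip."""

    def __init__(self, executor: Callable[[str], Any]) -> None:
        self.executor = executor  # simulates the server: SQL text -> result
        self.roundtrips = 0
        self.batches: List[List[str]] = []  # what was sent, batch by batch

    def send_batch(self, statements: List[str]) -> List[Any]:
        self.roundtrips += 1
        self.batches.append(list(statements))
        return [self.executor(sql) for sql in statements]


class FutureQuery:
    def __init__(self, session: "BatchingSession", sql: str) -> None:
        self.session = session
        self.sql = sql
        self.done = False
        self.result: Any = None

    @property
    def value(self) -> Any:
        if not self.done:
            self.session.flush()  # sends queued CRUD and futures together
        return self.result


class BatchingSession:
    """Single entry point: every CRUD statement and every query goes through it."""

    def __init__(self, connection: FakeConnection) -> None:
        self.connection = connection
        # Assumption: BEGIN is queued so it travels with the first batch.
        self.crud: List[str] = ["BEGIN TRANSACTION"]
        self.futures: List[FutureQuery] = []

    def execute_crud(self, sql: str) -> None:
        self.crud.append(sql)  # delayed: nothing is sent yet

    def future(self, sql: str) -> FutureQuery:
        query = FutureQuery(self, sql)
        self.futures.append(query)
        return query

    def _send(self, immediate: Optional[str]) -> Any:
        crud, futures = self.crud, self.futures
        self.crud, self.futures = [], []
        statements = crud + [q.sql for q in futures]
        if immediate is not None:
            statements.append(immediate)
        if not statements:
            return None
        results = self.connection.send_batch(statements)
        # CRUD results (e.g. row counts) are ignored; each future gets its slot.
        for i, query in enumerate(futures):
            query.result, query.done = results[len(crud) + i], True
        return results[-1] if immediate is not None else None

    def query(self, sql: str) -> Any:
        """A query whose result is needed now: everything queued rides along."""
        return self._send(sql)

    def flush(self) -> None:
        self._send(None)

    def commit(self) -> None:
        self.flush()  # the "tail": whatever is still queued, if anything
        self.connection.send_batch(["COMMIT"])
```

For example, queuing two CRUD statements and one future query, running one immediate query, then queuing one more CRUD statement and committing costs three roundtrips (one immediate query, plus the tail and the commit), where sending each of the seven statements separately would cost seven.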
The goal is, again, to reduce chattiness. When this optimization is implemented, the estimated number of batches sent per transaction (i.e. the number of roundtrips to the database server) is roughly:
  • 1 for beginning the transaction (the underlying provider can actually join it with the subsequent command)
  • Q batches, where Q is the number of queries in the transaction requiring their results immediately
  • Possibly, 1 for flushing the "tail" (the last unflushed batch)
  • 1 for committing the transaction.
Or, in short, Q + C, where Q is the number of queries in the transaction requiring their results immediately and C is a small constant. This is much better than the C + (count of CRUD statements) + (count of queries) you get in the normal case.
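A quick worked comparison with made-up numbers, where S is the number of CRUD statements and R is the total number of queries (so Q ≤ R), taking the same constant C = 3 in both cases for simplicity:

```latex
\begin{aligned}
N_{\text{generalized}} &\approx Q + C, \qquad N_{\text{normal}} \approx C + S + R,\\
S = 100,\; R = 10,\; Q = 4,\; C = 3:\quad N_{\text{generalized}} &\approx 4 + 3 = 7, \qquad N_{\text{normal}} \approx 3 + 100 + 10 = 113.
\end{aligned}
```

The saving grows with S and with the share of queries that can be deferred, since only the Q immediate queries force a roundtrip of their own.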

AFAIK, DataObjects.Net is currently the only ORM implementing this optimization. I recently wrote a post where it was employed, although the case was trickier than it initially seemed. Anyway, the screenshot on the right, taken from that post, shows an example of a batch containing both CRUD statements and a regular query.

Implementing this optimization using plain SQL is hell; the picture on the right illustrates this perfectly. At the very least, you need your own single-point API and must pass all queries and CRUD statements through it to make this work.

Return to the first post of this series (the introduction and TOC are there).