March 19, 2010

Can ORM make your application faster? Part 5: asynchronous batch execution

This is the fifth part of a series of posts describing optimization techniques used by ORM tools.

Optimization techniques

5. Asynchronous batch execution

Disclaimer: as far as I know, this optimization is not implemented in any ORM tool yet, so this post is nothing more than pure theory.

Some batches are intended to return a result to user code immediately, but others are executed purely for their side effects (i.e. database modifications). The only interesting result of such a batch is an error, and usually it does not matter whether you get it now or later in the same transaction (moreover, in many cases, e.g. with particular MVCC implementations, certain errors are really detected only on transaction commit).

Based on this, we can implement one more interesting optimization:
  • All batches executed purely for their side effects are executed asynchronously with respect to the calling thread, but still sequentially and synchronously on the underlying connection object. This can be achieved if all the underlying work is done in a background thread dedicated to it. Let's call the abstraction that executes our batches AsyncProcessor (AP from here on).
  • Batches returning a result are executed synchronously by AP, and any errors raised during their execution are thrown directly to application code.
  • If a batch executed asynchronously by AP fails, AP enters an error state. While in the error state, it re-throws the original exception on any subsequent attempt to use it. The error state is cleared when a new transaction starts.
The result of this optimization: when mainly CRUD operations are performed, the thread preparing the data for them (i.e. the thread creating and modifying entities) doesn't waste time waiting for database server replies. So the application and the database server operate in parallel in this case.
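
The AP described above can be sketched in a few lines. This is only an illustration of the idea, not an implementation taken from any ORM: it is written in Python rather than .NET for brevity, and every name in it (the execute_batch callable, execute_async, execute_sync, begin_transaction) is a hypothetical one chosen for this sketch. It assumes that execute_batch sends one batch over the underlying connection and either returns its result or raises.

```python
import queue
import threading


class AsyncProcessor:
    """Executes batches one at a time, in order, on a dedicated thread."""

    def __init__(self, execute_batch):
        # execute_batch is a hypothetical callable standing in for
        # "send this batch over the connection and return its result".
        self._execute_batch = execute_batch
        self._jobs = queue.Queue()
        self._error = None
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            batch, reply = self._jobs.get()
            try:
                if self._error is not None:
                    raise self._error  # error state: skip the batch
                outcome = (True, self._execute_batch(batch))
            except Exception as exc:
                if reply is None:
                    self._error = exc  # a side-effect batch failed
                outcome = (False, exc)
            if reply is not None:
                reply.put(outcome)
            self._jobs.task_done()

    def _check_error_state(self):
        if self._error is not None:
            raise self._error

    def execute_async(self, batch):
        """Queue a side-effect batch and return without waiting."""
        self._check_error_state()
        self._jobs.put((batch, None))

    def execute_sync(self, batch):
        """Run a batch after all queued ones and return its result."""
        self._check_error_state()
        reply = queue.Queue(maxsize=1)
        self._jobs.put((batch, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def begin_transaction(self):
        """Wait for leftover batches, then clear the error state."""
        self._jobs.join()
        self._error = None
```

Because a single worker thread drains a FIFO queue, the connection still sees the batches strictly one after another. Only the calling thread stops waiting for the replies to side-effect batches.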

OK, but how much can it speed up, e.g., a bulk data import operation? My assumption is up to 2 times (of course, if CRUD sequence batching or generalized batching is implemented). Good supporting evidence for this is the CRUD test sequence result at ORMBattle.NET:
  • Compare the DataObjects.Net result with SqlClient on the CRUD tests. DataObjects.Net implements generalized batching, but does not implement asynchronous batch execution, and its result is nearly two times lower than the result of SqlClient. Note that the SqlClient test there is explicitly optimized to show the maximum possible performance, so it batches CRUD commands as well.
  • On the other hand, there is BLToolkit, which, although it does not provide automatic CRUD batching, offers an explicit API for it (SqlQuery.Delete, Update and Insert methods accepting a batch size and a sequence of entities to process (IEnumerable<T>)). With such an API, the ORM has to do almost nothing except transform the data and pass it on (e.g. no topological sorting is necessary to determine a valid insertion order), so BLToolkit shows nearly the same result as SqlClient.
And obviously, this optimization won't have much effect on transactions that mainly perform intensive reads.
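
One way to see where the "up to 2 times" estimate comes from is a back-of-the-envelope model. It assumes, as a simplification not stated in the benchmark itself, that application-side work and server-side work overlap perfectly and that coordination costs nothing. The function name and its parameters are hypothetical.

```python
def ideal_speedup(app_time, db_time):
    """Upper bound on the speedup from overlapping the two kinds of work.

    app_time: time the application thread spends preparing batches
    db_time:  time spent waiting for the database server
    Both must be positive and use the same unit.
    """
    sequential_total = app_time + db_time      # wait for every reply
    overlapped_total = max(app_time, db_time)  # perfect overlap
    return sequential_total / overlapped_total
```

Under this model the ratio never exceeds 2, and it reaches 2 only when both sides take the same time. If one side dominates, as the database does in read-heavy transactions, the gain shrinks toward 1.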

Implementing this optimization using plain SQL isn't really difficult if generalized or CRUD batching is already implemented - note that batching is a prerequisite here, and that's where the real problems are.

Return to the first post of this series (the introduction and TOC are there).