March 16, 2010

Can ORM make your application faster? Part 1: overview of optimization cases.

I decided to write a set of posts covering various optimization techniques used by ORM tools after writing a reply to this question at StackOverflow. I'd like to debunk the myth that ORM usage may (and, as many think, inevitably will) slow down your application.

In particular, I'm going to:
  • Describe the most interesting optimization techniques used by the ORM tools I know;
  • Estimate the complexity of implementing the same optimization in the "plain SQL" case.
This post provides an overview of the optimization cases ORM developers usually care about, so it defines the plan for my further posts.

Optimization cases

From a developer's point of view, these are the optimization cases to deal with:

1. Reduce chattiness between the ORM and the database. Low chattiness is important because each roundtrip between the ORM and the database implies network interaction, and thus takes between 0.1ms and 1ms at the very least, independently of query complexity. Note that perhaps 90% of queries are normally fairly simple.

A particular case is the SELECT N+1 problem: if processing each row of some query result requires an additional query to be executed (so 1 + count(...) queries are executed in total), the developer must try to rewrite the code in such a way that a nearly constant number of queries is executed.
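To make the pattern concrete, here is a minimal sketch using Python's standard sqlite3 module (the author/book schema is invented for illustration): the naive loop issues 1 + N queries, while the rewritten version fetches the same data with a single JOIN.

```python
import sqlite3

# In-memory database with a hypothetical one-to-many schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1');
""")

def titles_n_plus_1():
    # 1 query for the authors + 1 query per author: 1 + N roundtrips.
    result = {}
    for author_id, name in conn.execute("SELECT id, name FROM author"):
        rows = conn.execute(
            "SELECT title FROM book WHERE author_id = ?", (author_id,))
        result[name] = [title for (title,) in rows]
    return result

def titles_single_query():
    # The same data in one JOIN: constant query count regardless of N.
    result = {}
    rows = conn.execute("""
        SELECT a.name, b.title FROM author a
        JOIN book b ON b.author_id = a.id
        ORDER BY a.id, b.id""")
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result
```

Most ORMs expose this fix declaratively (e.g. "eager fetch" hints on an association), so the JOIN is generated rather than hand-written.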

CRUD sequence batching and future queries are other examples of optimizations reducing chattiness (described below). A caching API is the extreme case of such optimizations, especially when the cache isn't safe from the point of view of transaction isolation (i.e. it is allowed to provide "stale" results).
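A minimal sketch of the batching idea, again with plain sqlite3 (an ORM does this transparently when flushing pending changes; here executemany stands in for sending many INSERTs in one roundtrip, and the table name is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, message TEXT)")

# Instead of one INSERT per row (one roundtrip each), the ORM queues the
# pending inserts and flushes them as a single batch.
pending = [(i, f"event {i}") for i in range(100)]
conn.executemany("INSERT INTO log (id, message) VALUES (?, ?)", pending)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
```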

2. Reduce the size of result sets. Obviously, fetching more data than you really need is bad: the database will spend unnecessary time preparing it, and moreover, it will be sent over the network, so this unnecessary activity will affect other running queries by "eating" part of the available bandwidth.

APIs allowing you to limit the query result size (SELECT TOP N ... in SQL) and to deliver subsequent parts of the result on demand (MARS in SQL Server 2005 and above) are intended to deal with this issue. Lazy loading is another well-known example of this optimization: fetching large items (or values - e.g. BLOBs) only on demand is perfect if you need them rarely.
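Both ideas can be sketched with plain SQL: limit the rows you pull, and defer the large columns until they are actually accessed (the document schema below is invented; an ORM would hide load_body behind a lazily-loaded property):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT, body BLOB)")
conn.executemany(
    "INSERT INTO document VALUES (?, ?, ?)",
    [(i, f"doc {i}", b"x" * 10_000) for i in range(50)])

# Limit the result set: only the first page, and only the cheap columns.
page = conn.execute(
    "SELECT id, title FROM document ORDER BY id LIMIT 10").fetchall()

def load_body(doc_id):
    # Lazy loading: fetch the large BLOB only when it is actually needed.
    return conn.execute(
        "SELECT body FROM document WHERE id = ?", (doc_id,)).fetchone()[0]
```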

3. Reduce query complexity. Usually the ORM is helpless here, so this is solely the developer's headache.

For example, an API allowing you to execute SQL commands directly is partially intended to handle this case.
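A tiny illustration of that escape hatch (the order table is invented): instead of materializing entities and aggregating in application code, the developer drops to hand-written SQL and lets the database do the work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "order" (id INTEGER PRIMARY KEY, total REAL);
    INSERT INTO "order" VALUES (1, 10.0), (2, 20.0), (3, 30.0);
""")

# One hand-tuned query instead of fetching every row as an entity
# and summing client-side.
total = conn.execute('SELECT SUM(total) FROM "order"').fetchone()[0]
```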

4. Runtime optimizations. It's bad when the ORM itself isn't well optimized.

Any inefficiencies on the ORM or application side increase transaction length, and thus increase the database server load.

A particular example of a runtime optimization is support for compiled queries. Just imagine: translating the simplest LINQ query, like "fetch an entity by its key", takes nearly 1ms in Entity Framework. This time is nearly equal to 10 roundtrips to a local database server, i.e. to 10 similar queries, if they were compiled. So this optimization alone may reduce the duration of a transaction by up to 10 times.
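The essence of a compiled query can be sketched in a few lines: the expensive step is translating the query description into SQL, so the translation is done once and cached, while the parameter value is still supplied per execution. (This is a hedged illustration with invented names, not Entity Framework's actual API; memoization stands in for LINQ-to-SQL compilation.)

```python
import functools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Ann')")

@functools.lru_cache(maxsize=None)
def translate(entity, key_column):
    # Stand-in for the expensive query-translation step: the generated SQL
    # is cached, so repeated executions skip translation entirely.
    return f"SELECT * FROM {entity} WHERE {key_column} = ?"

def fetch_by_key(entity, key_column, key):
    # Only parameter binding and execution happen on every call.
    return conn.execute(translate(entity, key_column), (key,)).fetchone()

row = fetch_by_key("person", "id", 1)
```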