Show Blogger Panel Hide Blogger Panel
Alex Yakunin

March 16, 2010

Can ORM make your application faster? Part 2: addressing SELECT N+1 problem

Optimization techniques

1. Addressing SELECT N+1 problem: prefetch API / projections in query language

You face SELECT N+1 problem, if you use nearly this code (note: all the code below is based on DataObjects.Net APIs):
var bestOrders = (
  from order in Query.All<Order>()
  orderby order.TotalPrice descending
  select order)
  .Take(100);
foreach (var order in bestOrders) {
  Console.WriteLine("TotalPrice: {0}, Customer: {1}",
    order, // Available
    order.Customer); // Not available, will lead to a query!
}

In case with DataObjects.Net, there are two options allowing to deal with it:

a) You can change the query itself to make LINQ translator to join the necessary relationship(s):
var bestOrders = (
  from order in Query.All<Order>()
  orderby order.TotalPrice descending
  select new {Order = order, Customer = order.Customer})
  .Take(100);
foreach (var item in bestOrders) {
  var order = item.Order;
  Console.WriteLine("TotalPrice: {0}, Customer: {1}",
    otder,
    order.Customer); // or item.Customer ;)
}

Underlying SQL query will contain an additional LEFT OUTER JOIN fetching necessary relationship.

Note: DO4 will use JOIN only if you fetch one-to-one relationships by this way. For one-to-many relationships fetched in such projections (.Select(...)) it uses batches with future queries (described further) instead of JOINs. This is done to avoid excessive growth of result set sent by DB to ORM  in case when several relationships are fetched by this way (ORM gets ~ a cartesian product of rows), but this might be changed in future. Likely, we'll fetch first one-to-many relationship using JOIN in future, and all the others by the same way as now.

b) Alternatively, you can use prefetch API:
var bestOrders = (
  from order in Query.All<Order>()
  orderby order.TotalPrice descending
  select order)
  .Take(100)
  .Prefetch(order => order.Customer);
foreach (var order in bestOrders) {
  Console.WriteLine("TotalPrice: {0}, Customer: {1}",
    otder,
    order.Customer); // or item.Customer ;)
}

Prefetch API operates differently in each ORM - e.g. it may rely on JOINs to original query, or may employ future queries and local collections in queries to do its job, but the idea behind is to fetch a graph of objects that is expected to be processed further with minimal count of queries (or, better, in minimal time).

Prefetch API is currently expected to be implemented in any serious ORM product.

Implementing the same using plain SQL is actually pretty normal for developers employing SQL. SQL design requires explicit description of what you're fetching (the only exception is SELECT *), so moreover, in most of cases this is simply inevitable here.

So SQL pushes developers to explicitly optimize the code from this point. The cost is tight dependency of resulting code on database schema - with all the consequences (more complex refactoring, etc.).

Return to the first post of this set (introduction and TOC is there).