Alex Yakunin

September 25, 2009

Upcoming changes: prefetch API and ubiquitous usage of future queries

Prefetch API

I'd like to demonstrate these new v4.1 features with examples:
var query = 
  (from c in Query<Customer>.All
  where c.Name.StartsWith("A")
  select c)
  .Prefetch(c => c.Department)
  .PrefetchMany(c => c.Orders, orders => orders // All orders
    .Prefetch(o => o.Items, 50) // Up to 50 items
    .Prefetch(o => o.Info));

So as you see, we precisely specify what to prefetch here. You probably think Prefetch is an extension method that our LINQ translator is able to handle? No, it isn't related to LINQ at all!

Let me show one more example:
var query = 
  (from id in new [] {1, 2, 3}
  select Key.Create(id))
  .Prefetch<Customer>(key => key); // Key selector

// Type of query object: Prefetcher<Key,Customer>
// Its instance is already aware how to extract key 
// from each item of the source sequence.

var query2 = query // Continue from the Prefetcher created above
  .Prefetch(c => c.Department)
  .PrefetchMany(c => c.Orders, orders => orders // All orders
    .Prefetch(o => o.Items, 50) // Up to 50 items
    .Prefetch(o => o.Info));

As you see, the source for prefetching can be any IEnumerable. But how does it work, then?

1. When you append .Prefetch to an IEnumerable or a Prefetcher<TI,TE>, a new Prefetcher<TI,TE> instance is created. It carries one more prefetch instruction than its predecessor, much like IQueryable operators build up a query.
2. When you start enumerating a Prefetcher, it enumerates the underlying sequence and, for each key extracted from it, puts all the associated prefetch paths into the Session prefetch queue using the low-level (Key & FieldInfo-based) prefetch API. This API ensures that the whole bulk of scheduled prefetches is executed as part of the next batch request sent to the database. Moreover, it tries to run the minimal number of queries by grouping scheduled prefetch requests by type, collection and so on.
3. Finally, the prefetcher is notified that the first prefetch batch has been sent (actually, it simply notices an increase in the "# of sent batches" counter). This means all the data it queued for prefetch is already available. So it returns all the processed items of the original sequence and repeats steps 2-3 until the sequence is exhausted.
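
To make the three steps above more tangible, here is a minimal, self-contained sketch of the enumeration loop. It is not the DataObjects.Net implementation: FakeSession, SchedulePrefetch, SendBatch and the bulkSize parameter are hypothetical names invented for illustration, keys are plain ints, and the sketch forces a batch itself after every bulkSize items instead of watching the sent-batch counter.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the session's prefetch queue (an assumption, not the real API).
public sealed class FakeSession
{
    private readonly List<string> pending = new List<string>();
    public int SentBatchCount { get; private set; }
    public List<string> Log { get; } = new List<string>();

    // Step 2: queue one prefetch path for one key; nothing is sent yet.
    public void SchedulePrefetch(int key, string path)
    {
        pending.Add(key + "." + path);
    }

    // One simulated round trip executes everything scheduled so far.
    public void SendBatch()
    {
        if (pending.Count == 0) return;
        SentBatchCount++;
        Log.Add("batch " + SentBatchCount + ": " + string.Join(", ", pending));
        pending.Clear();
    }
}

public static class PrefetchSketch
{
    // Lazily yields keys, but only after their prefetch paths were sent in a batch.
    public static IEnumerable<int> WithPrefetch(
        IEnumerable<int> keys, FakeSession session, string[] paths, int bulkSize)
    {
        var processed = new List<int>();
        foreach (var key in keys)
        {
            foreach (var path in paths)
                session.SchedulePrefetch(key, path);
            processed.Add(key);
            if (processed.Count == bulkSize)
            {
                session.SendBatch();                          // Step 3: data is now available
                foreach (var k in processed) yield return k;  // return the processed items
                processed.Clear();                            // and repeat steps 2-3
            }
        }
        session.SendBatch();                                  // Flush the incomplete last bulk
        foreach (var k in processed) yield return k;
    }
}

public static class PrefetchSketchDemo
{
    public static void Main()
    {
        var session = new FakeSession();
        var paths = new[] { "Department", "Orders" };
        foreach (var key in PrefetchSketch.WithPrefetch(new[] { 1, 2, 3 }, session, paths, 2))
            Console.WriteLine("Got key " + key);
        foreach (var line in session.Log)
            Console.WriteLine(line);
    }
}
```

With three keys, two paths and a bulk size of 2, the sketch sends two batches instead of six separate requests: the first one carries all four paths for keys 1 and 2, the second one carries both paths for key 3.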

What are the benefits of this approach?
  • The low-level prefetch API (btw, it is public) is Key-based. This means it can resolve a particular prefetch request e.g. via cache! So when we add a global cache, this should help us a lot: a prefetch may lead to zero additional trips to the database. Moreover, SessionHandler is now aware of every prefetch request, so it can resolve a request before it is even scheduled.
  • We routed all the data load requests we have inside Entity and EntitySet via this API. So you can prefetch not just something you query, but also something you're planning to access later.
  • The low-level prefetch API relies on future queries while scheduling a bulk of prefetch requests. So we didn't actually have to develop something completely new for this.
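
The first benefit is easy to show in isolation. Below is a rough sketch of a key-based queue that answers requests from a cache whenever it can; KeyPrefetchQueue, Schedule, Execute and the load delegate are hypothetical names used only for this illustration, and a plain dictionary stands in for whatever cache the framework will actually use.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key-based prefetch queue; all names here are illustrative assumptions.
public sealed class KeyPrefetchQueue
{
    private readonly Dictionary<int, string> cache = new Dictionary<int, string>();
    private readonly HashSet<int> scheduled = new HashSet<int>();

    public int RoundTrips { get; private set; }

    // A request for an already cached key is resolved here and never scheduled.
    public void Schedule(int key)
    {
        if (!cache.ContainsKey(key))
            scheduled.Add(key);
    }

    // Loads all scheduled keys in a single simulated round trip.
    public void Execute(Func<int, string> load)
    {
        if (scheduled.Count == 0)
            return; // Everything came from the cache: zero round trips.
        RoundTrips++;
        foreach (var key in scheduled)
            cache[key] = load(key);
        scheduled.Clear();
    }

    public string Get(int key)
    {
        return cache[key];
    }
}

public static class KeyPrefetchDemo
{
    public static void Main()
    {
        var queue = new KeyPrefetchQueue();
        queue.Schedule(1);
        queue.Schedule(2);
        queue.Schedule(1);                          // Duplicate key, scheduled once
        queue.Execute(key => "Customer #" + key);   // First (and only) round trip
        queue.Schedule(2);                          // Already cached
        queue.Execute(key => "Customer #" + key);   // Nothing to send
        Console.WriteLine(queue.RoundTrips);
    }
}
```

The second Execute call finds nothing to load, so the demo ends with a single round trip. This only works because requests are identified by key; a request described by an arbitrary query could not be matched against a cache this way.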

Ubiquitous future queries

They'll be really ubiquitous. In particular, we'll use them to:
- Discover all the references to an instance being removed.
- Execute subqueries described in Select expression (LINQ).
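
For readers who haven't met the pattern yet, here is a minimal sketch of what a future query is in general: a query that is registered now, but executed later together with everything else that is pending. This is an illustration of the idea only, not the DataObjects.Net API; FutureQueue, Future<T>, Enqueue and Flush are hypothetical names, and a delegate call stands in for a real database query.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical queue of deferred queries; names are assumptions for illustration.
public sealed class FutureQueue
{
    private readonly List<Action> pending = new List<Action>();
    public int RoundTrips { get; private set; }

    // Registers a query without running it and returns a handle to its future result.
    public Future<T> Enqueue<T>(Func<T> query)
    {
        var future = new Future<T>(this);
        pending.Add(() => future.SetResult(query()));
        return future;
    }

    // Runs every pending query as one simulated round trip.
    public void Flush()
    {
        if (pending.Count == 0) return;
        RoundTrips++;
        foreach (var run in pending) run();
        pending.Clear();
    }
}

public sealed class Future<T>
{
    private readonly FutureQueue owner;
    private T result;
    private bool hasResult;

    internal Future(FutureQueue owner) { this.owner = owner; }
    internal void SetResult(T value) { result = value; hasResult = true; }

    // The first access to any unresolved future flushes the whole queue.
    public T Value
    {
        get
        {
            if (!hasResult) owner.Flush();
            return result;
        }
    }
}

public static class FutureQueueDemo
{
    public static void Main()
    {
        var queue = new FutureQueue();
        var a = queue.Enqueue(() => 10);
        var b = queue.Enqueue(() => "orders");
        Console.WriteLine(a.Value);          // Triggers the single round trip
        Console.WriteLine(b.Value);          // Already resolved by the same trip
        Console.WriteLine(queue.RoundTrips);
    }
}
```

Reading the first value resolves both futures at once, so the second read costs nothing extra. Both uses listed above fit this shape: many small, independent queries whose results are not needed immediately.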

Here is an example with Select:
var query = 
  from c in Query<Customer>.All
  select new {
    Customer = c,
    Top50Orders = c.Orders.Take(50)
  };

foreach (var c in query) {
  // Each item is the anonymous type built in the Select above.
  Console.WriteLine("Customer:    {0}", c.Customer.Name);
  // The next line requires an additional query.
  // This query will be performed on each loop iteration in v4.0.5,
  // but not in v4.1. Our materialization pipeline will run such 
  // queries in bulks using future query API.
  Console.WriteLine("  50 Orders: {0}", c.Top50Orders);
}

You may not know this, but our query result materialization pipeline splits the original sequence into "materialization bulks". In v4.1 the bulk size starts at 8 and doubles each time until it reaches 1024. So in fact, we'll run batches making additional queries roughly once per 8, 16, 32 ... 1024 loop iterations. But since there is an upper limit of 25 queries per batch, "materialization bulks" containing more than 25 items will span multiple batches.
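
The arithmetic is easy to check with a few lines of code. The sketch below only reproduces the numbers stated above (start at 8, double up to 1024, at most 25 queries per batch); MaterializationBulks, BulkSizes and BatchesFor are hypothetical names, and it assumes exactly one additional query per item and that the last bulk is simply cut short when the sequence ends.

```csharp
using System;
using System.Collections.Generic;

public static class MaterializationBulks
{
    public const int InitialSize = 8;
    public const int MaxSize = 1024;
    public const int MaxQueriesPerBatch = 25;

    // Bulk sizes for a sequence of totalItems: 8, 16, 32, ... capped at 1024.
    public static List<int> BulkSizes(int totalItems)
    {
        var sizes = new List<int>();
        int size = InitialSize;
        int remaining = totalItems;
        while (remaining > 0)
        {
            int take = Math.Min(size, remaining); // The last bulk may be incomplete
            sizes.Add(take);
            remaining -= take;
            size = Math.Min(size * 2, MaxSize);
        }
        return sizes;
    }

    // Batches needed for one bulk, assuming one additional query per item.
    public static int BatchesFor(int bulkSize)
    {
        return (bulkSize + MaxQueriesPerBatch - 1) / MaxQueriesPerBatch; // Ceiling division
    }
}

public static class MaterializationBulksDemo
{
    public static void Main()
    {
        int totalBatches = 0;
        foreach (var size in MaterializationBulks.BulkSizes(100))
        {
            int batches = MaterializationBulks.BatchesFor(size);
            totalBatches += batches;
            Console.WriteLine("bulk of {0} items: {1} batch(es)", size, batches);
        }
        Console.WriteLine("total batches: {0}", totalBatches);
    }
}
```

Under these assumptions, 100 customers are materialized in bulks of 8, 16, 32 and 44 items, which need 1, 1, 2 and 2 batches respectively: 6 round trips for the subqueries instead of 100.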


We hope this (along with other planned features) will allow us to deliver simply unbeatable performance in real-world applications: can you imagine your application sending just a few batches per transaction almost every time? That is almost impossible to achieve even with plain ADO.NET.

In fact, DataObjects.Net establishes an intelligent request queue between the application and the database server, acting as an interaction optimizer that eliminates chattiness.

CUD sequence batching in v4.0.5 has proven itself really well in data-modifying scenarios. Now we're bringing batching, future queries and prefetches to all other parts of the framework, increasing the performance of data access operations, which is normally more important.