November 27, 2009

New feature of DataObjects.Net v4.1: subquery batching

We implemented this feature few weeks ago. Imagine we execute the following code:
var query = 
  from customer in Query<Customer>.All
  select new {
    Customer = customer,
    First5Orders = (
      from order in Query<Order>.All
      where order.Customer==customer
      orderby order.Id
      select order
    ).Take(5)
  };
var queryResult = query.ToList(); // Actual execution
  
Console.WriteLine("queryResult.Count: {0}", 
  queryResult.Count);
foreach (var item in queryResult) {
  var subqueryResult = item.First5Orders.ToList(); 
  // Actual execution must happen here, 
  // but see the comments below.
  Console.WriteLine("subqueryResult.Count: {0}", 
    subqueryResult.Count);
}
As you see, this is a typical case where you must get 1+N queries:
  • First query will be the main one
  • All the others are its subqueries. As far as we know, any other ORM will execute a  particular one of them on attempt to enumerate.
So e.g. if queryResult.Count==90, you must get 91 queries - a particular example of "Select N+1" issue.

But DO4 will send just 6 batches!

The first one is:
SELECT 
  [a].[CustomerId], 
  [a].[TypeId], 
  [a].[CompanyName], 
  [a].[ContactName], 
  [a].[ContactTitle], 
  [a].[Address.StreetAddress], 
  [a].[Address.City], 
  [a].[Address.Region], 
  [a].[Address.PostalCode], 
  [a].[Address.Country], 
  [a].[Phone], [a].[Fax] 
FROM 
  [dbo].[Customers] [a];
All the subsequent ones look like this:
exec sp_executesql N'SELECT TOP 5 [a].[OrderId], 
[a].[TypeId], [a].[ProcessingTime], [a].[ShipVia.Id], [a].[Employee.Id], [a].[Customer.Id], 
[a].[OrderDate], [a].[RequiredDate], [a].[ShippedDate], [a].[Freight], [a].[ShipName], 
[a].[ShippingAddress.StreetAddress], [a].[ShippingAddress.City], [a].[ShippingAddress.Region], 
[a].[ShippingAddress.PostalCode], [a].[ShippingAddress.Country] FROM [dbo].[Order] [a] 
WHERE ([a].[Customer.Id] = @p1_0) ORDER BY [a].[OrderId] ASC;

-- ...
-- A set of similar queries is skipped to shorten the output
-- ...

SELECT TOP 5 [a].[OrderId], [a].[TypeId], [a].[ProcessingTime], [a].[ShipVia.Id], 
[a].[Employee.Id], [a].[Customer.Id], [a].[OrderDate], [a].[RequiredDate], [a].[ShippedDate],
[a].[Freight], [a].[ShipName], [a].[ShippingAddress.StreetAddress], [a].[ShippingAddress.City], 
[a].[ShippingAddress.Region], [a].[ShippingAddress.PostalCode], [a].[ShippingAddress.Country] 
FROM [dbo].[Order] [a] WHERE ([a].[Customer.Id] = @p16_0) ORDER BY [a].[OrderId] ASC;
',[email protected]_0 nvarchar(5),@p2_0 nvarchar(5),@p3_0 nvarchar(5),@p4_0 nvarchar(5),@p5_0 
nvarchar(5),@p6_0 nvarchar(5),@p7_0 nvarchar(5),@p8_0 nvarchar(5),@p9_0 nvarchar(5),@p10_0 
nvarchar(5),@p11_0 nvarchar(5),@p12_0 nvarchar(5),@p13_0 nvarchar(5),@p14_0 nvarchar(5),@p15_0 
nvarchar(5),@p16_0 nvarchar(5)',
@p1_0=N'ALFKI',@p2_0=N'ANATR',@p3_0=N'ANTON',@p4_0=N'AROUT',@p5_0=N'BERGS',@p6_0=N'BLAUS',
@p7_0=N'BLONP',@p8_0=N'BOLID',@p9_0=N'BONAP',@p10_0=N'BOTTM',@p11_0=N'BSBEV',@p12_0=N'CACTU',
@p13_0=N'CENTC',@p14_0=N'CHOPS',@p15_0=N'COMMI',@p16_0=N'CONSH'

As you see, we execute such subqueries as future queries - i.e. they're performed in batches. This does not mean we materialize the whole query result at once - instead, we process it part by part:
  • When you pull out the first item, we materialize first 16 items & cache them. If there are subqueries, they're processed as future queries transparently for you.
  • When you pull out 16th item, we materialize 32 more of them at once by the same fashion.
  • And so on; maximal size of such a bulk is 1024.
  • Note that we called .ToList() here, so it was actually fully enumerated at that moment, and thus all the batches were executed during .ToList() processing. But if we'd use it in foreach loop and break from it, only a part of result would be materialized.
So such a materialization process allows us to optimize the interaction with RDBMS (reduce the chattiness) transparently for you. The process is fully recursive - so e.g. if subquery contains other subqueries, they'll be resolved by the same fashion. Moreover, if you select EntitySet in final selector, it is prefetched by the same way.

So this is a good alternative to prefetch API.