September 30, 2009

DataObjects.Net v4.1: when and what will be there?

When: October 12.

What will be there:
  • Manual.
  • Integrated PostSharp. Consequently, there will be no need to install any third-party tools at all.
  • Simplified referencing of DataObjects.Net from your own projects. No more GAC install, and quite likely you'll need to reference just 2 assemblies. All the dependencies will be copied to the bin folder automatically.
  • Oracle support.
  • DisconnectedState. Likely, the initial version will have some gaps, but we'll eliminate them ASAP. See the "Disconnected state" section here.
  • Versions and optimistic locking support. See the "Object versions" section here.
  • Future queries. See "Future queries" section here.
  • Prefetch API. Likely without some internal performance optimizations - we may leave some of them for v4.1.1.
  • Support for local collections in LINQ queries.
  • Fully working persistent interfaces, although [MaterializedView] won't work for them (this affects only performance, not features).
  • Improved validation API. We now support IDataErrorInfo; validation-related classes are refactored to be more usable.
  • Über-batching. See corresponding section here.
  • New samples. No other details for now, except that there will be an ASP.NET MVC sample.
  • No explicit bindings to Unity and, likely, to Log4Net.
  • Lots of minor improvements, e.g. explicit locking.
And I explicitly announce that right after this release we will start using a public source code repository for DataObjects.Net, so it will be fully honest to call it an open source project.

You will find its source code here.

Btw, a distributed VCS like Mercurial provides the easiest way for external developers to modify the source code and maintain their own branches. Such branches can be maintained either locally (with automatic merging of the changes made by us) or at our own repository. So if you would like to contribute something, it will be really simple: download TortoiseHg, pull our repository, build DO4 (this requires nothing special now!), modify it and push back your changes (you must be added to the committers to be allowed to do this) or send them to us as a patch. Easy as 1-2-3.

Upcoming changes: local collections in LINQ queries

I'm continuing to demonstrate the magic we're working on now. Local collections are actually query parameters of IEnumerable<T> type.

Let's take a look at their usage in probably the simplest case:
var query = 
  from o in Query<Order>.All
  where
    (new {"Alex", "Alexey", "Dmitry"})
    .Contains(o.Customer.FirstName)
  select o;
As you might suspect, such a query will be translated to SQL with the IN operator.

But what about this one:
var query = 
  from o in Query<Order>.All
  where
    (from n in Enumerable.Range(1, 100000)
    select n.ToString())
    .Contains(o.Customer.FirstName)
  select o;
What SQL would you expect to see behind the scenes in this case? Will it work at all?

Ok, now I'm going to astonish you completely:
var bestOrdersQuery = 
  from o in Query<Order>.All
  let price = o.Details.Sum(d => d.UnitPrice * d.Quantity)
  orderby price descending
  select new {
    Order = o, 
    Price = price
  };

var best10KOrders = bestOrdersQuery.Take(10000).ToList();

var query = 
  from bo in Query.Store(best10KOrders) // Note that this is List<anonymoustype>!
  from c in Query<Customer>.All
  where c.Orders.Contains(bo.Order)
  select new {Customer = c, Price = bo.Price} into pair
  group pair by pair.Customer into g
  select new { Customer = g.Key, PaidInBest10KOrders = g.Sum(i => i.Price) };
This query will work as well.

All I wrote implies our query engine is able to:
  • Store almost any collection into a temporary table before query execution
  • Maintain mapping for its items: you can use any properties of items in such collections inside LINQ queries
  • Use IN in SQL instead of a temporary table when this is possible. This significantly depends on the provider. E.g. SQL Server does not support tuples in IN, but PostgreSQL does, so with SQL Server we'll be able to represent only collections of primitive types or keys this way. There are other limitations we're going to consider, e.g. the maximal parameter count and the maximal number of comparison operations per query (see the sketch right after this list).
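To make the last point a bit more concrete, here is a rough sketch of such a decision (hypothetical names and checks - the real translator is certainly more involved):

// Use the IN operator only for collections of primitive types (or entity keys)
// that fit into the provider's parameter limit; otherwise fall back to a temporary table.
static bool CanUseInOperator<T>(ICollection<T> items, int maxParameterCount)
{
  bool isSimpleType = typeof(T).IsPrimitive || typeof(T)==typeof(string);
  return isSimpleType && items.Count <= maxParameterCount;
}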
Consequently, it is possible to use items of such collections like any other type supported by our LINQ translator. E.g. you can join such a collection, group it by some property, calculate an aggregate and so on. The only exception is trying to get types we can't materialize in the final select clause:
var bestOrdersQuery = 
  from o in Query<Order>.All
  let price = o.Details.Sum(d => d.UnitPrice * d.Quantity)
  orderby price descending
  select new {
    Order = o, 
    Price = price
  };

var best10KPairs = 
  bestOrdersQuery.Take(10000).ToList()
  .Select(i => new Pair<Order, double>(i.Order, i.Price)).ToList();

var query = 
  from p in Query.Store(best10KPairs) // Note that we store Pair<Order, double> here!
  select p; // Will not work, since actually we don't know how to materialize Pair<Order, double>
In this case we simply don't know how to materialize Pair<Order, double>, because we have never seen its construction. On the other hand, this query will work:
var query = 
  from p in Query.Store(best10KPairs) // Note that we store Pair<Order, double> here!
  select new Pair<Order, double>(p.First, p.Second);
Why do I think such a complex implementation of this feature is really necessary and attractive? 
  • Think about the upcoming integration with full-text search engines. The results they return can be very large, and ideally we must be able to pass them on for further processing on the RDBMS in any reasonable case. I'm not sure if you know this or not, but v3.9 was able to process only up to 1K results returned by Lucene.Net per query where it was used.
  • Prefetch is one more nice application of this feature, although the IN optimization is more desirable here than the version with a temporary table.
  • Finally, if executable DML queries appear some day, this feature might help a lot there as well.
Final remarks:
  • Likely, initially there will be some minor gaps. E.g. the IN optimization might not make it into v4.1 - I'm simply not sure whether we'll be able to complete it in time.
  • But we'll resolve them in v4.1.1 ;)

LiveUI was shown on Urals .NET User Group

Yesterday I attended the meeting of the Urals .NET User Group, where Alexander Ilyin (our developer) described his LiveUI framework. Since there were two presentations related to ASP.NET MVC, it was really interesting to hear about the differences and the difficulties he tried to solve. In general, the talk and the presentation were prepared really well.

Btw, we're going to launch a new LiveUI website shortly. It will be really cool, and moreover, it will be running on LiveUI and DataObjects.Net 4. So I'll return to this soon.

September 28, 2009

Selecting VCS for DO4: Git versus Mercurial

In short, Git lost this fight in one day. There are many articles comparing the two, but I found the following ones most interesting:
Btw, initially I thought we should choose Git. Its TortoiseGit tool looks better than TortoiseHg; on the other hand, I found that we likely have no chance to see VisualGit, because Git is distributed only under the GPL license. Anyway, I decided to try both tools, and...

TortoiseHg

I spent less than 30 minutes to test all the basic features there, including:
  • Create a repository
  • Create its copy
  • Launch a web server exposing one of them via HTTP (really simple: "hg serve"). It was really interesting for me to see how Mercurial synchronizes them.
  • Local modifications
  • Commits
  • Updates / switching to older revisions
  • Sync (push / pull)
  • Merging.
Everything just worked! I was even able to convert one of our internal repositories to Hg. So it was really nice, although TortoiseHg UI isn't polished at all.

TortoiseGit + msysGit

I stopped at step 3 here: "Launch a web server exposing the repository". I couldn't even imagine that this isn't easy to do on Windows at all. That was the first warning sign, although TortoiseGit was looking nice.

Ok, I started reading how to accomplish this, and clearly understood that I don't want to use Git:

  • If such a simple task requires so much software to be installed and configured, I'd definitely prefer a simpler tool.
  • I discovered Git is really built as ~50 small applications written in C! I simply don't see any reason to prefer C over other languages here: VCS performance mainly depends on algorithms and data structures, so using C here is as ridiculous as writing a web site in it. Why didn't Linus use e.g. Java, if he wanted it to be portable? A language is a tool, and using a completely wrong tool has no excuse for me.
  • So it became clear that it would be hell to deal with Git on Windows. Currently I use just one such ported tool in my daily life: Unison; everything else didn't survive.
  • Finally, I discovered that Google implemented their own Mercurial repository (the one that is available on Google Code) over BigTable (see this link). This just proves the above point: use the right tool. If you need it to be portable, don't use C; if it isn't a low-level service or driver, don't use C. Git violates both statements.
So my own conclusion is:
  • If you like to use tools developed by masochists (written in C!) for masochists (~ 50 tools!), Git is your choice.
  • Otherwise, use Mercurial.
Of course, it's a joke, but every joke has a grain of truth.

P.S. This also implies that the DO4 source code will be available in a Mercurial repository hosted at Google Code quite shortly. Install TortoiseHg - as I wrote, DO4 now requires no additional tools to be built or used.

September 27, 2009

What's up with nightly builds of v4.0?

Earlier I wrote they'd be suspended, and they still are. As you know, after getting the main part done (PostSharp is integrated into our build process, so it isn't necessary to install it separately), I was additionally able to merge most of our assemblies into a single one, but I still can't get this stuff working at runtime. The problem is that PostSharp can't deserialize something inside serialized aspects because this "something" references the old assembly name; setting LaosSerializationBinder.Current does not help, since for some strange reason BinaryFormatter does not use it for one of the serialized types.

So the only solution I have now is to make PostSharp post-process the merged assembly instead of its parts. But since I need properly working parts as well, this turns into a really complex modification of our build process.

When this is done, I'll switch to the installer (this part is much simpler). And after getting it working I'll re-enable nightly builds. I hope to finish this in ~ 3-4 days.

Crunch a mathematical problem with LINQ ;)

The problem: "A book has 352 pages. How many 4's were used to print all of the page numbers?"

Here is the answer in LINQ (actually, I reposted the task from this page). But check out the comments - the original solution was wrong, and there are so many other ones.
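For reference, here is one possible LINQ one-liner for this problem (not necessarily the solution from the linked post; it needs System.Linq):

var fourCount = Enumerable.Range(1, 352)
  .SelectMany(page => page.ToString()) // all digits of all page numbers
  .Count(digit => digit == '4');
Console.WriteLine(fourCount); // 75: 35 fours in the units position + 40 in the tens position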

September 26, 2009

Oracle: a reason to hate it for ORM developers

We're fighting one quite well-known and annoying Oracle issue: it considers an empty string equal to NULL (but not vice versa!). Just think about this:
C#:  someString==""   // false, if @someString!=""; otherwise, true
SQL: @someString = '' -- NULL in any case:
                      -- Oracle represents this as @someString = NULL!

C#:  someString.Length==0    // true, if @someString==""
SQL: LENGTH(@someString) = 0 -- NULL in the same case!

C#:  someString.Trim.Length==0       // true, if @someString==""
SQL: LENGTH(TRIM(BOTH FROM ' ')) = 0 -- NULL in the same case!

So we (actually, Denis Krjuchkov - he is fighting with Oracle now) really don't know how to deal with this. In many cases (e.g. with string.Length) we implement "backward" logic (null.Length==0), which is correct from the point of view of logic; we even have a similar issue in the IMDB provider. But it's really unclear what to do in many other cases.
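For illustration, the Length case can be handled by wrapping the translated expression, roughly like this (a sketch only, not the actual translator code; COALESCE and LENGTH are standard Oracle functions):

// Hypothetical helper producing the Oracle SQL text for string.Length:
// since '' is NULL in Oracle, LENGTH(...) returns NULL for "empty" strings,
// and COALESCE maps that NULL back to 0, matching the C# semantics.
static string TranslateStringLength(string columnSql)
{
  return string.Format("COALESCE(LENGTH({0}), 0)", columnSql);
}
// TranslateStringLength("t.NAME") returns "COALESCE(LENGTH(t.NAME), 0)"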

If you have any ideas on how to deal with this, please let us know. But don't suggest e.g. prefixing all stored strings and string query parameters with " " - this might work, but it would be a complete hell to study our queries in this case.

I'm curious why they made this decision... It is absolutely crazy: why does an empty string mean "undefined value" (NULL) there? It is a precisely defined value. As Denis suggested, they probably just wanted to save a few bytes per value back in the early days, when Oracle was established.

You might know that earlier I wrote about another similar issue (see "7: Oracle support" there) with nested SELECT in the SELECT clause. It was nearly the same case - intentionally broken genericity, and this is really disappointing. Why are there no such issues with the free PostgreSQL? Oracle proves once more that it is so cool that only humans are allowed to write its queries. Machines, get your hands off Oracle!

It would be interesting to know what other ORM developers do about this. Do they deal with it at all?

Conclusion: think twice before making optimizations that change the default logic in a few particular cases. Likely, breaking genericity is much worse.

September 25, 2009

Upcoming changes: prefetch API and ubiquitous usage of future queries

Prefetch API

I'd like to show these new v4.1 features with examples:
var query = 
  (from c in Query<Customer>.All
  where c.Name.StartsWith("A")
  select c)
  .Prefetch(c => c.Department)
  .PrefetchMany(c => c.Orders, orders => orders // All orders
    .Prefetch(o => o.Items, 50) // Up to 50 items
    .Prefetch(o => o.Info)
  );

So as you see, we precisely specify what to prefetch here. Probably you think Prefetch is an extension method our LINQ translator is able to handle? No, it isn't related to LINQ at all!

Let me show one more example:
var query = 
  (from id in new [] {1, 2, 3}
  select Key.Create(id))
  .Prefetch<Customer>(key => key); // Key selector

// Type of query object: Prefetcher<Key,Customer>
// Its instance is already aware how to extract key 
// from each item of the source sequence.

var query2 = query
  .Prefetch(c => c.Department)
  .PrefetchMany(c => c.Orders, orders => orders // All orders
    .Prefetch(o => o.Items, 50) // Up to 50 items
    .Prefetch(o => o.Info)
  );

As you see, the source for prefetching can be any IEnumerable. But how does it work then?

1. When you append .Prefetch to an IEnumerable or Prefetcher<TI,TE>, a new Prefetcher<TI,TE> instance is born. It contains one additional prefetching instruction in comparison to its predecessor - almost like IQueryable.
2. When you start to enumerate the Prefetcher, it enumerates the underlying sequence, and for each key extracted there it puts all the prefetch paths associated with it into the Session prefetch queue using the low-level (Key & FieldInfo-based) prefetch API. This API ensures the whole bulk of scheduled prefetches will be executed as part of the next batch request sent to the database. Moreover, this API tries to run a minimal number of such queries - it achieves this by grouping scheduled prefetch requests by types, collections and so on.
3. Finally, the prefetcher gets a notification that the first prefetch batch has been sent (actually, it simply notices an increase of the "# of sent batches" counter). This means all the data it queued for prefetch is already available. So it returns all the processed items of the original sequence, and repeats steps 2-3 until its completion.
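The three steps above can be sketched in code like this (hypothetical types and names, not the actual DataObjects.Net API; it needs System.Linq for Concat):

// An immutable prefetcher, IQueryable-style: each Prefetch call returns a new
// instance with one more instruction; enumeration schedules prefetches for the
// keys of the current bulk before yielding its items.
public sealed class Prefetcher<TItem, TKey> : IEnumerable<TItem>
{
  private const int BulkSize = 32; // imaginary bulk size
  private readonly IEnumerable<TItem> source;
  private readonly Func<TItem, TKey> keySelector;
  private readonly Action<TKey>[] instructions; // low-level prefetch requests

  public Prefetcher(IEnumerable<TItem> source, Func<TItem, TKey> keySelector)
    : this(source, keySelector, new Action<TKey>[0])
  {
  }

  private Prefetcher(IEnumerable<TItem> source, Func<TItem, TKey> keySelector,
    Action<TKey>[] instructions)
  {
    this.source = source;
    this.keySelector = keySelector;
    this.instructions = instructions;
  }

  public Prefetcher<TItem, TKey> Prefetch(Action<TKey> instruction)
  {
    return new Prefetcher<TItem, TKey>(source, keySelector,
      instructions.Concat(new[] {instruction}).ToArray());
  }

  public IEnumerator<TItem> GetEnumerator()
  {
    var bulk = new List<TItem>();
    foreach (var item in source) {
      foreach (var instruction in instructions)
        instruction(keySelector(item)); // enqueue a prefetch for this key
      bulk.Add(item);
      if (bulk.Count==BulkSize) {
        // The real implementation waits here until the queued prefetches are
        // executed as part of the next batch, then yields the processed items.
        foreach (var processed in bulk)
          yield return processed;
        bulk.Clear();
      }
    }
    foreach (var processed in bulk)
      yield return processed;
  }

  System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
  {
    return GetEnumerator();
  }
}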

What are the benefits of this approach?
  • The low-level prefetch API (btw, it is public) is Key-based. This means it can resolve a particular prefetch request e.g. via a cache! So when we add a global cache, this must help us a lot. In fact, prefetch may lead to zero additional trips to the database: SessionHandler is aware of any prefetch request now, so it can resolve it before it is even scheduled.
  • We routed all the data load requests we have inside Entity and EntitySet through this API. So you can prefetch not just what you query, but also what you're planning to access later.
  • The low-level prefetch API relies on future queries while scheduling a bulk of prefetch requests. So actually we didn't have to develop something completely new for this.

Ubiquitous future queries

They'll be really ubiquitous. In particular, we'll use them to:
- Discover all the references to the instance being removed.
- Execute subqueries described in a Select expression (LINQ).

Here is an example with Select:
var query = 
  from c in Query<Customer>.All
  select new {
    Customer = c,
    Top50Orders = c.Orders.Take(50)
  };

foreach (var c in query) {
  Console.WriteLine("Customer:    {0}", c.Name);
  // Next line must lead to an additional query.
  // This query will be performed on each loop iteration in v4.0.5,
  // but not in v4.1. Our materialization pipeline will run such 
  // queries in bulks using future query API.
  Console.WriteLine("  50 Orders: {0}", c.Top50Orders);
}

You likely don't know this, but our query result materialization pipeline splits the original sequence into "materialization bulks". In v4.1 the bulk size starts from 8 and doubles each time until it reaches 1024. So in fact, we'll run batches making additional queries roughly once per 8, 16, 32 ... 1024 loop iterations. But since there is an upper limit of 25 queries per batch, "materialization bulks" containing more than 25 items will span multiple batches.
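For illustration, the bulk size progression can be modeled like this (a simplified sketch, not the actual materialization pipeline):

// Splits a sequence into "materialization bulks": the first bulk has 8 items,
// then the size doubles each time until it reaches 1024.
static IEnumerable<List<T>> InBulks<T>(IEnumerable<T> source)
{
  int bulkSize = 8;
  var bulk = new List<T>(bulkSize);
  foreach (var item in source) {
    bulk.Add(item);
    if (bulk.Count >= bulkSize) {
      yield return bulk;
      bulkSize = Math.Min(bulkSize * 2, 1024);
      bulk = new List<T>(bulkSize);
    }
  }
  if (bulk.Count > 0)
    yield return bulk; // the last, possibly incomplete bulk
}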

Epilogue

We hope this (along with other planned features) will allow us to deliver simply unbeatable performance in real-world applications: can you imagine your application sending just a few batches per transaction almost every time? It's almost impossible to achieve the same even with plain ADO.NET.

In fact, DataObjects.Net establishes an intelligent request queue between the application and the database server, acting as an interaction optimizer that eliminates chattiness.

CUD sequence batching in v4.0.5 already shows itself really well in data modification scenarios. Now we're bringing batching, future queries and prefetches to all other parts of the framework, increasing the performance of data access operations, which is normally even more important.

Currying in C#

I'd like to show a very simple, but really magical T4 template added by Denis Krjuchkov to the Xtensive.Core library. But let's look at its output first:
using System;

namespace Xtensive.Core.Helpers
{
  /// <summary>
  /// Extension methods for binding delegates to parameters.
  /// </summary>
  public static class DelegateBindExtensions
  {
    /// <summary>Binds first 1 argument(s) to specified delegate.</summary>
    /// <returns> A delegate that takes the rest of arguments of original delegate.</returns>
    public static Func<TResult> Bind<T1, TResult>(this Func<T1, TResult> d, T1 arg1)
    {
      return () => d.Invoke(arg1);
    }

    /// <summary>Binds first 1 argument(s) to specified delegate.</summary>
    /// <returns> A delegate that takes the rest of arguments of original delegate.</returns>
    public static Func<T2, TResult> Bind<T1, T2, TResult>(this Func<T1, T2, TResult> d, T1 arg1)
    {
      return (arg2) => d.Invoke(arg1, arg2);
    }

    /// <summary>Binds first 2 argument(s) to specified delegate.</summary>
    /// <returns> A delegate that takes the rest of arguments of original delegate.</returns>
    public static Func<TResult> Bind<T1, T2, TResult>(this Func<T1, T2, TResult> d, T1 arg1, T2 arg2)
    {
      return () => d.Invoke(arg1, arg2);
    }

    /// <summary>Binds first 1 argument(s) to specified delegate.</summary>
    /// <returns> A delegate that takes the rest of arguments of original delegate.</returns>
    public static Func<T2, T3, TResult> Bind<T1, T2, T3, TResult>(this Func<T1, T2, T3, TResult> d, T1 arg1)
    {
      return (arg2, arg3) => d.Invoke(arg1, arg2, arg3);
    }

    /// <summary>Binds first 2 argument(s) to specified delegate.</summary>
    /// <returns> A delegate that takes the rest of arguments of original delegate.</returns>
    public static Func<T3, TResult> Bind<T1, T2, T3, TResult>(this Func<T1, T2, T3, TResult> d, T1 arg1, T2 arg2)
    {
      return (arg3) => d.Invoke(arg1, arg2, arg3);
    }

    ...

    /// <summary>Binds first 8 argument(s) to specified delegate.</summary>
    /// <returns> A delegate that takes the rest of arguments of original delegate.</returns>
    public static Action<T9> Bind<T1, T2, T3, T4, T5, T6, T7, T8, T9>(this Action<T1, T2, T3, T4, T5, T6, T7, T8, T9> d, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8)
    {
      return (arg9) => d.Invoke(arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9);
    }

    /// <summary>Binds first 9 argument(s) to specified delegate.</summary>
    /// <returns> A delegate that takes the rest of arguments of original delegate.</returns>
    public static Action Bind<T1, T2, T3, T4, T5, T6, T7, T8, T9>(this Action<T1, T2, T3, T4, T5, T6, T7, T8, T9> d, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9)
    {
      return () => d.Invoke(arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9);
    }
  }
}

This template allows "currying" delegates of Action<...> and Func<...> types by binding the first N arguments out of M. Each Bind method returns a delegate with M-N arguments, whose invocation is equivalent to invoking the original delegate with the first N arguments set to the specified values.

Here is an example:
Func<int, int, int> add = (a, b) => a + b;
var addOne = add.Bind(1); // addOne = b => 1 + b;
Console.WriteLine(addOne(2)); // Prints 3

Func<int, int, int, int> multiplyAdd = (a, b, c) => a + b * c;
var multiply2Add3 = multiplyAdd.Bind(3, 2); // multiply2Add3 = c => 3 + 2 * c;
Console.WriteLine(multiply2Add3(1)); // Prints 5

So it is really simple to use. Here is an alternative (check out the Curry() code they use), which is closer to the commonly known currying, but there are more delegate creation operations, especially when binding multiple arguments. E.g. calling add.Curry()(1) implies two delegate creations: the first call returns Func<int, Func<int, int>>, and the second returns Func<int, int>. In our case there will be just one delegate. Accordingly, there will be 4 delegate creations in the case of multiplyAdd.Curry()(2).Curry()(3), and just one in our case.
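To make the comparison concrete, here is what a classic Curry() for a two-argument function might look like (a sketch; the linked article defines the complete set of overloads):

public static class CurryExtensions
{
  // add.Curry() creates one delegate; add.Curry()(1) creates a second one.
  // add.Bind(1) from the template above creates just one.
  public static Func<T1, Func<T2, TResult>> Curry<T1, T2, TResult>(
    this Func<T1, T2, TResult> d)
  {
    return arg1 => arg2 => d(arg1, arg2);
  }
}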

Finally, here is the T4 template generating these Bind extension methods:
<#@ output extension="cs" #>
<#@ template language="C# 3.5" #>
<#@ assembly name="System.Core" #>
<#@ import namespace="System" #>
<#@ import namespace="System.Linq" #>
<#
  int NumberOfParameters = 9;

  Action<int, int> WriteParametersDeclaration = delegate(int firstIndex, int lastIndex) {
    if (firstIndex > lastIndex)
      return;
    for (int k = firstIndex; k < lastIndex; k++)
      Write(string.Format("T{0} arg{0}, ", k));
    Write(string.Format("T{0} arg{0}", lastIndex));
  };

  Action<int, int, bool> WriteGenericParameters = delegate(int firstIndex, int lastIndex, bool forAction) {
    if (firstIndex > lastIndex) {
      if (!forAction)
        Write("<TResult>");
      return;
    }
    Write("<");
    if (forAction) {
      for (int k = firstIndex; k < lastIndex; k++)
        Write(string.Format("T{0}, ", k));
      Write(string.Format("T{0}", lastIndex));
    }
    else {
      for (int k = firstIndex; k <= lastIndex; k++)
        Write(string.Format("T{0}, ", k));
      Write("TResult");
    }
    Write(">");
  };

  Action<int, int> WriteParameters = delegate(int firstIndex, int lastIndex) {
    if (firstIndex > lastIndex)
      return;
    for (int k = firstIndex; k < lastIndex; k++)
      Write(string.Format("arg{0}, ", k));
    Write(string.Format("arg{0}", lastIndex));
  };
  
  Action<bool> WriteType = delegate (bool forAction) {
    Write(forAction ? "Action" : "Func");
  };
  
  Action<int, int, bool> WriteFirstParameter = delegate(int firstIndex, int lastIndex, bool forAction) {
    Write("this ");
    WriteType(forAction);
    WriteGenericParameters(firstIndex, lastIndex, forAction);
    Write(" d, ");
  };

  Action<bool> WriteMethods = delegate(bool forAction) {
    for (int total = 1; total <= NumberOfParameters; total++) {
      for (int bound = 1; bound <= total; bound++) {
        Write("/// <summary>Binds first ");
        Write(bound.ToString());
        WriteLine(" argument(s) to specified delegate.</summary>");
        WriteLine("/// <returns> A delegate that takes the rest of arguments of original delegate.</returns>");
        Write("public static ");
        WriteType(forAction);
        WriteGenericParameters(bound+1, total, forAction);
        Write(" Bind");
        WriteGenericParameters(1, total, forAction);
        Write("(");
        WriteFirstParameter(1, total, forAction);
        WriteParametersDeclaration(1, bound);
        WriteLine(")");
        WriteLine("{");
        PushIndent("  ");
        Write("return (");
        WriteParameters(bound+1, total);
        Write(") => ");
        Write("d");
        Write(".Invoke(");
        WriteParameters(1, total);
        WriteLine(");");
        PopIndent();
        WriteLine("}");
        WriteLine("");
      }
    }
  };
#>
using System;
using Xtensive.Core;

namespace Xtensive.Core.Helpers
{
  /// <summary>
  /// Extension methods for binding delegates to parameters.
  /// </summary>
  public static class DelegateBindExtensions
  {
<#
PushIndent("    ");
WriteMethods(false);
WriteMethods(true);
PopIndent();
#>
  }
}

Note that NumberOfParameters = 9 there. That's because we declare additional Action<...> and Func<...> delegate types in Xtensive.Core. Set it to 4 for plain .NET 3.5, where the standard Action<...> and Func<...> delegates take at most 4 arguments.

P.S. You might be curious why these methods appeared at all. Denis added them while implementing fast expression compilation; later we used them at least in the LINQ translator.

September 24, 2009

New feature of code.ormBattle.net

LOL, see http://code.ormbattle.net/#&&target=..%5cweb.config. So it seems we wrote a program in C# that prints itself :) Fortunately, no passwords are reachable for the user it runs under.

To be fixed.

September 23, 2009

Object-to-object (O2O) mapper is our upcoming solution for POCO and DTO

I'm writing this post mainly because I'm tired of listening to complaints about the necessity to support POCO and DTOs in an ORM. Earlier I wrote that this is not a real problem at all, if your ORM is capable of populating arbitrary objects. So here I'll simply prove this with examples.

So what is an object-to-object mapper (OOM)? In the simplest case this is an API allowing you to transform objects of types T1, T2, ... TN to objects of types T1`, T2`, ... TN` using pre-defined transformation rules (mappings). An example of such a simple API is AutoMapper (I recommend studying its description before reading further).

On the other hand, I don't believe in such simplicity ;) AutoMapper, as well as many similar mappers, solves just a single problem: forward-only mapping. I'd like such a tool to handle a few more cases:
  • It must be able to compare two T` graphs, identify the changes made there, and apply them to the T graph. It must understand object keys, versions and removal flags while doing this.
  • It must be able to transform IQueryable<T> to IQueryable<T`>.
Let's think how we could work with it:
// Creating mapper. Let's think it is already configured.
var mapper = ...; 

// Transform a single object
var personDto = (PersonDto) 
  mapper.Transform(person); 

var personDtoClone = Cloner.Clone(personDto);
personDto.Name = "New name";

// Applying changes to the original object
mapper.Update(personDtoClone, personDto);

// Transforming the queryable
var personDtos = (IQueryable<PersonDto>)
  mapper.Transform(Query<Person>.All);

// Nothing has happened yet: we just provided a queryable
// that can be invoked later

// Here we're actually transforming the queryable to 
// the original one, executing it, transforming its
// result back to PersonDto objects and returning them.
var selectedPersons = (
  from p in personDtos
  where p.Name == "Alex"
  select p).ToList();

var selectedPersons2 = (
  from p in personDtos
  where p.Name == "Sergey"
  select p).ToList();

// Merging two lists of objects!
// This is possible, because we are aware of keys.
// Conflicts are detected, because we are aware of versions :)
selectedPersons = mapper.Merge(selectedPersons, selectedPersons2);

var selectedPersonsClone = Cloner.Clone(selectedPersons);
selectedPersons[0].Name = "Ivan";

// Applying changes to the original objects
mapper.Update(selectedPersonsClone, selectedPersons);
As you see, this solution allows us to solve a whole bunch of problems:
  • You can deal with POCO objects, your own DTOs - anything you want. There are no special requirements at all.
  • This is simply ideal for SOA: Astoria (ADO.NET Data Services), .NET RIA Services, WCF, etc.
  • You may have as many of such mappings as you want. E.g. one per each particular client-side API ;)
  • You can use LINQ for your DTOs - that's simply a dream ;) Btw, writing such a translator must be a real piece of cake (there are always one-to-one mappings).
  • You don't have to sacrifice any of the benefits our Entity\Structure\EntitySet objects provide - I mean change tracking, lazy loading, auto transactions, validation, etc.!
Let's think how a typical SOA context could look:
public sealed class SoaContext : MappedStorageContext
{
  [MapTo(typeof(Person))]
  public IQueryable<PersonDto> Customers { get; }
  // We can implement it by standard way using PostSharp ;)

  [MapTo(typeof(Order))]
  public IQueryable<OrderDto> Orders { get; }

  public override void Initialize() 
  {
    // Executed just once per each type!
    
    // Only complex mappings are here.
    // 1-to-1 field mappings are defined automatically.
    // Type mappings are recognized from above properties.
    Map<Customer, CustomerDto>(
      c => c.Order.Total,
      dc => dc.OrderTotal);
    Map<Customer, CustomerDto>(
      c => c.Orders,
      dc => dc.Orders.Where(
        o => o.Date.Year==DateTime.Now.Year));
  }

  public SoaContext(Session session)
    : base(session)
  {
  }
}

Note that this is an EF-like context that can be shared e.g. via the ADO.NET Data Services API. Moreover, it provides Update & Merge methods allowing you to update the original objects or merge state changes - recursively.

I used MappedStorageContext here, which is "pre-tuned" for dealing with our own objects - e.g. it returns Query<T>.All for any auto property of a mapped IQueryable<T> type, and it is aware of the Session. But it should be inherited from a general MappedContext allowing you to map any objects as you like in a similar fashion.

I hope this fully explains why I believe an ORM must not care too much about supporting POCO and very flexible mapping. These problems are solved much better by the described OOM layer. Moreover, if they're resolved at a different layer, the code of your entities becomes more convenient and simple, because the ORM can provide standard infrastructure for them.
  • In many cases (e.g. in web applications or simple services) you don't need POCO/DTO at all. All you need here is the ability to deal with persistent entities using a fast, simple and convenient API. Moreover, this API must be ideal for describing BLL rules. That's exactly what DO is designed for.
  • If you must maintain disconnected state for long-running transactions, the upcoming DisconnectedState (and later - sync) will handle this gracefully. This problem significantly differs from the DTO one - e.g. having an on-demand downloading capability is quite desirable here.
  • Everything else (SOA, WCF serialization, etc.) is covered by the solution described above.
Any comments are welcome. Especially if you see any problems here ;)

P.S. When will this mapper appear in DO4? Quite likely, we'll start working on it right after the upcoming v4.1 update.

September 21, 2009

Should we ship a single assembly containing all the other ones - merged?

Today I was able to merge all the assemblies of DataObjects.Net v4.0 into a single one using ILMerge (later I'll share the .targets file I created to invoke it): Xtensive.Storage.Merged.dll. Its size is almost 5 MB! So DO4 claims to be the largest ORM on .NET now ;) Actually, nearly 1.5 MB of it is external code - we integrated Npgsql.dll, Mono.Security.dll and Oracle.DataAccess.dll into it.

Anyway, the question is: would you like to have an option to use just this single assembly instead of a set of assemblies?

Pros:
- A single assembly instead of many. Btw, this is much less important now - recently I integrated automatic copying of all indirect dependencies to the output folder into our .targets files.

Cons:
- It will be difficult to switch back to multiple assemblies later, if you start using this option in production. That's because the serialized assembly name of any type inside the merged assembly will be different from the original one. So if this path is chosen for a particular project, it will be hard to migrate back to the multiple-assemblies option.

Please tell me what you think about this. Is it really necessary?

Offtopic: Celebrity Deathmatch

That's really fun :), although I'm not sure if the origin is Russian. There are:
- Police vs prisoner (nice, like action film of 80s)
- Tough guy vs crowd
- Britney Spears vs Madonna (nice)
- Lara Croft vs Indiana Jones (nice)
- Frankenstein vs D'Artagnan
- Frodo vs Harry Potter
- J.C.V.D. vs Steven Seagal
- Godzilla vs Statue of Liberty
- Neo vs Skywalker
- King Leonidas vs Chuck Norris (nice).

September 20, 2009

Delegate.Invoke() vs using(...) {...} as prologue-epilogue implementation approach

We frequently use the using(...) { ... } construction to wrap some code with common prologue and epilogue actions. So why do we prefer using? Let's list its pros and cons with some examples.

Pros:

1. Performance. It may appear that using requires an additional allocation, while delegate invocation does not. This is actually not true:
  • You can return the same (e.g. static) object from the using construction. So the prologue is actually a method call returning this object, and its disposer (IDisposable.Dispose) is the epilogue. No allocations at all. The disposer will be called multiple times for the same object, but this is fully in accordance with IDisposable usage conventions (see the sketch right after this list).
  • You can even return null, if you don't need an epilogue. Not an obvious path (and we're going to get rid of all such cases in our code; we'll return a static object instead), but it works.
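Here is a minimal sketch of the "static object" pattern from the first bullet above (hypothetical names, not a specific DataObjects.Net class):

// The prologue is a method returning a cached disposable; its Dispose() is the epilogue.
// No allocation happens per using block.
public sealed class NullScope : IDisposable
{
  public static readonly NullScope Instance = new NullScope();

  private NullScope()
  {
  }

  public void Dispose()
  {
    // Epilogue logic goes here; it may run multiple times for the same instance,
    // which conforms to IDisposable usage conventions.
  }
}

// Usage: OpenScope() may simply return NullScope.Instance.
// using (SomeService.OpenScope()) { ... }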
Finally, delegates frequently involve closures. E.g. this code:
public double SillyPower(double x, int power)
{
  double result = 1;
  for (int i = 0; i < power; i++) {
    new Action(() => {
      result = result * x;
    }).Invoke();
  }
  return result;
}
will be translated by the C# compiler into nearly this one:
public double SillyPower(double x, int power)
{
  var _closure = new {
    result = 1d, 
    x = x, 
    handler = { result = result * x; } // Pseudo-code: 
    // actually you can't declare methods in anonymous types 
    // this way. There is simply no way to do this at all ;)
    // But the C# compiler can.
  };
  var _delegate = (Action) _closure.handler;
  for (var i = 0; i < power; i++)
    _delegate.Invoke();
  return _closure.result;
}
As you see, this code makes 2 allocations per invocation of such a method: the first one is the closure instance allocation, and the second one is the delegate allocation. Moreover, access to each variable placed into the closure from the SillyPower method scope requires an additional reference traversal, because such variables are now located on the heap rather than on the stack.

If you're interested in details, read e.g. this article about closures and enumerators.

2. Better readability. Yes, I think the version with the using statement is more readable than in-place delegate creation.

Cons:

The only one I see is the possibility of getting the original exception hidden. Let's have a look at this code:
public void Test()
{
  try {
    using (var s = SomeScope.Open()) {
      throw new FirstException();
    } // Imagine that SecondException is thrown in s.Dispose() here
  }
  catch (Exception e) {
    // e is SecondException here;
    // FirstException is completely lost.
  }
}
So here is the problem: if IDisposable.Dispose throws an exception, it has no way to get information about any exception that is already propagating up the stack. That's why it's recommended to avoid throwing any exceptions from the Dispose method (moreover, you should do all you can to prevent the same in any code invoked by it).

But what if we need to throw an exception from Dispose? Is there a solution? Actually, yes. The most well-known one is to use a Complete()-like method:
public void Test()
{
  using (var s = SomeScope.Open()) {
    if (new Random().Next(2)!=0)
      throw new FirstException();
    s.Complete(); // If this method isn't invoked, likely,
                  // above code has thrown an exception.
                  // So s.Dispose() should not throw an
                  // exception in any case.
    // Here we know that no exception was thrown in this block;
    // so it's safe to throw an exception from s.Dispose().
  }
}
Btw, inconsistency regions in DataObjects.Net v4.0.5 were vulnerable to this issue, but we fixed it a few weeks ago in exactly this way.

Note that there is no such problem with delegates: you can wrap a delegate invocation into a try-catch-finally block with ease.

So maybe the C# designers should think about providing ISafeDisposable:
public interface ISafeDisposable : IDisposable
{
  void Dispose(Exception exception);
}
Really simple, yes? ;) The old IDisposable.Dispose must simply forward its call to ISafeDisposable.Dispose(null), if ISafeDisposable is implemented. Moreover, it's easy to make the C# compiler implement such a legacy Dispose automatically (like an auto property). If ISafeDisposable is implemented, the using statement must rely on it; otherwise it must rely on the legacy IDisposable. So finally we have full backward compatibility.
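For illustration, here is how the compiler could expand using over such an interface (a sketch of the proposal above, not an existing language feature):

// Hypothetical expansion of: using (var s = SomeScope.Open()) { ... }
var s = SomeScope.Open();
Exception error = null;
try {
  // ... the body of the using block ...
}
catch (Exception e) {
  error = e;
  throw;
}
finally {
  s.Dispose(error); // the epilogue now knows whether (and why) the body failed
}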

Ok, that's just one of my dreams ;) Have a nice weekend!

September 19, 2009

Static field access performance: the answer

So many views, no answer... Guys, I don't believe this!

Ok, the last case was the slowest one because the JIT compiler substitutes any generic parameter of reference type with the __Canon type during generic type instantiation. This allows the code generated for one generic type instance to be reused by other generic instances. So e.g. List<int> and List<long> won't share the same generated code, but List<string> and List<Array> will, because actually both of them will be implicitly transformed to List<__Canon>.

But why does this affect the described case? Think how such a static variable's address is resolved in the generated code.
  • In the first two cases (no T, or when T is a value type) the JIT compiler emits code that is fully specific to this type. So it knows the exact address of the static variable.
  • In the last case (T is a reference type) the JIT compiler emits code that must work with any similar T substitution. So it actually can't put any exact address of the static variable there. Instead, it resolves it via a dictionary. The code it emits does the same as this: __GetStaticFieldGetter(this.GetType()).Invoke(), where __GetStaticFieldGetter is an internal method resolving this delegate via an internal dictionary, and creating it if necessary. Of course, the actual code is much more efficient - e.g. it returns the static field address instead of a delegate, but the idea behind it is the same.
Compare the cost of a static variable lookup in a generic type parameterized by a reference type to e.g. the [ThreadStatic] field access cost or the virtual generic method call cost - they are very similar. And it's fully clear why: the underlying logic in all these cases is almost identical. There is a dictionary lookup.

Btw, this case likely exposes the most severe performance impact the __Canon optimization leads to. At least, I don't know any other case with a similar impact. So yes, there is always a trade-off between memory consumption and performance :)

September 17, 2009

Static field access performance

I'm working on Tuples performance now. A few days ago Alexey Kochetov faced a performance issue I'd like to describe here. The answer is almost obvious when you know how .NET deals with generics; otherwise this scenario appears a bit strange.

So what do you think... Does a static field access operation provide constant performance?

Take a look at this code (open my original post to see syntax highlighting):
public class Host
{
  public static object StaticField;

  public virtual void GetStaticField(int iterationCount)
  {
    object o = null;
    for (int i = 0; i<iterationCount; i++)
      o = StaticField;
  }
}

public class Host<T> : Host
{
  // We're using different StaticField here
  public new static object StaticField;

  public override void GetStaticField(int iterationCount)
  {
    object o = null;
    for (int i = 0; i<iterationCount; i++)
      o = StaticField;
  }
}

Performance test for it:
int count = 1000000000;

Host host = new Host();
using (new Measurement("Static field of Host", MeasurementOptions.Log, count))
  host.GetStaticField(count);

host = new Host<int>();
using (new Measurement("Static field of Host<int>", MeasurementOptions.Log, count))
  host.GetStaticField(count);

host = new Host<Array>();
using (new Measurement("Static field of Host<Array>", MeasurementOptions.Log, count))
  host.GetStaticField(count);

The Measurement class is an IDisposable allowing you to measure the elapsed time and memory consumption between the moments it is created and disposed. Internally it relies on Stopwatch and GC.* methods.
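If you're curious, a simplified Measurement could look like this (a sketch only - the actual Xtensive.Core class also tracks memory consumption and supports more options, e.g. the MeasurementOptions used above):

public sealed class Measurement : IDisposable
{
  private readonly string name;
  private readonly int operationCount;
  private readonly Stopwatch stopwatch = Stopwatch.StartNew();

  public Measurement(string name, int operationCount)
  {
    this.name = name;
    this.operationCount = operationCount;
  }

  public void Dispose()
  {
    stopwatch.Stop();
    double opsPerSecond = operationCount / stopwatch.Elapsed.TotalSeconds;
    Console.WriteLine("Measurement: {0}: Operations: {1:N0}/second", name, opsPerSecond);
  }
}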

Test output:
Measurement: Static field of Host:        Operations: 2,599   Billions/second.
Measurement: Static field of Host<int>:   Operations: 2,604   Billions/second.
Measurement: Static field of Host<Array>: Operations: 121,179 Millions/second.

Can you explain this?

If you'd like to see the answer right now, it is here.

September 16, 2009

Making MSBuild / Visual Studio to automatically copy all indirect dependencies to "bin" folder

Yesterday I asked this question on StackOverflow.com, and didn't get the answer I wanted. So is it possible to make MSBuild automatically copy all indirect references (dependencies) to the output folder?

Yes, this is possible, and the solution is provided below. But first let's think about when this is desirable. Actually, I can hardly imagine why this does not always happen automatically. Really, if AssemblyA needs AssemblyB, and my application needs AssemblyA, most likely it won't work without AssemblyB as well. But as you know, AssemblyB won't be automatically copied to the bin folder if it isn't directly referenced from your project - and referencing it directly is actually a rare case, especially if you tend to use loosely coupled components.

Let's list few particular examples we have:

Case 1. Our SQL DOM project consists of a core assembly (Xtensive.Sql) and a set of SQL DOM providers (Xtensive.Sql.Oracle, ...), and it's quite desirable to copy all of them to the application's bin folder, because generally it can use any provider. Let's say I created an Xtensive.Sql.All assembly referencing all of them (btw, I really did this in our repository). Actually, this assembly contains a single type, which will never be instantiated:

  /// <summary>
  /// Does nothing, but references types from all SQL DOM assemblies.
  /// </summary>
  public sealed class Referencer
  {
    private Type[] types = new [] {
      typeof (Pair<>),
      typeof (SqlType),
      typeof (SqlServer.DriverFactory),
      typeof (PostgreSql.DriverFactory),
      typeof (Oracle.DriverFactory),
      typeof (VistaDb.DriverFactory),
    };

    // This is the only constructor. So you can't instantiate this type.
    private Referencer()
    {
    }
  }

As you see, this type references types from all SQL DOM assemblies (including the providers). This is necessary because otherwise the C# compiler will not add references to these assemblies to Xtensive.Sql.All.dll, even though the project it is built by includes them.

So practically you can't use this type. But it makes the C# compiler list all the references we need in the Xtensive.Sql.All.dll assembly:



Note that each of these assemblies also needs many others. For example, let's take a look at Xtensive.Sql.PostgreSql.dll assembly there. It references Npgsql.dll, which in turn references Mono.Security.dll.

So now you understand the problem. I'd like all these assemblies to appear in the bin folder of my application automatically. I don't want to manually discover all the dependencies and write code like this to copy them:

  <Target Name="AfterBuild" DependsOnTargets="RequiresPostSharp">
    <CreateItem Include="$(SolutionDir)\Lib\*.*">
      <Output TaskParameter="Include" ItemName="CopyFiles" />
    </CreateItem>
    <Copy SourceFiles="@(CopyFiles)" DestinationFolder="$(TargetDir)" SkipUnchangedFiles="true" />
  </Target>


Case 2. The same goes for our Xtensive.Storage providers and assemblies. So I created an Xtensive.Storage.All assembly referencing everything you might need. This assembly contains a very similar Referencer type.

Let's go to the solution now.

Solution: CopyIndirectDependencies.targets.

Here it is:

<?xml version="1.0" encoding="utf-8"?>
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <PropertyGroup>
    <CopyIndirectDependencies    
      Condition="'$(CopyIndirectDependencies)'==''">true</CopyIndirectDependencies>
    <CopyIndirectDependenciesPdb 
      Condition="'$(CopyIndirectDependenciesPdb)'==''">false</CopyIndirectDependenciesPdb>
    <CopyIndirectDependenciesXml 
      Condition="'$(CopyIndirectDependenciesXml)'==''">false</CopyIndirectDependenciesXml>
  </PropertyGroup>


  <!-- BuildXxx part -->

  <Target Name="CopyIndirectDependencies" 
          Condition="'$(CopyIndirectDependencies)'=='true'"
          DependsOnTargets="DetectIndirectDependencies">
    <Copy Condition="'%(IndirectDependency.FullPath)'!=''"
          SourceFiles="%(IndirectDependency.FullPath)" 
          DestinationFolder="$(OutputPath)" 
          SkipUnchangedFiles="true" >
      <Output TaskParameter="CopiedFiles" 
              ItemName="IndirectDependencyCopied" />
    </Copy>
    <Message Importance="low"
             Condition="'%(IndirectDependencyCopied.FullPath)'!='' 
               and '%(IndirectDependencyCopied.Extension)'!='.pdb' 
               and '%(IndirectDependencyCopied.Extension)'!='.xml'"
             Text="Indirect dependency copied: %(IndirectDependencyCopied.FullPath)" />
  </Target>

  <Target Name="DetectIndirectDependencies"
          DependsOnTargets="ResolveAssemblyReferences">
    
    <Message Importance="low"
             Text="Direct dependency: %(ReferencePath.Filename)%(ReferencePath.Extension)" />
    <Message Importance="low"
             Text="Indirect dependency: %(ReferenceDependencyPaths.Filename)%(ReferenceDependencyPaths.Extension)" />

    <!-- Creating indirect dependency list -->
    <CreateItem Include="%(ReferenceDependencyPaths.FullPath)" 
                Condition="'%(ReferenceDependencyPaths.CopyLocal)'=='true'">
      <Output TaskParameter="Include" 
              ItemName="_IndirectDependency"/>
    </CreateItem>
    <CreateItem Include="%(ReferenceDependencyPaths.RootDir)%(ReferenceDependencyPaths.Directory)%(ReferenceDependencyPaths.Filename).xml"
                Condition="'%(ReferenceDependencyPaths.CopyLocal)'=='true' and '$(CopyIndirectDependenciesXml)'=='true'">
      <Output TaskParameter="Include" 
              ItemName="_IndirectDependency"/>
    </CreateItem>
    <CreateItem Include="%(ReferenceDependencyPaths.RootDir)%(ReferenceDependencyPaths.Directory)%(ReferenceDependencyPaths.Filename).pdb"
                Condition="'%(ReferenceDependencyPaths.CopyLocal)'=='true' and '$(CopyIndirectDependenciesPdb)'=='true'">
      <Output TaskParameter="Include" 
              ItemName="_IndirectDependency"/>
    </CreateItem>

    <!-- Filtering indirect dependency list by existence -->
    <CreateItem Include="%(_IndirectDependency.FullPath)"
                Condition="Exists('%(_IndirectDependency.FullPath)')">
      <Output TaskParameter="Include" 
              ItemName="IndirectDependency"/>
    </CreateItem>

    <!-- Creating copied indirect dependency list -->
    <CreateItem Include="@(_IndirectDependency->'$(OutputPath)%(Filename)%(Extension)')">
      <Output TaskParameter="Include"
              ItemName="_ExistingIndirectDependency"/>
    </CreateItem>

    <!-- Filtering copied indirect dependency list by existence -->
    <CreateItem Include="%(_ExistingIndirectDependency.FullPath)"
                Condition="Exists('%(_ExistingIndirectDependency.FullPath)')">
      <Output TaskParameter="Include"
              ItemName="ExistingIndirectDependency"/>
    </CreateItem>

  </Target>


  <!-- Build sequence modification -->

  <PropertyGroup>
    <CoreBuildDependsOn>
      $(CoreBuildDependsOn);
      CopyIndirectDependencies
    </CoreBuildDependsOn>
  </PropertyGroup>
</Project>

Its intended usage: add a single line importing this file (the <Import Project="CopyIndirectDependencies.targets" /> element shown below) to any .csproj / .vbproj.

<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="3.5" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  ...

  <Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" />
  <Import Project="CopyIndirectDependencies.targets" />
  <!-- To modify your build process, add your task inside one of the targets below and uncomment it. 
       Other similar extension points exist, see Microsoft.Common.targets.
  <Target Name="BeforeBuild">
  </Target>
  <Target Name="AfterBuild">
  </Target>
  -->
</Project>

Check out the demo application using this .targets file:
  • Download and extract it
  • Open it in Visual Studio and build it there, or build it by typing "msbuild" (of course, if it is in your PATH)
  • Check out "bin" folder. It already contains Mono.Security.dll from Lib folder, although application references just Npgsql.dll (it requires Mono.Security.dll).
If you'd like to suppress Visual Studio warning on opening such modified projects for the first time, see this article (in particular, "Non-standard Import Elements" section).

Update: initial version of CopyIndirectDependencies.targets published here was buggy, but now it's fixed.

September 14, 2009

NLog

About one week ago I added the "Get rid of log4net dependency" issue - this must be a piece of cake, since we have a logging abstraction layer. And today I accidentally found a good alternative to log4net: NLog. So likely, we'll switch to this logging library in the foreseeable future.

The most important question: why? Actually, I have a set of reasons for this:

  • I dislike ports (in particular, from Java). Even the best ones frequently utilize techniques that aren't friendly to .NET. Moreover, they frequently don't utilize the features of the new platform (e.g. delegates, generics, events and so on, speaking about Java ports) in their API (that's the worst case) or internals. Of course, many of such porting gaps get fixed over some reasonable period (as it was with NHibernate - nowadays it looks pretty "native" to .NET), but some of them stay there anyway.
  • I immediately liked its web site and the description pages there. Everything seems fully clear. I'll report on any issues when we study it closer.
  • I always like to know who's responsible for the framework. And "Copyright © 2004-2009 by Jaroslaw Kowalski" is much more meaningful for me than a set of active committers and community contributors.

September 11, 2009

Implementing LINQ in ORM, part 1. No SQL DOM = no LINQ.

When we launched ORMBattle.NET, I got a few complaints like this one. And it's true - so far we have almost never explained the details of our LINQ translation layer. There are two reasons for this:
  • Until the end of June we were writing the code. It was clear we had no time to waste on blogging about this. In fact, we wanted to get a fully functional LINQ provider ASAP. Only after that could we talk about it.
  • What we were doing was actually much more complex than anything we had seen so far. IQToolkit, the series of Frans Bouma's posts and everything else we could find is actually much simpler than what we developed. So it was clear that a description of this would be really large.
But now I have some time to describe our LINQ implementation. The description won't be complete - I will try to cover only the major aspects. Moreover, frequently I won't describe a solution completely, but instead I will try to describe the problem it is related to. It's much easier to develop a solution if you know the problem. So let's start. Today's post is devoted to SQL DOM. I'm going to cover the following questions:
  • What is SQL DOM?
  • Is it a "must have" part of any ORM that is going to support LINQ?
  • What are alternatives to SQL DOM?
  • Why did we develop it? When we started, it was completely unclear whether we would support LINQ or not.
  • What about NHibernate or Subsonic? They have LINQ providers, but don't have a SQL DOM.
What is SQL DOM?

SQL DOM is an object-oriented model of the SQL language, as well as a set of providers allowing these models to be compiled into actual SQL commands (roughly, text + parameters). If you're interested in how it looks, visit the Xtensive.Sql.Dml namespace. Our implementation of SQL DOM is responsible for schema extraction and DDL support as well. So actually it is an almost complete DOM of the SQL language based on the SQL:1999 standard.

Here is example code relying on SQL DOM (if it isn't colorized, visit the original post - I use SyntaxHighlighter, which doesn't work via RSS):

[Test]
public void TableAutoAliasTest()
{
  SqlTableRef tr1 = SqlDml.TableRef(Catalog.Schemas["Person"].Tables["Contact"], "a");
  SqlTableRef tr2 = SqlDml.TableRef(Catalog.Schemas["Person"].Tables["Contact"], "a");

  SqlSelect select = SqlDml.Select(tr1.CrossJoin(tr2));
  select.Limit = 10;
  select.Columns.AddRange(tr1[0], tr1[0], tr2[0]);
  select.Where = tr1[0]>1 && tr2[0]>1;

  sqlCommand.CommandText = sqlDriver.Compile(select).GetCommandText();
  sqlCommand.Prepare();
  Console.WriteLine(sqlCommand.CommandText);
  GetExecuteDataReaderResult(sqlCommand);
}

Is it a "must have" part of any ORM that is going to support LINQ?

I think, yes. First of all, a few pieces of evidence:

But how does SQL DOM help here? Actually, it allows the LINQ translator to be independent of the underlying database. The LINQ translation architecture with SQL DOM looks nearly like the following:
  • The LINQ translator produces a "unified" SQL DOM model for each LINQ query.
  • The unified SQL DOM model is processed by a set of provider-dependent visitors that rewrite unsupported SQL constructions into supported ones, as well as simplify (beautify) the resulting SQL DOM model.
  • Finally, this model is converted to a SQL command.
Can something similar be done without SQL DOM? Yes, but in fact it will lead to a solution very similar to SQL DOM. Let me demonstrate a single method from our LINQ translation layer:

protected static bool ShouldUseQueryReference(
  CompilableProvider origin, 
  SqlProvider compiledSource)
{
  var sourceSelect = compiledSource.Request.SelectStatement;
  var calculatedColumnIndexes = sourceSelect.Columns
    .Select((c, i) => IsCalculatedColumn(c) ? i : -1)
    .Where(i => i >= 0)
    .ToList();
  var containsCalculatedColumns = 
    calculatedColumnIndexes.Count > 0;
  var pagingIsUsed = sourceSelect.Limit != 0 || 
    sourceSelect.Offset != 0;
  var groupByIsUsed = sourceSelect.GroupBy.Count > 0;
  var distinctIsUsed = sourceSelect.Distinct;
  var filterIsUsed = !sourceSelect.Where.IsNullReference();
  var columnCountIsNotSame = sourceSelect.From.Columns.Count !=
    sourceSelect.Columns.Count;
      
  if (origin.Type == ProviderType.Filter) {
    var filterProvider = (FilterProvider)origin;
    var usedColumnIndexes = new TupleAccessGatherer()
      .Gather(filterProvider.Predicate.Body);
    return pagingIsUsed || 
      usedColumnIndexes.Any(calculatedColumnIndexes.Contains);
  }

  if (origin.Type == ProviderType.Select) {
    var selectProvider = (SelectProvider)origin;
    return containsCalculatedColumns && 
      !calculatedColumnIndexes.All(
        ci => selectProvider.ColumnIndexes.Contains(ci));
  }

  ...

  return 
    containsCalculatedColumns 
    || distinctIsUsed 
    || pagingIsUsed 
    || groupByIsUsed;
}

As you see, we're studying the content of a previously produced SQL DOM expression to make a decision that will actually affect the production of the subsequent expression. Actually, this code helps to decide whether we must create a new query with a nested select statement, or simply decorate the statement we already have. And to make this decision, we must know what the existing statement looks like.

Note that such a solution isn't 100% necessary. E.g. you can decide to rely on just the first option (always create a subquery). But this doesn't always work well. For example, it's quite desirable to reduce the statement nesting level in Oracle (see the tail of this post for details).

Finally, the SQL DOM tree transform I just described isn't the only one. There are many others, e.g. you must implement an APPLY rewriter. Btw, in our case it is implemented on the RSE level (i.e. we transform the query plan rather than the SQL DOM model), but if you don't have such an intermediate layer, it will likely appear either as a rewriter in the LINQ translator layer, or as a SQL DOM rewriter.

What are the alternatives to SQL DOM?

As I just mentioned, the simplest alternative coming to mind is custom (internal) LINQ extensions allowing you to precisely describe SQL language constructions, e.g. as it's described here. Such a translation pipeline would look like:
  • The LINQ translator translates the original LINQ expression to SQL-like expressions.
  • This expression is processed by a set of provider-dependent visitors that rewrite unsupported SQL constructions into supported ones, as well as simplify (beautify) the resulting expression.
  • Finally, this expression is converted to a SQL command.
But as you may notice, this is nearly the same as with SQL DOM. Let's list the pros and cons.

Pros:
  • No necessity to develop a large set of classes.
  • Likely, it will allow developing a complete solution faster.
Cons:
  • Expressions in .NET are of predefined types. You can't associate custom information with them. But in this case you need information related to SQL, not to C# expressions. E.g. to decide whether a SELECT statement has a WHERE clause, you must check if a method call expression (e.g. a call to .Select) has a parameter named "where". So "studying" such expressions would be hell: you must be aware of what can be used there, how to check if a particular construction is used, and so on (see the sketch after this list). Remember that you can't use your custom types in such models (if you do, you will in fact develop a SQL DOM ;) ).
  • You will need a schema model anyway.
  • So likely, this will slow down the development of anything related to it: rewriters, visitors (compilers) and so on.
  • Moreover, this will slow down the translation. Expressions are immutable, and their construction is quite slow (at least for now).
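To illustrate the first con, here is a purely hypothetical sketch of what "does this SELECT have a WHERE clause?" turns into when the model is a plain expression tree: instead of reading a property, you have to pattern-match the tree. The Select method with a "where" parameter is an invented convention, not a real API:

using System.Linq.Expressions;

// With SQL DOM the check is a property read:
//   bool hasWhere = !select.Where.IsNullReference();
// With SQL-like expressions you have to dig through the tree instead.
static bool HasWhereClause(Expression expression)
{
  var call = expression as MethodCallExpression;
  if (call == null || call.Method.Name != "Select")
    return false;
  var parameters = call.Method.GetParameters();
  for (int i = 0; i < parameters.Length; i++) {
    if (parameters[i].Name != "where")
      continue;
    // The "where" argument may still be a constant null.
    var constant = call.Arguments[i] as ConstantExpression;
    return constant == null || constant.Value != null;
  }
  return false;
}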
Why did we develop it? When we started, it was completely unclear whether we would support LINQ or not.

This decision was based on our previous experience. DataObjects.Net v1.X ... 3.X never had this part, and finally we came to the conclusion that we need it anyway. As you may notice, it isn't related to LINQ much. It is an abstraction allowing us to deal with SQL in an object-oriented fashion, rather than with SQL as text. And this ability brings tons of benefits - in particular, it makes SQL generation in the ORM quite unified.
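A rough illustration of the difference (the object-model fragment reuses the Catalog fixture member from the test above; the id variable is assumed):

// SQL as text: provider differences leak into every concatenation.
string sql = "SELECT TOP 10 * FROM [Person].[Contact] WHERE [Id] > " + id;

// SQL as objects: the provider-specific text is produced by the compiler later.
var contact = SqlDml.TableRef(Catalog.Schemas["Person"].Tables["Contact"], "c");
var select = SqlDml.Select(contact);
select.Limit = 10;
select.Where = contact[0] > id;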

SQL DOM was actually the first part we developed for DataObjects.Net 4. Btw, we even sold a few copies of it :) (obviously, we knew selling it would be almost impossible: such a monster is necessary mainly for ORM developers). Here is a testimonial we recently got for it (translated from Russian):

"In one of our project we faced necessity to build complex SQL queries independently from specific DBMS. It was in 2007. It was clear that we need some object model of SQL the language. Searching the market then led us to Xtensive SQL DOM. It was not a product - it was a library that existed in this company and was used for their own needs. However, they wrote about it on their support forum. It should be mentioned the openness of the company was surprisingly high, which is always nice. We acquired this library and are successfully using it for two years now. Despite the fact that we have implemented some modifications there, I can highlight high quality of its design and code . Its architecture allowed us to implement a new SQL DOM provider for Oracle for it solely by our own forces. Now SQL DOM is included into DataObjects.Net 4. Definitely, any product of this company deserves the closest attention.

Sincerely, Sergey Dorogin.
Lead developer,
Department of Information Technology, JSC "CROC Incorporated"

CROC is the leading Russian company in IT infrastructure creation (IDC "Russia IT Services Forecast and Analysis" reports, 2002 - 2008). CROC helps customers improve business efficiency and meet their strategic goals through the advanced use of information technology.

The Oracle provider for DataObjects.Net 4 we're developing now is based on the SQL DOM Oracle provider code we received from Sergey.

What about NHibernate or Subsonic? They have LINQ providers, but don't have SQL DOM.

This means they either perform a very limited set of transformations on SQL or don't perform them at all. And until this is done, the LINQ implementation they provide will be quite limited.

Speaking about NHibernate, as far as I can judge, the team (or person) working on its LINQ translator is going to translate LINQ to HQL AST trees. Thus the very first thing they started to work on is an implementation of an HQL parser based on a standard parser generator (ANTLR). But for me this is a secondary problem. It is absolutely unclear how this will help to resolve the main issues:
  • LINQ is much more complex and rich than HQL. So to fully support LINQ, HQL must be extended to support all of its features. Obviously, this isn't easy even from the syntax point of view.
  • Moreover, compilation of such HQL won't be as simple as it is now. Currently NHibernate providers build SQL by concatenating strings, but they'll need nearly the same infrastructure as I just described.
So actually, at this point I don't see how the NHibernate team is going to deliver LINQ in any reasonable time. They make some steps around the same point, but don't move further.

Ok, that's just what I see. I could be mistaken: my opinion is based only on blog posts, not on the source code. So don't consider this a 100% fact. I'll be glad to hear anything showing that the actual situation is different.

September 10, 2009

Who am I?

I am the original author of DataObjects.Net – one of the oldest ORM frameworks for .NET. Currently I’m the architect of its 4th version, which is a completely new product.

Besides that, I’m CEO at X-tensive.com – the company that stands behind DataObjects.Net, and that initially appeared because of this product.

I maintain a set of blogs related to my professional interests:
Finally, I'm one of the leaders of the Urals .NET User Group (UNetUG). Almost all group meetings and talks there are organized with my involvement.

That was the part related to my current professional interests. To make this post more complete, I'll mention a few more facts about my personal life:
  • I'm 29. December 22, 1979 is my birthday.
  • I've been married for 8 years; we have two sons (Roman and Eugene).
  • Periodically I become a movie fan ;) I track all the new releases, trying to watch the best ones. If time permits, I write short reviews.
  • I like sports, but I've been ignoring them completely for the last 6 months (thank you, DO4 :) ). A real pity. My sports are boxing (obviously, not professional), snowboarding, swimming (again, non-professional, but a 5 km distance is ok for me) and regular fitness.
  • I do not smoke, although I smoked for about 5 years starting in late school. I almost don't drink either (~ a few times per year).
Hmm... I noticed that the above 5 lines look more like offtopic here ;) Likely, it will be more interesting if I tell you about this:

How I started to write programs

I wrote my first program at about 10. My first programming device was an Elektronika MK-61 – a Soviet programmable calculator, LOL ;) The first program I wrote was a simple calendar (DD.MM.YYYY -> day of the week). The author of this task was my father (Eugene Yakunin) – at that time he was vice-CEO at Elecond (this big plant is still slowly dying, although almost 20 years have passed since Soviet times), and an engineer by education. He never tried to program himself, but did a lot to help me choose the direction to go.

My very first PC was a ZX Spectrum - I was 12 when I got it. It was a really progressive machine in the Soviet Union at that time. My first programs there were written in BASIC; I remember I wrote a simple starship shooting game, where the starship was actually a triangle. At 13 I was programming pretty well in Z80 assembler - of course, for my age ;). I still have copies of some of these programs. One of the most practically interesting ones was a vector font editor written mainly in assembler.

At 15 I switched to PC/DOS. It was a PC with an i486 processor and 4 MB of RAM. I started with Turbo Pascal there, but almost immediately switched to Zortech C++, and later - Watcom C++. I knew about it because of the Doom game, which was one of the first games working in i386 protected mode, and its i386 protected mode stub printed the name of its compiler. Btw, the compiler was really good. I remember I dreamed about it at programming contests - the RAM limit of Borland C++ compilers was driving me crazy even though I knew it was enough to solve the problem ;)

I should add that C++ was the first language I found really beautiful. Not C, but C++ - its object-oriented nature looked quite close to the way I wanted to think about programs.

While studying at school, I was programming for fun. It was mainly stuff related to 3D rendering - I dreamed of developing a 3D game at that time ;) And it was really fun. Here are some screenshots of this "software":

Miscellaneous (see alt. text): realtime Julia set computation (assembler, ~500 bytes); raytracing (almost realtime now ;) ); realtime electromagnetic field computation.

Simple 3D modeler.

Game prototype. Realtime Gouraud shading :) Never finished.

Realtime voxel landscape rendering, v1.0.

Realtime voxel landscape rendering, v2.0. This was the last work I started at school; it was finished during my first year at university.

You may notice (by studying the labels on the pictures above) that I was thinking about my own company starting from my early teens. I tried to choose pictures without labels, but for a few of them it was completely impossible to get rid of the labels ;)

At the same time I was participating in programming contests - pretty successfully. 2nd place in the regional contest (Sverdlovskaya oblast, or "Yekaterinburg region") was my best result at school.

So programming is what I really enjoyed at that time. I mostly tried to develop real-time solutions and considered everything else boring. In my last year at school I knew linear algebra well enough to understand many computational algorithms related to 3D rendering - e.g. depth sorting, binary space partitioning and radiosity (I didn't implement the last one, but always wanted to find some time for this). Remember that until 3Dfx came onto the scene, 3D rendering was solely your own problem; moreover, there were no Z-buffers and only 256 colors. So it was definitely not a piece of cake ;)

After school I started to study theoretical physics at Urals State University. "Why theoretical physics?" - you may ask. That's because I was good at physics as well, and my father finally told me: "You must try yourself here as well - it's clear that programming will be in the list of your competences anyway, so you will be able to study it further in your free time".

And I was really interested in studying physics during the first 3 years. As well as programming - during this period I mainly played with Delphi and C++ Builder. That's why I liked .NET/C# later - both Delphi and C# were architected by the same person (Anders Hejlsberg). During the same period I started to use databases. BDE and MySQL were the first ones I tried, although later I focused mainly on Microsoft SQL Server.

I used C++ Builder even to solve computational problems (I used Mathematica as well, but not always). Here are screenshots from a program I wrote for one of my course projects:
C++ Builder used in my course work.

The graphics you see were rendered by my own charting control. You could move and zoom the chart with the mouse. Labels on the axes were carefully placed to avoid any overlapping. But its most attractive feature was real-time rendering with smooth chart edges. Note that smooth rendering was not supported by GDI at that time. I achieved this because I actually used OpenGL to render the chart ;)

Starting from the 4th year of my studies at USU, physics became hell for me. I was coming to the conclusion that my brain simply isn't tuned for true theoretical physics at all. Probably that's because I was a pretty rare guest at lectures, so it was harder and harder for me to prepare for each subsequent examination. But I think there was a much more important reason: I clearly saw that I would never be among the best students in physics. And on the contrary, I could be the best in programming.

So that was the year when my real-life programming career started. You can get some idea of what I did later by visiting my LinkedIn page, so I won't tell much about this now. Or maybe that's a story for another time ;)

The list of languages and compilers I had practical experience with during my school and early student years includes:
  • Assembler: I suspect at least 5-7 implementations for Z80 and Intel CPUs, including the one built into the Watcom C++ compiler and Microsoft MASM.
  • Basic: that was my first real programming language on the ZX Spectrum. Later I tried a set of its compilers there, including Tobos FP. When I switched to IBM PCs, I also studied several others, including QuickBASIC and VBA.
  • Pascal: Turbo Pascal, and later - Delphi. As far as I remember, the last version I used was Delphi 6.
  • C++: I used many of its compilers - at least Borland C++, Zortech C++, Watcom C/C++, GNU C++, C++ Builder, and Visual C++.
  • Java: I had just started to play with it at that time. As you might remember, it was initially designed for browser-based applets. That's why I didn't like it much initially - it seems I looked at it too early. Later Sun realized its true potential and brought it to enterprise software developers. But when I got a chance to look at it again, .NET was already upcoming.
  • Scripting languages: .bat files, JavaScript, VBScript and PHP were my favorite ones at that time.
Ok, my first true post here is coming to an end. If you've read up to this sentence, I hope you aren't disappointed. It uncovers some aspects of my nature: I suspect only a few of our employees know that I was a 3D programming fan in the past, or that my first programming device was a Soviet programmable calculator ;)

Partially this explains why I pay a lot of attention both to the architecture and to the performance of the solutions we develop. I'm fully aware that optimization must never come before implementation, and I follow this rule very strictly. On the other hand, I never allow implementing a solution based on a completely wrong algorithm when performance is potentially important. So you can be 99.9% sure you won't discover that we use an algorithm with O(N) complexity instead of O(log(N)) in any code that has even a tiny chance of becoming a bottleneck under real-life conditions. We choose the solution for each particular case very carefully.

P.S. In the future I'll try my best to ensure this blog is worth reading. I'll be glad to see you here next time.