A Little on LINQ
Recently I was involved (only a little) in preparing some of the Visual Studio Launch content for the February 27th launch event (and subsequent worldwide events). During the preparation we held "train-the-trainer" meetings where people who would be presenting launch sessions around the world could learn more about the sessions and ask questions. During one of these TTT events a question was raised about a LINQ demo that was written for a session named "Breakthrough Software Development Challenges with Visual Studio 2008".
The Question
Is there a performance impact to joining result sets in LINQ from two different data sources?
This question came up because this is exactly one of the scenarios we demo. Here is my rendition of the demo.
The Answer
I talked to the language teams, and here is what they told me.
For cross-domain joins you typically have the choice between really lousy performance or a runtime error – and generally you should prefer the latter!
The original source of the query gets to decide how the query gets executed. If the source is a LINQ to SQL table, for instance, it will simply look at anything it gets joined with, and throw an error if that is not another LINQ to SQL table, entity collection or query result.
If you go the other way, however, and join an in-memory collection (such as the descendants of an XElement, as in our demo) with a LINQ to SQL table or query result, then you are in trouble: it will enumerate the table into memory and do the join on your machine (Yikes!).
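To make the cost concrete, here is a minimal sketch of the "enumerate everything" behavior. It does not use LINQ to SQL at all: RemoteCustomers is a hypothetical stand-in for a remote table that simply counts every row it hands out, and the class and method names are made up for illustration. The join itself is the ordinary LINQ to Objects join, which reads the whole inner sequence to build its lookup.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CrossDomainJoinDemo
{
    // Counts how many rows the stand-in "table" has handed out.
    public static int RowsFetched;

    // Hypothetical stand-in for a remote table (NOT a real LINQ to SQL table).
    // Every row that is yielded represents a row pulled into local memory.
    public static IEnumerable<KeyValuePair<int, string>> RemoteCustomers(int rowCount)
    {
        for (int id = 1; id <= rowCount; id++)
        {
            RowsFetched++;
            yield return new KeyValuePair<int, string>(id, "Customer " + id);
        }
    }

    // The local collection is the outer source, so the join runs in memory
    // and has to walk the entire inner sequence to find the matches.
    public static List<string> JoinLocalWithRemote(int[] localIds, int remoteRowCount)
    {
        RowsFetched = 0;
        var query = from id in localIds
                    join c in RemoteCustomers(remoteRowCount) on id equals c.Key
                    select c.Value;
        return query.ToList();
    }

    public static void Main()
    {
        List<string> names = JoinLocalWithRemote(new[] { 3, 7 }, 10000);
        Console.WriteLine("Matched {0} rows, fetched {1} rows", names.Count, RowsFetched);
    }
}
```

Only two ids are being looked up, yet all 10000 rows of the stand-in table are read. With a real table behind it, each of those rows would also have to come across the wire first.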
The general guidance is: don’t do cross-domain joins!
Let me qualify a bit: I am talking specifically about a “Join” operation on large data sources in different domains. There are several things you can still do to work sensibly across domains, such as:
- Query the remote (e.g. LINQ to SQL) data first to yield a result small enough that it makes sense to pull it down and join locally
- Use the Contains() method on LINQ to SQL data, which will take a small local collection and send it to the database as an IN expression in the generated query
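The two options above can be sketched roughly as follows. This is an illustration under stated assumptions, not the demo code: Customer, CustomersTable and the `Id <= 10` filter are hypothetical, and the "table" is an in-memory list wrapped with AsQueryable() so the sample runs without a database. Against a real LINQ to SQL table the same query shapes apply, with the Where clauses executed on the server.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used only for this sketch.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class CrossDomainWorkarounds
{
    // Stand-in for a LINQ to SQL table: an in-memory list exposed as IQueryable.
    public static IQueryable<Customer> CustomersTable()
    {
        return Enumerable.Range(1, 1000)
            .Select(i => new Customer { Id = i, Name = "Customer " + i })
            .AsQueryable();
    }

    // Option 1: narrow the remote data first, materialize the small result,
    // then join locally. The filter here is an arbitrary placeholder.
    public static List<string> FilterThenJoin(IQueryable<Customer> customers, IEnumerable<int> localIds)
    {
        List<Customer> small = customers.Where(c => c.Id <= 10).ToList();
        return (from id in localIds
                join c in small on id equals c.Id
                select c.Name).ToList();
    }

    // Option 2: hand the small local collection to the query via Contains().
    // With LINQ to SQL this is the shape that becomes an IN expression.
    public static List<string> FilterWithContains(IQueryable<Customer> customers, IEnumerable<int> localIds)
    {
        return customers.Where(c => localIds.Contains(c.Id))
                        .Select(c => c.Name)
                        .ToList();
    }

    public static void Main()
    {
        int[] ids = { 3, 7, 500 };
        Console.WriteLine(FilterThenJoin(CustomersTable(), ids).Count);
        Console.WriteLine(FilterWithContains(CustomersTable(), ids).Count);
    }
}
```

In both cases the join keyword never sees a large remote source: either the remote result has already been cut down before the local join, or there is no join at all and the local ids travel to the data instead.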
The upshot is that, as an application programmer, you still have to be aware of cross-domain joins and manage their impact yourself.
