EF4 Include method returns different results than EF1 Include
-
Monday, February 15, 2010 11:48 PMEF4 Include method returns different results than EF1 Include
While converting a Winform application to WPF, I was creating a telephone directory-like report using WPF's native printing to replace one made with a 3rd-party report engine. The new report was shorter than the old one and the problem was clear, I was not getting the complete set of data back that I was expecting.
EF v1 setup: Visual Studio 2008, .Net 3.5 SP1, EF1, VB.Net, SQL Server 2008
Sample project: EF1_Sample_Project
EF v4 setup: Visual Studio 2010 RC1, .Net 4.0, EF4 (lazy loading OFF), VB.Net, SQL Server 2008
Sample project: EF4_Sample_Project
Database schema:
Account table
AccountPerson table (payload m2m link)
Person table
PersonTelephoneNumber table (m2m link)
TelephoneNumber table
In EF1 if I wanted to get a list of all Accounts, including associated people and their telephone numbers I would use something like:
I would get a list of all accounts and their people, regardless if the person had any telephone numbers or not. However, if I use the same query in EF4, an account's person will not be returned if they do not have any telephone numbers.Dim accountList = entities.Accounts.Include("AccountPeople.Person.TelephoneNumbers")
Download each of the samples above to see what I mean.
In each console application, the addition of person #2's telephone numbers are commented out. EF1 will return person #2 with the account, but EF4 will not. If you uncomment the lines and add some telephone numbers to person #2, both EF1 and EF4 will return all persons and telephone numbers.
Does EF4 require something different that is just not obvious to me?
Thank you,
George- Edited by George Fitch Tuesday, February 16, 2010 8:22 PM Update details
Answers
-
Wednesday, February 17, 2010 3:19 PM
George,
I think it's because earlier outer joins were used, now inner joins. If you want, you can obviously verify it by using the Sql profiler or letting the Sql code generated to be logged and study it by yourself.
If you've followed the discussions you could see significant efforts have been spent in using inner joins instead of outer joins and generally produce queries which are more efficient to execute.
I have not myself read that the semantics of Include has been altered, however.
In my case I don't have any need for the semantics of the outer join so if something should be done I would rather say that Include should be able to take an argument like inner/outer, concerning what you would prefer. In this way we could probably all be satisfied. In this way you could write e.g. Include("AccountPeople.Person", JoinKind.Outer).Include("AccountPeople.Person.TelephoneNumbers", JoinKind.Inner). I would dare to say that JoinKind.Inner should be used if nothing would be specified, either by the semantics of a default value or having two overloads, one with/one without JoinKind.
Best regards,
Henrik Dahl- Marked As Answer by George Fitch Thursday, February 18, 2010 4:18 AM
All Replies
-
Tuesday, February 16, 2010 8:09 PMGeorge,
Now, I haven't tried it myself, but have you tried to do Dim accountList = entities.Accounts.Include("AccountPeople.Person").Include("AccountPeople.Person.TelephoneNumbers")?
Best regards,
Henrik Dahl -
Tuesday, February 16, 2010 8:26 PMHenrik,
Thanks for the reply.
Yes, I did try that. I even tried reversing AccountPeople.Person and AccountPeople.Person.TelephoneNumbers. I just tried it again a minute ago to make sure I didn't forget something. Still doesn't work.
Wierd... -
Wednesday, February 17, 2010 9:57 AM
Hello George,
OK, so what about:
Dim irrelevantState = entities.Accounts.Include("AccountPeople.Person.TelephoneNumbers").ToList()
Dim accountList = entities.Accounts.Include("AccountPeople.Person")
Best regards,
Henrik Dahl -
Wednesday, February 17, 2010 2:24 PMHenrik,
Yes, that does work, but it also increases the query time by 50%. At the most I would consider this a workaround.
I really can't understand why this worked in EF1 but not EF4. Maybe I should submit this to Microsoft Connect as a possible bug? -
Wednesday, February 17, 2010 3:19 PM
George,
I think it's because earlier outer joins were used, now inner joins. If you want, you can obviously verify it by using the Sql profiler or letting the Sql code generated to be logged and study it by yourself.
If you've followed the discussions you could see significant efforts have been spent in using inner joins instead of outer joins and generally produce queries which are more efficient to execute.
I have not myself read that the semantics of Include has been altered, however.
In my case I don't have any need for the semantics of the outer join so if something should be done I would rather say that Include should be able to take an argument like inner/outer, concerning what you would prefer. In this way we could probably all be satisfied. In this way you could write e.g. Include("AccountPeople.Person", JoinKind.Outer).Include("AccountPeople.Person.TelephoneNumbers", JoinKind.Inner). I would dare to say that JoinKind.Inner should be used if nothing would be specified, either by the semantics of a default value or having two overloads, one with/one without JoinKind.
Best regards,
Henrik Dahl- Marked As Answer by George Fitch Thursday, February 18, 2010 4:18 AM
-
Thursday, February 18, 2010 4:18 AMHenrik,
I just reviewed the generated SQL of EF1 and EF4 with using the ToTraceString method, and you are correct.
EF1 used 3 outer and 1 inner joins, while EF4 used 1 outer and 3 inner joins.
I'm not very happy with this change, but if it was a result of community requests then I guess there is not much I can do.
Damn you EF4. I guess I'm looking at doing this with JOINs... now to figure that out.
Thanks Henrik. -
Thursday, February 18, 2010 4:44 PMI highlighted this thread to Julie Lerman who thought it was interesting / important enough to write a blog post on
Well spotted George.
-
Thursday, February 18, 2010 9:02 PMDave,
Thank you for alerting Julie Lerman to this issue.
Glad to hear I'm not the only one that thinks this could be a problem for some developers.
For what it is worth, I submitted a suggestion through Microsoft Connect to possibly add Henrik Dahl's suggestion of adding an argument to the Include method to allow EF4 to mimic EF1's Include behavior if needed. We'll see...
https://connect.microsoft.com/VisualStudio/feedback/details/534675/ef4-include-method-returns-different-results-than-ef1-include
I've been trying some other methods to get this to work, such as projection, but they are even slower than Henrik's workaround and require even more code. I find it irritating that something that used to be fast and required only a single line of code is now more complicated and slower.
I would ask other developers who see this as an issue to please leave a message here or at the above Microsoft Connect issue page (and vote it up). -
Saturday, February 20, 2010 9:58 PMHello George!
Thank you very much for your positive opinions on what I wrote. As I can see there's a chance this will develop into something serious I would like to add a few words, to sharpen the theoretical aspects of computer science within, so to speak.
It's obvious that the underlying abstraction level, the SQL Server, implements the relational algebra. It's also obvious that the ADO.NET Entity Framework is a frontend or we could say denotes an abstraction over this device implementing the relational algebra. As the underlying device implements the relational algebra it's crucial to have support for the semantics of both outer and inner joins because otherwise subsets of the relational algebra remain unsupported and therefore it can be deterministically inferred that there are problems which the ADO.NET Entity Framework does not span for.
Due to the facts above it's obvious that if you have .Include("A.B.C") as an example, you need to be able to specify for each of A, B and C whether you want the inner or the outer variant of the relational algebra to be in power. It would be appropriate if inner would be default, i.e. if no explicit variant is specified.
I hope you don't think I'm too rude, but honestly, this should be quite obvious to the designer of Include(<path>), shouldn't it?
Best regards,
Henrik Dahl -
Sunday, February 21, 2010 2:18 AMHenrik,
No, I don't think your statement is rude at all.
I've been giving this issue a lot of thought over the last few days. I'm more annoyed now than ever.
I understand why EF4's Include method behaves the way it does (inner vs. outer joins), but I just can't understand why Microsoft changed it to only act this way. Every tutorial, help code snippet or sample code provided in forum posts and blogs that I've run across show using the Include method as an easy and efficient way of getting all of your data. Now, all of that guidance seems to only apply to some scenarios.
Julie Lerman described this issue in her blog as only affecting 'Edge cases', but I think she may be underestimating how many applications will be affected. The application I'm converting to WPF is no 'Edge case' app in my opinion. The database structure follows a lot of normalization conventions you would see elsewhere. My knowledge and understanding of Entity Framework is nowhere near that of someone like Julie Lerman. All I know is that everything I've read tells me to use the Include method when I want to include other related objects in the query. Now it just feels less useful for a highly normalized database schema.
I think you were correct in saying that Microsoft should have provided a means of telling EF Include to use inner or outer joins in the SQL it generates. After your comments and some thought I now believe this functionality should have been provided from the beginning with EF1. Make it possible to really shape the data that is 'Included'. But why change EF4's Include method to the complete opposite of what the previous version did, while not offering the previous version's functionality? I think this is going to affect a lot more developers than Microsoft thinks it will.
I'm afraid this late in the production cycle nothing may change. -
Sunday, February 21, 2010 9:58 PMGeorge,
There's a good design principle within the field of computer science which is: Do not impose business abstraction level decisions on the user but support him in efficiently executing his business decisions effectively. The typical consequence of violating this design principle is that the user can not use the tool for his task because he basically gets imposed a solution for another kind of problem than the problem he actually has to solve.
In your case this value is being challenged. You have a clear vision on which problem you want to be solved from business point of view. Your decision is perfectly valid as it's clearly in line with relational algebra.
The Include() method is imposing a business level abstraction decision on you, i.e. it imposes inner joins to be used.
Therefore the current situation is basically a school example of what happens when the value I wrote in the beginning is being violated.
You can observe, that my suggested approach fulfills the mentioned design principle. This is due to the fact that I've designed my suggestion in recognition of this design principle. The same sound approach could be good to the ones who design the ADO.NET Framework. In order to do that you however need to be quite strong in computer science and not just in some Sql or whatever tool smith stuff because a fool with a tool is still a fool. There's a difference between butchers and doctors or craftmen and engineering architects and this is also to be seen here.
Best regards,
Henrik Dahl -
Monday, February 22, 2010 9:25 PMMicrosoft left a comment on my Connect suggestion page today about enabling EF1's Include method behavior:
https://connect.microsoft.com/VisualStudio/feedback/details/534675/ef4-include-method-returns-different-results-than-ef1-include (This is the same link as the above post)
While not promising anything this late in the release cycle, it does sound encouraging. -
Tuesday, February 23, 2010 1:06 PMGeorge,
OK. I must say that I don't think it should be simply reverted to the old approach. In many cases this will cause an enoumously excessive amount of data to be transferred from the Sql server to the client. It's obvious that an inner join is what is needed in many cases. It's also obvious, like in yours, that an outer join is required for some cases.
An alternative idea could be Include("A{InnerJoin}.B{OuterJoin}", i.e. to basically attribute the edge. InnerJoin should be default if nothing is specified.
George, have you actually tried this in a situation where you have quite much data? I think it looks like the construction takes enormously long time to execute when you have quite much data. I must say I have a suspicion that it's not particularly efficiently implemented how all the received data is being connected together.
Best regards,
Henrik Dahl -
Tuesday, February 23, 2010 4:26 PM
Henrik,
Thank you. For some reason I read their comment as meaning 'adding the EF1 behavior back in' and not 'changing it back'. I left a comment on the Connect suggestion page asking for clarification.
I hope they don't revert it back, but add the EF1 behavior as an option. Your alternative idea looks like an ultimate solution for this problem, although implementing it this late in the game is probably not going to happen. This close to RTM may prevent them from changing anything and may be their short-term decision on this matter. Some addtional time and thought may cause them to come to the conclusion to change nothing for now, but implement a better solution for SP1, or, change it back to EF1's behavior now and implement a better solution for SP1. I hope it is not the second one because a lot of developers such as yourself obviously want the inner join Include method.
I could wait for a future update to this issue although I would not be happy about it. Your 'dummy query' workaround is what I am using now with a TODO comment around it to watch for updates from Microsoft. Not ideal, but it works for now. As you said previously, a lot of effort was put into optimizing things with inner joins because of the efficiency increase. I realize this is what the community wants, including myself, and reverting back would not be the answer.
In reference to your question about data quantity and query construction time, yes I realize this. The EF1 Include method has the potential of returning a lot of unwanted data which is why you should not abuse it. For my needs in this telephone directory report I know of no other way of getting all of this data without making the query time astronomical. The WPF print engine is very slick and I can't believe how fast it can generate this complicated report that takes 4-5 times longer with my current 3rd-party report engine. The WPF version of the report is generated in under 1 second. The query returning 600+ accounts, 600+ addresses, 1000+ people and 1700+ telephone numbers takes an addtional 1.5 seconds with your workaround, up from just under 1 second, and all with a single line of code. And yes, I fully realize that I'm complaining about 1/2 second of addtional query time in this case, but that's not the point. The EF1 Include method already worked beautifully in this case. Some other methods I tried of getting the data required 6-12 seconds and required much more code. I just can't see using a more complicated solution to retrieve the data that requires more time than is required to generate the report itself. Yes, the outer join Include method is a hog, but what is the alternative?
Anyway, I hope Microsoft makes the right decision on this matter. -
Wednesday, February 24, 2010 9:07 PMMicrosoft left another comment on my Connect suggestion page today about this issue.
It seems the new behavior was not intentional, but rather a side affect of a bug in the code used to generate SQL code needed for the Include method. They say this issue should be resolved in time for .Net 4 RTM. It appears the fix will not be to introduce an optional argument as per Henrik Dahl's suggestion, but to merely return the Include method to the previous behavior while automatically optimizing when and how inner and outer joins are used.
Let's hope this doesn't introduce any unwanted behaviors. :) -
Thursday, February 25, 2010 3:59 PMHello George,
Yes, I noticed it yesterday.
What they write could basically be understood as we can deterministically determine what you actually want, inner or outer, and apply exactly the right approach. In this way you get the most effective and at the same time the most efficient approach relative to your business needs. I must say to me this nearly sounds too good to be true.
Perhaps they do something corresponding to Include("AccountPeople") union Include("AccountPeople.Person") union Include("AccountPeople.Person.TelephoneNumbers") where each Include is done using inner joins.
In something I did earlier I actually used Include("AccountPeople").Include("AccountPeople.Person").Include("AccountPeople.Person.TelephoneNumbers"), however I don't know if it actually brought what I wanted as the task did not make it relevant to verify it. The documentation tells that if you have Include("AccountPeople.Person.TelephoneNumbers")
you basically achieve Include("AccountPeople").Include("AccountPeople.Person").Include("AccountPeople.Person.TelephoneNumbers").
Perhaps you should try this idea in your situation (you can have any number of Include(...) statements).
It was interesting to read your reflections on WPF. I must say I'm also quite impressed of it, however I've noticed that the level in the tree at which DataTemplates are defined gives a large impact on the performance.
Best regards,
Henrik Dahl

