locked
Multiple indexes on a DataSet table RRS feed

  • Question

  • Hi all, I'm trying hard to wrap my brain around all of the ADO.NET concepts, and I've run into something that's stumped me.

    I have a table in a DataSet that has a setup kind of like this:
    name (string, primary key)
    organization (string)
    (other info fields...)

    The primary key is the name, so if I issue a LINQ query against the table with a where clause that includes the name, it's not a problem.  However, I might need to, for example, find all names associated with an organization.

    No problem, right?  In attempting to do so, I write something like:

    var q =
      from r in MyDataSet.MyTable.AsEnumerable()
      where r.organization == "Foo Incorporated"
      select r.name;

    The problem is that I can't figure out how to index the organization column so that every time I run that query, it's not doing full table scans against the DataSet.  In Visual Studio 2008, I have an option to make organization a non-primary key field, but if I do, it mandates that the field must be unique, which it's not.  Some of these tables get quite large, so running the above query several times, which I do in my app, starts incurring massive performance hits.

    If I'm hitting directly against the database, the answer is trivial.  Slap a non-unique index on the organization column, and you're done.  Of course, I'm trying to avoid hitting directly against the database, because then I don't get all the other advantages of using a DataSet.  I've even considered adding a Dictionary<string, MyTableRow> object and keeping the index manually, but that seems like a lot of overhead and code to maintain manually.

    Am I trying to do something that DataSets were fundamentally not designed to do?  Am I missing something really stupidly simple in how to have a non-unique index on a DataSet table column?  Any help would be appreciated!
    • Edited by King Skippus Saturday, November 14, 2009 6:21 PM More readable formatting
    Saturday, November 14, 2009 6:19 PM

All replies

  • I haven't really gotten into LINQ all that much (I know, I'm a Neanderthal), so I can't really answer *that* part of your question ... and you don't say how you're using the results, but I wonder if you've tried using DataViews?

    DataView dv = new DataView(MyDataSet.MyTable);
    dv.RowFilter = "organization = 'Foo Incorporated'";
     
    // you could also simply use the DefaultView
    // MyDataSet.MyTable.DefaultView.RowFilter = "organization = 'Foo Incorporated'";

    ~~Bonnie Berent [C# MVP]

    geek-goddess-bonnie.blogspot.com
    Saturday, November 14, 2009 11:28 PM
  • I don't know, one concern is that it sounds like in order to find the organization I'm looking for, I'd have to constantly be modifying the row filter, which sounds like it wouldn't be very efficient.

    However, you may be on the right track with a DataView.  From what I've read about them, if I apply a sort to it by organization, it sounds like it's indexing the view by that field internally, which means that I could query the DataView by organization and get that O(1) performance I'm looking for.  I just skimmed the info on DataViews, though.  I'm assuming that when you add a row to a view's underlying table, it updates the view's index on the sort column?
    Sunday, November 15, 2009 4:47 PM
  • I don't know, one concern is that it sounds like in order to find the organization I'm looking for, I'd have to constantly be modifying the row filter, which sounds like it wouldn't be very efficient.

    Possibly not efficient, you're probably right. But, I forgot about another technique when I replied to you yesterday ... that is using the Table.Select() method. Silly me, this is probably more similar to your LINQ question. Although, to be honest, I don't know how well either one  (LINQ or SELECT) performs with large DataTables. Anyway, the code for that would be:

    DataRow[] rows = MyDataSet.MyTable.Select("organization = 'Foo Incorporated'");

    I'm assuming that when you add a row to a view's underlying table, it updates the view's index on the sort column?
    Yes.


    ~~Bonnie Berent [C# MVP]

    geek-goddess-bonnie.blogspot.com
    Sunday, November 15, 2009 5:02 PM
  • Well, I've been trying to get a DataView to work.  I tried setting up a simple test like so:

    DataView dv = new DataView(MyTable);
    dv.Sort = "organization";

    Then querying it using something like:

    var query = from DataRowView rowView in dv
      where rowView.Row.Field<string>("organization") == "Foo Incorporated"
      select rowView.Row.Field<string>("name");

    (I got the syntax from this article .)  In my testing, I'm actually getting worse performance than before; it takes around twice as long to execute this query than simply querying the table itself.

    At this point, I think that I'm just doing something fundamentally wrong.  I don't know what, but I think that trying to figure this out is going to have to wait until I get some more time to really study and wrap my brain around how this stuff is supposed to work.  I hate to say it, but I think that for now, I'm just going to hit directly against the database.  The "database" is actually a local file anyway, so it's not that big a deal.

    I was really playing around with in-memory DataSets mainly to try to improve performance and learn more about something I'm a little weak on.  I've ended up slowing everything down and come away more confused than I was before, though, so maybe it's just time to punt and go back to how I was doing it before.
    Monday, November 16, 2009 3:34 AM
  • I didn't mean for you to run a LINQ query on the DataView. That doesn't make sense to me. Plus, you've only set the Sort order, you didn't set the RowFilter. When you set the filter, the only thing the DataView will contain are the rows matching your filter criteria. I thought that this is what you wanted.

    Also, did you look at the other option I mentioned, the DataTable.Select()? I think this may be more along the lines of what you might want to do (although, you haven't specified what you're trying to do with your results).
    ~~Bonnie Berent [C# MVP]

    geek-goddess-bonnie.blogspot.com
    Monday, November 16, 2009 4:10 AM