DataSet/DataTable -- *append* changes to file?

  • Question

  • Greetings,

    I am developing a data-intensive application (a trading application) that must be able to process thousands of messages/transactions per second, but must also maintain data persistence in the case of an application crash.  I have several DataSets that maintain this data.  Some of these DataSets are simply logs -- data is ONLY ever appended to them.  These are strongly-typed DataSets, so I can simply call:

    MyDataSet.AddMyDataSetRow(...);

    to add the new row.  These DataSets can often become hundreds of thousands of rows large and so constantly writing the entire thing to a file (WriteXml()) is too costly (especially if I were to write to file every time a row is added, which can be hundreds of times per second).  What I'd like to be able to do, for the DataSets that are only ever appended to, is to just write the *new* rows to the file.  Is there a way I can accomplish this?

    Thanks,
    Rick

    Friday, March 14, 2008 4:49 PM

Answers

  • Rick,

     

    Both the Express and full versions of SQL Server use the same file format, and are in fact very similar products. There are licensing restrictions on the Express edition, and it lacks some features of the higher-end editions, such as Enterprise, but at their core they are the same product.

     

    File I/O will generally be the fastest for straight-out writing, since it is just streaming to the disk. It's when you need to make changes, or read only some data in the middle of the file, that you run into problems. Unless you have a very limited feature set and a need for extremely high performance, your best bet is going to be some sort of database.

     

    Thanks,

    Erick

     

    Tuesday, March 25, 2008 12:21 AM
    Moderator

All replies

  • Well, the XML file ends with a closing tag, so you can't simply append data to it.

    I suppose if you don't mind holding two copies of the log in memory, you could write the log to a MemoryStream
    instead of a file. Keep the previous MemoryStream alive and, when writing the new log, compare it to the old log
    and find out where the changes start. Then open the file manually, and only write the changed part of the file to
    disk.

    It would still be slow, but at least you'd avoid most of the disk I/O.
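A sketch of that diff-and-rewrite idea (`XmlLogWriter` and `WriteChangedTail` are illustrative names, not framework APIs): serialize the DataSet to a MemoryStream, find the first byte where it diverges from the previous snapshot, and rewrite only that tail of the file. For an append-only table the common prefix is nearly the whole file, so only the new rows plus the closing tag hit the disk.

```csharp
using System;
using System.Data;
using System.IO;

public static class XmlLogWriter
{
    // Returns the new snapshot; pass it back in as "previous" on the next call.
    public static MemoryStream WriteChangedTail(DataSet ds, MemoryStream previous, string path)
    {
        MemoryStream current = new MemoryStream();
        ds.WriteXml(current);

        byte[] oldBuf = previous == null ? new byte[0] : previous.GetBuffer();
        byte[] newBuf = current.GetBuffer();
        long oldLen = previous == null ? 0 : previous.Length;

        // Find the first byte where the two serializations differ.
        long start = 0;
        long limit = Math.Min(oldLen, current.Length);
        while (start < limit && oldBuf[start] == newBuf[start])
            start++;

        // Overwrite the file from that point on; everything before it is unchanged.
        using (FileStream fs = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
        {
            fs.Seek(start, SeekOrigin.Begin);
            fs.Write(newBuf, (int)start, (int)(current.Length - start));
            fs.SetLength(current.Length);  // in case the new serialization is shorter
        }
        return current;
    }
}
```

Note this still pays the cost of re-serializing the whole DataSet in memory on every call; it only saves the disk writes.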

    Another alternative is not to use an XML file for logging at all, but that might be more work, since the .NET
    Framework has no built-in support for it, AFAIK.
    Tuesday, March 18, 2008 6:42 PM
  • You can't just mark the MyDataTableRow class as Serializable, because its constructor is hidden.  But you can implement a method that will serialize it to a Stream.  I'm lazy enough that I'd probably implement a parallel class with the same fields, like:

     

    Code Snippet

    [Serializable]
    public class MyDataTableRowSerializable
    {
       public int ID;  // these fields all replicate those found in MyDataTableRow
       public string Description;
       ...
    }

     

    You can serialize and deserialize this class like anything else.  Implement a constructor that takes an object[] array (the row's ItemArray) and sets the fields.  (You'll also need an ItemArray property that returns an object[] array containing the column values - you'll see why in a bit.)
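Filled in for a hypothetical two-column row (your fields will differ), that constructor and property might look like:

```csharp
using System;

[Serializable]
public class MyDataTableRowSerializable
{
    public int ID;          // replicate MyDataTableRow's columns, in order
    public string Description;

    // Takes the row's ItemArray (column values in column order).
    public MyDataTableRowSerializable(object[] itemArray)
    {
        ID = (int)itemArray[0];
        Description = (string)itemArray[1];
    }

    // Returns the column values in the same order, for assigning back
    // to DataRow.ItemArray on deserialization.
    public object[] ItemArray
    {
        get { return new object[] { ID, Description }; }
    }
}
```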

     

    Then you can implement a method that looks something like this:

     

    Code Snippet

    public MyDataTableRow AddAndSerializeMyDataTableRow(MyDataTable t, Stream s)
    {
       // Typed tables have no parameterless Add; create the row, fill it, add it.
       MyDataTableRow r = t.NewMyDataTableRow();
       // ... set r's fields here ...
       t.AddMyDataTableRow(r);
       MyDataTableRowSerializable rs = new MyDataTableRowSerializable(r.ItemArray);
       BinaryFormatter f = new BinaryFormatter();
       f.Serialize(s, rs);
       return r;
    }

     

    Now every new row you create gets serialized to a stream.  You can also implement the corresponding deserialization method:

     

    Code Snippet

    public MyDataTableRow DeserializeAndAddMyDataTableRow(MyDataTable t, Stream s)
    {
       BinaryFormatter f = new BinaryFormatter();
       MyDataTableRowSerializable rs = (MyDataTableRowSerializable)f.Deserialize(s);
       MyDataTableRow r = t.NewMyDataTableRow();
       // DataRow.ItemArray returns a copy, so assign the whole array at once
       // rather than setting its elements one by one.
       r.ItemArray = rs.ItemArray;
       t.AddMyDataTableRow(r);
       return r;
    }

     

    There are probably easier ways to do some of these steps.  But conceptually it's pretty simple.
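Since BinaryFormatter happily writes multiple object graphs back-to-back on one stream, the whole log can be replayed after a crash by deserializing until the stream is exhausted. A self-contained sketch of the round trip with an untyped DataTable (`LogRecord` and `RowLog` are illustrative names):

```csharp
using System;
using System.Data;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class LogRecord
{
    public int ID;
    public string Description;
}

public static class RowLog
{
    // Append one record to the end of the stream.
    public static void Append(Stream s, int id, string description)
    {
        BinaryFormatter f = new BinaryFormatter();
        f.Serialize(s, new LogRecord { ID = id, Description = description });
    }

    // Replay every record in the stream into the table, in order.
    public static void Replay(Stream s, DataTable t)
    {
        BinaryFormatter f = new BinaryFormatter();
        while (s.Position < s.Length)
        {
            LogRecord rec = (LogRecord)f.Deserialize(s);
            t.Rows.Add(rec.ID, rec.Description);
        }
    }
}
```

Opening the log file with FileMode.Append gives the cheap per-row write the original question asks for; Replay is only needed at startup.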

    Tuesday, March 18, 2008 10:26 PM
  • What I would suggest is to install SQL Express or SQL CE on the local machine, then use a DataAdapter to write the added records to the database file. With SQL CE, you don't have any installation requirements, but you still get a powerful database. This way, you can add the new rows whenever you want, and as long as AcceptChanges is called (which the DataAdapter does for you), you will only ever be adding new rows. Then, if you need to load the data back from the file, you can simply select from the SQL CE database into a DataSet.
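The incremental behavior described above comes from ADO.NET's per-row change tracking, which a DataAdapter (SqlCeDataAdapter included) uses under the hood: Update() sends the rows whose RowState is Added, then AcceptChanges resets them to Unchanged, so the next Update() sends only newer rows. A sketch of that bookkeeping on a plain DataTable (`IncrementalWriter` is an illustrative name, not a framework class):

```csharp
using System;
using System.Data;

public static class IncrementalWriter
{
    // Returns the rows added since the last call, then marks them as written.
    // This mirrors what DataAdapter.Update does: process Added rows, then
    // AcceptChanges so they drop out of the next change set.
    public static DataRow[] TakeNewRows(DataTable t)
    {
        DataRow[] added = t.Select("", "", DataViewRowState.Added);
        t.AcceptChanges();  // subsequent calls see only rows added after this point
        return added;
    }
}
```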

     

    Let me know if this makes sense.

     

    Thanks,

    Erick

    Saturday, March 22, 2008 5:10 AM
    Moderator
  • Erick,

    Yes, that makes sense.  I'm curious, though -- I've been testing the speed differences between SQL Server 2008 and file I/O.  It seems the latter is faster when the tables are small; however, if they are large, making updates to the table is obviously faster in the database.  Is SQL Express (which I believe is all file-based) faster for small table updates than SQL Server 2008?

    As it stands now, I've managed to separate the database I/O to a separate thread as much as I can and I'm seeing very good speed results, so this might all be a moot point, but I'm still curious.

    Thanks again,
    Rick
    Monday, March 24, 2008 11:53 AM