Iterating over a DataColumnCollection
- In a recent project, I discovered the hard way that iterating over a columns collection, even read-only, is not thread-safe. I.e. something like:
Parallel.ForEach(..
    foreach(DataColumn foo in mytable.Columns)
    ...
My question is whether using indexes in a standard for loop is thread-safe. I.e. instead of the above, something like:
Parallel.ForEach(..
    for(var i = 0; i < mytable.Columns.Count; i++)
    ...
It should be noted that I'm using the June CTP of the Parallel Extensions for .NET 3.5.
All Replies
- What do you mean by it not being thread-safe, even read-only? What are you observing to indicate that? DataColumnCollection is built internally using an ArrayList, and enumerating over it simply enumerates that ArrayList. As long as you're not modifying the collection during the enumeration, it should be safe to iterate through.
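To illustrate the point about read-only enumeration, here is a minimal sketch that enumerates the same Columns collection from many threads without modifying the table. It assumes .NET 4 or later, where Parallel lives in System.Threading.Tasks (the June CTP used a different namespace); the class name, method names, and column names are hypothetical.

```csharp
using System;
using System.Data;
using System.Threading;
using System.Threading.Tasks;

public static class ReadOnlyColumnScan
{
    // Builds a small table; the column names are made up for this sketch.
    public static DataTable BuildTable()
    {
        var table = new DataTable("demo");
        table.Columns.Add("A", typeof(int));
        table.Columns.Add("B", typeof(int));
        table.Columns.Add("C", typeof(int));
        return table;
    }

    // Enumerates table.Columns from many threads without modifying anything
    // and returns the total number of columns visited.
    public static int CountColumnsInParallel(DataTable table, int iterations)
    {
        int visited = 0;
        Parallel.For(0, iterations, _ =>
        {
            foreach (DataColumn column in table.Columns)
            {
                // Read-only access: only the column's name is inspected.
                if (column.ColumnName.Length > 0)
                {
                    Interlocked.Increment(ref visited);
                }
            }
        });
        return visited;
    }
}
```

With the three-column table above, CountColumnsInParallel(table, 100) should visit 300 columns. If a loop like this fails, the cause is more likely whatever else the loop body does than the enumeration itself.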
- In my tests, I had to lock the columns collection or else I got indexing errors. Thus, if I simply did
Parallel.ForEach( mytable.AsEnumerable(), row =>
{
    foreach(DataColumn column in mytable.Columns)
    {
        row[column] = ...
    }
});
I would get bizarre index errors. By "read-only" I mean that I'm not changing the columns collection at any point. However, if I did the following:
Parallel.ForEach( mytable.AsEnumerable(), row =>
{
    lock ( mytable.Columns.SyncRoot )
    {
        foreach(DataColumn column in mytable.Columns)
        {
            row[column] = ...
        }
    }
});
It would work fine.
- You're not changing the columns, but you are changing the table: you're modifying multiple rows from the same DataTable concurrently, and that's not thread-safe. You can get a hint about this by looking at the callstack you get in your exception:
at System.Data.RecordManager.NewRecordBase()
at System.Data.DataTable.NewRecord(Int32 sourceRecord)
at System.Data.DataRow.BeginEditInternal()
at System.Data.DataRow.set_Item(DataColumn column, Object value)
Modifying the row results in modifying the table, and thus you're modifying the table from multiple threads. Wrapping the contents of your loop with a lock "fixes" the problem by serializing access to the DataRow's indexer.
So, to answer your original question, it's not whether you use an index to iterate through or whether you use an enumerable, but what you're doing inside your loop that's not safe.
- Proposed As Answer by Stephen Toub - MSFT, Moderator, Saturday, September 26, 2009 12:30 AM
- Marked As Answer by Stephen Toub - MSFT, Moderator, Tuesday, September 29, 2009 1:13 AM
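As a rough sketch of the answer above, the following uses the index-based loop from the original question and serializes only the write to the row, with the value computed outside the lock. It assumes .NET 4 or later (System.Threading.Tasks); the class name, the private lock object `gate`, and the fill rule `r * 10 + c` are hypothetical stand-ins, not anything from the original code.

```csharp
using System;
using System.Data;
using System.Threading.Tasks;

public static class LockedRowWrites
{
    // Fills every cell with rowIndex * 10 + columnIndex (an arbitrary rule for the sketch).
    public static DataTable FillWithLock(int rowCount, int columnCount)
    {
        var table = new DataTable("demo");
        for (int c = 0; c < columnCount; c++)
            table.Columns.Add("C" + c, typeof(int));
        for (int r = 0; r < rowCount; r++)
            table.Rows.Add(table.NewRow());

        object gate = new object(); // private lock object guarding every write to the table

        Parallel.For(0, rowCount, r =>
        {
            // Indexed iteration over the columns; the collection is only read here.
            for (int c = 0; c < table.Columns.Count; c++)
            {
                int value = r * 10 + c; // computed outside the lock

                lock (gate)
                {
                    // The write to the row is the part that has to be serialized.
                    table.Rows[r][c] = value;
                }
            }
        });

        return table;
    }
}
```

Whether the columns are walked by index or with foreach makes no difference here; the lock is needed either way because of the assignment. Keeping the lock that narrow only helps if computing each value is expensive compared with storing it.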
- Is the implication in that call stack that the DataTable creates a new row whenever a cell is edited?! Is there any way to prevent that silly behavior? Would BeginLoadData/EndLoadData turn that off?
In my routine, each edit to each row is independent from every other edit. I was hoping to achieve a performance improvement (or just better use of the CPU) by using Parallel.ForEach, but if I have to serialize every edit I really lose that ability.
- Unfortunately my knowledge of DataTable's implementation is incomplete. I doubt there's any way to change this behavior, however... while the DataTable exposes individual objects to represent its logical rows, it's seemingly storing the data in some manner that's not amenable to completely independent edits. I'd suggest starting a new thread over in one of the ADO.NET forums, specifically around this issue... I expect you'll have better luck there getting an answer from folks that live and breathe DataTable. Good luck.
- Marked As Answer by Stephen Toub - MSFT, Moderator, Tuesday, September 29, 2009 1:13 AM
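One possible way to keep some parallelism despite this, assuming the expensive part is computing the new values rather than storing them, is to compute into plain arrays in parallel and then copy the results into the table from a single thread. This is only a sketch of that idea, not something proposed in the thread; it assumes .NET 4 or later, and the class name, the `buffers` array, and the fill rule are hypothetical.

```csharp
using System;
using System.Data;
using System.Threading.Tasks;

public static class ComputeThenWrite
{
    public static DataTable Fill(int rowCount, int columnCount)
    {
        var table = new DataTable("demo");
        for (int c = 0; c < columnCount; c++)
            table.Columns.Add("C" + c, typeof(int));
        for (int r = 0; r < rowCount; r++)
            table.Rows.Add(table.NewRow());

        // Phase 1: compute in parallel into plain arrays. The DataTable is not
        // touched here, and each iteration writes only its own slot of buffers.
        var buffers = new object[rowCount][];
        Parallel.For(0, rowCount, r =>
        {
            var values = new object[columnCount];
            for (int c = 0; c < columnCount; c++)
                values[c] = r * 10 + c; // stand-in for the real per-cell work
            buffers[r] = values;
        });

        // Phase 2: a single thread copies the results into the table,
        // so no lock is needed around the row writes.
        for (int r = 0; r < rowCount; r++)
            table.Rows[r].ItemArray = buffers[r];

        return table;
    }
}
```

The trade-off is extra memory for the buffers and a sequential copy at the end, so this is only worthwhile when the per-cell computation dominates the cost.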


